alphabase.spectral_library.base
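Classes:
SpecLibBase([charged_frag_types, ...]) | Base spectral library in alphabase and alphapeptdeep.
Functions:
annotate_fragments_from_speclib(speclib, ...) | Reannotate a SpecLibBase library with fragments from a different SpecLibBase.
get_available_columns(df, columns) | Get a list of column names that exist in the given dataframe.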
- class alphabase.spectral_library.base.SpecLibBase(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min: float = 400, precursor_mz_max: float = 6000, decoy: str = None)
Bases: object
Base spectral library in alphabase and alphapeptdeep.
- charged_frag_types
Same as charged_frag_types in the Parameters of __init__().
- Type:
list
- min_precursor_mz
Same as precursor_mz_min in the Parameters of __init__().
- Type:
float
- max_precursor_mz
Same as precursor_mz_max in the Parameters of __init__().
- Type:
float
- decoy
Same as decoy in the Parameters of __init__().
- Type:
str
Methods:
__init__([charged_frag_types, ...])
annotate_fragments_from_speclib(donor_speclib) | Annotate self.precursor_df with fragments from donor_speclib.
append(other[, dfs_to_append, remove_unused_dfs]) | Append another SpecLibBase object to the current one in place.
append_decoy_sequence() | Append decoy sequence into precursor_df.
available_dense_fragment_dfs() | Return the available dense fragment dataframes by dynamically checking the attributes of the object.
calc_and_clip_precursor_mz() | Calculate precursor mz for self._precursor_df, and clip self._precursor_df using self.clip_by_precursor_mz_().
calc_fragment_count() | Count the number of non-zero fragments for each precursor. Creates the column 'n_fragments' in self._precursor_df.
calc_fragment_mz_df() | TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.
calc_precursor_isotope([max_isotope, ...])
calc_precursor_isotope_info([...]) | Append isotope columns into self.precursor_df.
calc_precursor_isotope_intensity([max_isotope, ...]) | Calculate and append the isotope intensity columns into self.precursor_df.
calc_precursor_mz() | Calculate precursor mz for self._precursor_df.
clip_by_precursor_mz_() | Clip self._precursor_df in place by self.min_precursor_mz and self.max_precursor_mz.
copy() | Return a copy of the spectral library object.
filter_fragment_number([...]) | Filter the top k fragments for each precursor based on a global setting and a precursor-wise column.
hash_precursor_df() | Insert hash codes for peptides and precursors.
load_df_from_hdf(hdf_file, df_name) | Load specific dataset (dataframe) from hdf_file.
load_hdf(hdf_file[, load_mod_seq, ...]) | Load the hdf library from hdf_file.
refine_df() | Sort nAA and reset_index for faster calculation (or prediction).
remove_unused_fragments() | Remove unused fragments from all available fragment dataframes.
save_df_to_hdf(hdf_file, df_key, df[, ...]) | Save a new HDF group or dataset into an existing HDF file.
save_hdf(hdf_file[, save_mod_seq_in_other_df]) | Save library dataframes into hdf_file.
Attributes:
fragment_intensity_df | The fragment intensity dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
fragment_mz_df | The fragment mz dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
key_numeric_columns | Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading; others will be saved into library/mod_seq_df instead.
peptide_df | Peptide dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to precursor_df.
precursor_df | Precursor dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to peptide_df.
- __init__(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min: float = 400, precursor_mz_max: float = 6000, decoy: str = None)
- Parameters:
charged_frag_types (List[str], optional) – Fragment types with charge. Defaults to ['b_z1', 'b_z2', 'y_z1', 'y_z2'].
precursor_mz_min (float, optional) – Lower precursor m/z bound used to clip the precursor dataframe. Defaults to 400.
precursor_mz_max (float, optional) – Upper precursor m/z bound used to clip the precursor dataframe. Defaults to 6000.
decoy (str, optional) – Decoy method; can be “pseudo_reverse” or “diann”. Defaults to None.
- annotate_fragments_from_speclib(donor_speclib, verbose=True)
Annotate self.precursor_df with fragments from donor_speclib. The donor_speclib must have a fragment_mz_df and can optionally have a fragment_intensity_df. Fragment dataframes are updated in place and overwritten.
- Parameters:
donor_speclib (SpecLibBase) – The donor library to annotate fragments from.
verbose (bool, optional) – Print progress. Defaults to True. Example output:
2022-12-16 00:52:08> Speclib with 4 precursors will be reannotated with speclib with 12 precursors and 504 fragments
2022-12-16 00:52:08> A total of 4 precursors were succesfully annotated, 0 precursors were not matched
- append(other: SpecLibBase, dfs_to_append: List[str] = ['_precursor_df', '_fragment_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'], remove_unused_dfs: bool = True)
Append another SpecLibBase object to the current one in place. All matching dataframes in the second object are appended to the current one; dataframes missing in the current object are ignored. Likewise, all matching columns in the second object are appended, and columns missing in the current object are ignored. Dataframes and columns that are missing in the second object raise an error.
- Parameters:
other (SpecLibBase) – Second SpecLibBase object to be appended.
dfs_to_append (list, optional) – List of dataframes to be appended. Defaults to ['_precursor_df', '_fragment_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'].
remove_unused_dfs (bool, optional) – Remove dataframes from the current library that are not used in the append. This is crucial when calling remove_unused_fragments() after appending a library, in order to keep all fragment dataframes the same size. When set to False, the unused dataframes are kept. Defaults to True.
- Return type:
None
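The column-matching rules described for append() can be illustrated on a toy table. This is a minimal sketch under stated assumptions, not the library's implementation: tables are modeled as lists of dicts instead of dataframes, and `append_rows` is a hypothetical helper.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_rows(current: List[Row], other: List[Row]) -> None:
    """Append `other` to `current` in place, keeping only the columns of `current`.

    Hypothetical helper that mirrors the documented rules:
    - columns present only in `other` are ignored,
    - columns of `current` that are missing in `other` raise an error.
    """
    if not current:
        raise ValueError("current table is empty, so its columns are unknown")
    columns = list(current[0].keys())
    for row in other:
        missing = [c for c in columns if c not in row]
        if missing:
            raise ValueError(f"columns missing in the second table: {missing}")
        # Extra columns in `row` (here: 'proteins') are dropped silently.
        current.append({c: row[c] for c in columns})


target = [{"sequence": "PEPTIDEK", "charge": 2}]
donor = [{"sequence": "ELVISK", "charge": 3, "proteins": "P1"}]
append_rows(target, donor)
print(target)
```

After the call, `target` holds both rows with only the 'sequence' and 'charge' columns; passing a donor row without 'charge' would raise a ValueError.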
- append_decoy_sequence()
Append decoy sequence into precursor_df. The decoy method is selected by self.decoy (str).
>>> decoy_lib = decoy_lib_provider.get_decoy_lib(self.decoy, self)
>>> decoy_lib.decoy_sequence()
>>> decoy_lib.append_to_target_lib()
...
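To give an idea of what a “pseudo_reverse” decoy sequence can look like, the sketch below reverses every residue except the C-terminal one. That convention is an assumption made for illustration; the library's own decoy generation may handle termini and modifications differently, and `pseudo_reverse` is a hypothetical function name.

```python
def pseudo_reverse(sequence: str) -> str:
    """Reverse all residues except the last one (assumed convention)."""
    if len(sequence) < 2:
        return sequence
    # Keep the C-terminal residue in place and reverse the rest.
    return sequence[:-1][::-1] + sequence[-1]


targets = ["PEPTIDEK", "ELVISR"]
decoys = [pseudo_reverse(seq) for seq in targets]
print(decoys)
```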
- available_dense_fragment_dfs() → list
Return the available dense fragment dataframes by dynamically checking the attributes of the object. A fragment dataframe is matched with the pattern '_fragment_[attribute_name]_df'.
- Returns:
List of available fragment dataframes
- Return type:
list
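The attribute-name matching can be sketched as follows. This is an illustration only: `ToyLibrary` and `dense_fragment_df_names` are hypothetical, and the regular expression is one possible reading of the pattern '_fragment_[attribute_name]_df', not necessarily the check the library performs.

```python
import re


class ToyLibrary:
    """Hypothetical stand-in for a spectral library object."""

    def __init__(self):
        self._precursor_df = "precursors"
        self._fragment_mz_df = "mz"
        self._fragment_intensity_df = "intensity"


def dense_fragment_df_names(obj) -> list:
    """Return instance attribute names of the form '_fragment_<name>_df'."""
    pattern = re.compile(r"^_fragment_[A-Za-z0-9_]+_df$")
    return sorted(name for name in vars(obj) if pattern.match(name))


names = dense_fragment_df_names(ToyLibrary())
print(names)
```

Because the names are discovered at call time, a dataframe attribute added later under the same naming scheme is picked up without changing the function.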
- calc_and_clip_precursor_mz()
Calculate precursor mz for self._precursor_df, and clip self._precursor_df using self.clip_by_precursor_mz_().
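The two steps can be sketched in plain Python. The snippet assumes that a neutral monoisotopic mass is already known for each precursor (the 'mass' values and labels below are made up), uses the standard relation m/z = M / z + proton mass, and treats both window bounds as inclusive, which is an assumption rather than documented behaviour.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of a precursor carrying `charge` protons."""
    return neutral_mass / charge + PROTON_MASS


def clip_by_mz(precursors, mz_min=400.0, mz_max=6000.0):
    """Keep precursors whose m/z lies inside [mz_min, mz_max]."""
    return [p for p in precursors if mz_min <= p["precursor_mz"] <= mz_max]


# Hypothetical precursors with made-up neutral masses.
precursors = [
    {"label": "light", "mass": 500.0, "charge": 2},
    {"label": "heavy", "mass": 1500.0, "charge": 2},
]
for p in precursors:
    p["precursor_mz"] = precursor_mz(p["mass"], p["charge"])

kept = clip_by_mz(precursors)
print([p["label"] for p in kept])
```

The default window mirrors the precursor_mz_min and precursor_mz_max defaults of __init__() (400 and 6000).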
- calc_fragment_count()
Count the number of non-zero fragments for each precursor. Creates the column 'n_fragments' in self._precursor_df.
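A minimal sketch of the counting, assuming that each precursor owns the rows frag_start_idx (inclusive) to frag_stop_idx (exclusive) of a dense intensity table whose columns are charged fragment types. The table is modeled as a list of lists and `count_nonzero_fragments` is a hypothetical helper.

```python
from typing import List


def count_nonzero_fragments(
    intensity_rows: List[List[float]], start: int, stop: int
) -> int:
    """Count non-zero entries in rows start..stop-1 of the intensity table."""
    return sum(1 for row in intensity_rows[start:stop] for value in row if value != 0)


# Rows are fragment positions; columns stand for two charged fragment types.
fragment_intensity = [
    [0.0, 1.0],
    [0.5, 0.0],
    [0.0, 0.4],
    [0.2, 0.3],
]
precursors = [
    {"frag_start_idx": 0, "frag_stop_idx": 2},
    {"frag_start_idx": 2, "frag_stop_idx": 4},
]
for p in precursors:
    p["n_fragments"] = count_nonzero_fragments(
        fragment_intensity, p["frag_start_idx"], p["frag_stop_idx"]
    )

print([p["n_fragments"] for p in precursors])
```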
- calc_fragment_mz_df()
TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.
- calc_precursor_isotope(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')
- calc_precursor_isotope_info(mp_process_num: int = 8, mp_process_bar=None, mp_batch_size=10000)
Append isotope columns into self.precursor_df. See alphabase.peptide.calc_precursor_isotope for details.
- calc_precursor_isotope_intensity(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')
Calculate and append the isotope intensity columns into self.precursor_df. See alphabase.peptide.calc_precursor_isotope_intensity for details.
- Parameters:
max_isotope (int, optional) – The maximum isotope to calculate. Defaults to 6.
min_right_most_intensity (float, optional) – The minimum intensity of the right-most isotope. Defaults to 0.001.
mp_batch_size (int, optional) – The batch size for multiprocessing. Defaults to 10000.
mp_process_num (int, optional) – The number of processes for multiprocessing. Defaults to 8.
- clip_by_precursor_mz_()
Clip self._precursor_df in place by self.min_precursor_mz and self.max_precursor_mz.
- copy()
Return a copy of the spectral library object.
- Returns:
A copy of the spectral library object.
- Return type:
SpecLibBase
- filter_fragment_number(n_fragments_allowed_column_name='n_fragments_allowed', n_allowed=999)
Filter the top k fragments for each precursor based on a global setting and a precursor-wise column; the smaller of the two values is used. This can be used to make sure that target and decoy have the same number of fragments.
- Parameters:
n_fragments_allowed_column_name (str, optional, default 'n_fragments_allowed') – The column name in self._precursor_df that contains the number of fragments allowed for each precursor.
n_allowed (int, optional, default 999) – The global setting for the number of fragments allowed for each precursor.
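The “smaller one will be used” rule can be sketched for a single precursor. The snippet assumes that “top k” means the k most intense fragments and that filtered fragments are set to zero; both are assumptions for illustration, and `top_k_fragments` is a hypothetical helper.

```python
from typing import List


def top_k_fragments(
    intensities: List[float], n_allowed_for_precursor: int, n_allowed: int = 999
) -> List[float]:
    """Zero out all but the k most intense fragments of one precursor."""
    # The effective k is the smaller of the per-precursor and global limits.
    k = min(n_allowed_for_precursor, n_allowed)
    ranked = sorted(
        range(len(intensities)), key=lambda i: intensities[i], reverse=True
    )
    keep = set(ranked[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(intensities)]


intensities = [0.1, 0.9, 0.4, 0.7]
per_precursor_limit = top_k_fragments(intensities, n_allowed_for_precursor=2, n_allowed=3)
global_limit = top_k_fragments(intensities, n_allowed_for_precursor=2, n_allowed=1)
print(per_precursor_limit, global_limit)
```

In the first call the per-precursor value (2) is the smaller one; in the second call the global setting (1) wins.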
- property fragment_intensity_df: DataFrame
The fragment intensity dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
- property fragment_mz_df: DataFrame
The fragment mz dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
- key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'mono_isotope_idx', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'i_0', 'i_1', 'i_2', 'i_3', 'i_4', 'i_5', 'i_6', 'i_7', 'i_8', 'i_9']
Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading; others will be saved into library/mod_seq_df instead.
- Type:
list of str
- load_df_from_hdf(hdf_file: str, df_name: str) → DataFrame
Load specific dataset (dataframe) from hdf_file.
- Parameters:
hdf_file (str) – The hdf file name
df_name (str) – The dataset/dataframe name in the hdf file
- Returns:
Loaded dataframe
- Return type:
pd.DataFrame
- load_hdf(hdf_file: str, load_mod_seq: bool = True, support_legacy_mods_format: bool = True, infer_charged_frag_types: bool = True)
Load the hdf library from hdf_file.
- Parameters:
hdf_file (str) – hdf library path to load
load_mod_seq (bool, optional) – By default, save_hdf() does not write mod_seq_df, so this parameter has no effect. For performance reasons, however, users can save the columns that are not key numeric columns into mod_seq_df; in that case, set load_mod_seq to False to skip loading mod_seq_df for faster loading. Defaults to True.
support_legacy_mods_format (bool, optional) – If True, whitespace in modifications is replaced by underscores to match the internal data format. Defaults to True. DeprecationWarning: future versions will have a different default, and eventually this flag will be dropped.
infer_charged_frag_types (bool, optional) – If True, infer the charged fragment types from the hdf file. Defaults to True, since users most likely do not know which charged fragment types the hdf file contains. If False, only the charged fragment types defined in charged_frag_types are loaded.
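The effect of support_legacy_mods_format can be pictured with a small string transformation. This is a sketch of the documented behaviour (whitespace replaced by underscores), not the library's code; `normalize_legacy_mods` is a hypothetical helper and the modification string is only an illustrative value.

```python
import re


def normalize_legacy_mods(mods: str) -> str:
    """Replace every whitespace character in a modification string with '_'."""
    return re.sub(r"\s", "_", mods)


legacy = "Acetyl@Protein N-term;Oxidation@M"
normalized = normalize_legacy_mods(legacy)
print(normalized)
```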
- property peptide_df: DataFrame
Peptide dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to precursor_df.
- property precursor_df: DataFrame
Precursor dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to peptide_df.
- remove_unused_fragments()
Remove unused fragments from all available fragment dataframes. Fragment dataframes are updated in place and overwritten.
- save_df_to_hdf(hdf_file: str, df_key: str, df: DataFrame, delete_existing=False)
Save a new HDF group or dataset into an existing HDF file.
- save_hdf(hdf_file: str, save_mod_seq_in_other_df: bool = False)
Save library dataframes into hdf_file.
- Parameters:
hdf_file (str) – The hdf file path to save
save_mod_seq_in_other_df (bool) –
- If True: save self.precursor_df into two hdf groups in hdf_file, library/precursor_df and library/mod_seq_df.
library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory:
'precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', ['isotope_mz_m1', 'isotope_intensity_m1'], …
library/mod_seq_df contains all string columns and the other non-essential columns:
'sequence'
'mods'
'mod_sites'
'proteins', 'genes', …: optional columns
'mod_seq_hash': one-to-one map back to precursor_df
'mod_seq_charge_hash': one-to-one map back to precursor_df
- If False: all columns of self.precursor_df are saved into library/precursor_df.
Defaults to False.
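The column split described for save_mod_seq_in_other_df=True can be sketched on column names alone. The snippet is an illustration under stated assumptions: `KEY_NUMERIC` is a shortened stand-in for key_numeric_columns, `split_columns` is a hypothetical helper, and the two hash columns are written to both groups so that rows can be mapped back one-to-one.

```python
from typing import List, Tuple

KEY_NUMERIC = ["precursor_mz", "charge", "nAA"]  # shortened list, for illustration
HASH_COLUMNS = ["mod_seq_hash", "mod_seq_charge_hash"]


def split_columns(columns: List[str]) -> Tuple[List[str], List[str]]:
    """Return (columns for library/precursor_df, columns for library/mod_seq_df)."""
    precursor_part = [c for c in columns if c in KEY_NUMERIC or c in HASH_COLUMNS]
    # Everything that is not a key numeric column goes to mod_seq_df,
    # which includes the hash columns used to join the two groups.
    mod_seq_part = [c for c in columns if c not in KEY_NUMERIC]
    return precursor_part, mod_seq_part


columns = [
    "sequence", "mods", "mod_sites", "charge", "nAA", "precursor_mz",
    "proteins", "mod_seq_hash", "mod_seq_charge_hash",
]
precursor_part, mod_seq_part = split_columns(columns)
print(precursor_part)
print(mod_seq_part)
```

Keeping the string columns out of library/precursor_df is what allows the numeric group to be loaded quickly on its own, for example with load_mod_seq=False in load_hdf().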
- alphabase.spectral_library.base.annotate_fragments_from_speclib(speclib: SpecLibBase, fragment_speclib: SpecLibBase, verbose=True) → SpecLibBase
Reannotate a SpecLibBase library with fragments from a different SpecLibBase.
- Parameters:
speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library which contains the precursors to be annotated. All fragment mz values and fragment intensities will be removed.
fragment_speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library which contains the donor precursors whose fragments should be used.
- Returns:
Newly annotated spectral library
- Return type:
alphabase.spectral_library.base.SpecLibBase
- alphabase.spectral_library.base.get_available_columns(df, columns)
Get a list of column names that exist in the given dataframe.
- Parameters:
df (pd.DataFrame) – The dataframe to check columns against
columns (list) – List of column names to check
- Returns:
List of column names that exist in the dataframe
- Return type:
list
Examples
>>> df = pd.DataFrame({'a': [1], 'b': [2]})
>>> get_available_columns(df, ['a', 'b', 'c'])
['a', 'b']