alphabase.spectral_library.base

Classes:

SpecLibBase([charged_frag_types, ...])

Base spectral library in alphabase and alphapeptdeep.

Functions:

annotate_fragments_from_speclib(speclib, ...)

Reannotate a SpecLibBase library with fragments from a different SpecLibBase.

class alphabase.spectral_library.base.SpecLibBase(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min=400, precursor_mz_max=6000, decoy: str = None)

Bases: object

Base spectral library in alphabase and alphapeptdeep.

charged_frag_types

Same as charged_frag_types in the parameters of __init__().

Type: list

min_precursor_mz

Same as precursor_mz_min in the parameters of __init__().

Type: float

max_precursor_mz

Same as precursor_mz_max in the parameters of __init__().

Type: float

decoy

Same as decoy in the parameters of __init__().

Type: str

Methods:

`__init__`([charged_frag_types, ...])	Initialize the library; charged_frag_types are the fragment types with charge.
`annotate_fragments_from_speclib`(donor_speclib)	Annotate self.precursor_df with fragments from donor_speclib.
`append`(other[, dfs_to_append, remove_unused_dfs])	Append another SpecLibBase object to the current one in place.
`append_decoy_sequence`()	Append decoy sequence into precursor_df.
`available_dense_fragment_dfs`()	Return the available dense fragment dataframes by dynamically checking the attributes of the object.
`calc_and_clip_precursor_mz`()	Calculate precursor mz for self._precursor_df, then clip self._precursor_df using self.clip_by_precursor_mz_().
`calc_fragment_count`()	Count the number of non-zero fragments for each precursor.
`calc_fragment_mz_df`()	TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.
`calc_precursor_isotope`([max_isotope, ...])
`calc_precursor_isotope_info`([...])	Append isotope columns into self.precursor_df.
`calc_precursor_isotope_intensity`([...])	Calculate and append the isotope intensity columns into self.precursor_df.
`calc_precursor_mz`()	Calculate precursor mz for self._precursor_df
`clip_by_precursor_mz_`()	Clip self._precursor_df in place by self.min_precursor_mz and self.max_precursor_mz.
`copy`()	Return a copy of the spectral library object.
`filter_fragment_number`([...])	Filter the top k fragments for each precursor based on a global setting and a precursor-wise column.
`hash_precursor_df`()	Insert hash codes for peptides and precursors
`load_df_from_hdf`(hdf_file, df_name)	Load specific dataset (dataframe) from hdf_file.
`load_hdf`(hdf_file[, load_mod_seq])	Load the hdf library from hdf_file
`refine_df`()	Sort nAA and reset_index for faster calculation (or prediction)
`remove_unused_fragments`()	Remove unused fragments from all available fragment dataframes.
`save_df_to_hdf`(hdf_file, df_key, df[, ...])	Save a new HDF group or dataset into an existing HDF file.
`save_hdf`(hdf_file)	Save library dataframes into hdf_file.

Attributes:

`fragment_intensity_df`	The fragment intensity dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
`fragment_mz_df`	The fragment mz dataframe with fragment types as columns (['b_z1', 'y_z2', ...])
`key_numeric_columns`	Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading, others will be saved into library/mod_seq_df instead.
`peptide_df`	Peptide dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc, identical to `precursor_df`.
`precursor_df`	Precursor dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc, identical to `peptide_df`.

__init__(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min=400, precursor_mz_max=6000, decoy: str = None)

Parameters:

charged_frag_types (List[str], optional) – Fragment types with charge. Defaults to ['b_z1', 'b_z2', 'y_z1', 'y_z2'].
precursor_mz_min (int, optional) – Lower precursor mz bound used to clip the precursor dataframe. Defaults to 400.
precursor_mz_max (int, optional) – Upper precursor mz bound used to clip the precursor dataframe. Defaults to 6000.
decoy (str, optional) – Decoy method; can be "pseudo_reverse" or "diann". Defaults to None.
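The default values of charged_frag_types suggest a '<ion type>_z<charge>' naming scheme. The following is a minimal sketch of that scheme in plain Python, assuming the pattern holds for all entries; parse_charged_frag_type and build_charged_frag_types are hypothetical helpers written for this illustration and are not part of alphabase.

```python
from typing import List, Tuple


def parse_charged_frag_type(charged_frag_type: str) -> Tuple[str, int]:
    """Split a name such as 'b_z1' into ('b', 1).

    Assumption: names follow '<ion type>_z<charge>', as in the defaults.
    """
    frag_type, sep, charge = charged_frag_type.rpartition("_z")
    if not sep or not frag_type or not charge.isdigit():
        raise ValueError(f"unexpected fragment type name: {charged_frag_type!r}")
    return frag_type, int(charge)


def build_charged_frag_types(frag_types: List[str], max_charge: int) -> List[str]:
    """Combine ion types with charges 1..max_charge into '<ion>_z<charge>' names."""
    return [f"{frag}_z{z}" for frag in frag_types for z in range(1, max_charge + 1)]


# Rebuild the documented default list from ion types 'b' and 'y'.
default_types = build_charged_frag_types(["b", "y"], max_charge=2)
parsed = [parse_charged_frag_type(name) for name in default_types]
```

Here default_types reproduces the documented default list, and parsed holds the matching (ion type, charge) pairs.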

annotate_fragments_from_speclib(donor_speclib, verbose=True)

Annotate self.precursor_df with fragments from donor_speclib. The donor_speclib must have a fragment_mz_df and can optionally have a fragment_intensity_df. Fragment dataframes are updated in place and overwritten.

Parameters:

donor_speclib (SpecLibBase) – The donor library to annotate fragments from.

verbose (bool, optional) –

Print progress, by default True, for example:

2022-12-16 00:52:08> Speclib with 4 precursors will be reannotated with speclib with 12 precursors and 504 fragments
2022-12-16 00:52:08> A total of 4 precursors were succesfully annotated, 0 precursors were not matched

append(other: SpecLibBase, dfs_to_append: List[str] = ['_precursor_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'], remove_unused_dfs: bool = True)

Append another SpecLibBase object to the current one in place. All matching dataframes in the second object will be appended to the current one. Dataframes missing in the current object will be ignored. All matching columns in the second object will be appended to the current one. Columns missing in the current object will be ignored. Dataframes and columns missing in the second object will raise an error.

Parameters:

other (SpecLibBase) – Second SpecLibBase object to be appended.
dfs_to_append (list, optional) – List of dataframes to be appended. Defaults to ['_precursor_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'].
remove_unused_dfs (bool, optional) – Remove dataframes from the current library that are not used in the append. This is crucial when calling remove_unused_fragments() after appending a library, in order to keep all fragment dataframes the same size. When set to False, the unused dataframes are kept. Defaults to True.

Return type:

None
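The column rules described for append() can be sketched with plain dictionaries standing in for dataframes. This is an illustration of the documented behavior only, not the library's implementation; Table and append_matching_columns are hypothetical names, and the exception type is an assumption.

```python
from typing import Dict, List

# Column name -> column values; a stand-in for a dataframe.
Table = Dict[str, List]


def append_matching_columns(current: Table, other: Table) -> None:
    """Append the rows of `other` to `current` in place.

    Columns that exist only in `other` are ignored; columns of `current`
    that are missing in `other` raise an error.
    """
    missing = [col for col in current if col not in other]
    if missing:
        raise ValueError(f"columns missing in the second table: {missing}")
    for col in current:
        current[col].extend(other[col])


current = {"sequence": ["PEPTIDE"], "charge": [2]}
other = {"sequence": ["PROTEIN"], "charge": [3], "rt": [0.5]}
append_matching_columns(current, other)  # 'rt' is ignored
```

After the call, current holds two rows and still has only the 'sequence' and 'charge' columns.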

append_decoy_sequence()

Append decoy sequence into precursor_df. The decoy method is selected by self.decoy (str):

>>> decoy_lib = decoy_lib_provider.get_decoy_lib(self.decoy, self)
>>> decoy_lib.decoy_sequence()
>>> decoy_lib.append_to_target_lib()
...
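As an illustration of what a "pseudo_reverse" decoy usually means, the sketch below reverses a peptide sequence while keeping its C-terminal residue in place. This is the common convention and is assumed here; it is not confirmed to be what alphabase does internally, and pseudo_reverse is a hypothetical helper.

```python
from typing import List


def pseudo_reverse(sequence: str) -> str:
    """Reverse all residues except the last (C-terminal) one."""
    return sequence[:-1][::-1] + sequence[-1:]


targets: List[str] = ["PEPTIDEK", "ACDEFGR"]
decoys = [pseudo_reverse(seq) for seq in targets]
```

Keeping the C-terminal residue fixed preserves the enzymatic cleavage site (for example K or R for trypsin), so decoys keep the same length and terminal residue as their targets.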

available_dense_fragment_dfs() → list

Return the available dense fragment dataframes by dynamically checking the attributes of the object. A fragment dataframe is matched with the pattern '_fragment_[attribute_name]_df'.

Returns: List of available fragment dataframes
Return type: list
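The attribute-name pattern can be illustrated with a small regular-expression sketch. ToyLibrary and find_fragment_df_names are hypothetical stand-ins; whether the real method returns the names with or without the leading underscore is not stated here, so the sketch simply returns the attribute names as found.

```python
import re
from typing import List

# Matches attribute names of the form '_fragment_<name>_df'.
FRAGMENT_DF_PATTERN = re.compile(r"^_fragment_\w+_df$")


class ToyLibrary:
    """Hypothetical stand-in for a library object; not the real SpecLibBase."""

    def __init__(self) -> None:
        self._precursor_df = object()
        self._fragment_mz_df = object()
        self._fragment_intensity_df = object()


def find_fragment_df_names(obj: object) -> List[str]:
    """Return instance attribute names that match the fragment pattern."""
    return sorted(name for name in vars(obj) if FRAGMENT_DF_PATTERN.match(name))


names = find_fragment_df_names(ToyLibrary())
```

In this toy object, _precursor_df is skipped because it does not match the pattern, while both fragment attributes are picked up.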

calc_and_clip_precursor_mz()

Calculate precursor mz for self._precursor_df, then clip self._precursor_df using self.clip_by_precursor_mz_().

calc_fragment_count()

Count the number of non-zero fragments for each precursor. Creates the column 'n_fragments' in self._precursor_df.
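A minimal sketch of the counting step, assuming that each precursor owns the rows frag_start_idx to frag_stop_idx (exclusive) of a dense intensity table whose columns are the charged fragment types. The function name and the list-of-lists table are illustrative only.

```python
from typing import List, Sequence


def count_nonzero_fragments(
    intensity_rows: Sequence[Sequence[float]],
    frag_start_idx: Sequence[int],
    frag_stop_idx: Sequence[int],
) -> List[int]:
    """Count non-zero intensities in each precursor's block of rows."""
    counts = []
    for start, stop in zip(frag_start_idx, frag_stop_idx):
        block = intensity_rows[start:stop]
        counts.append(sum(1 for row in block for value in row if value != 0))
    return counts


# Two precursors share one dense table; columns could be e.g. b_z1 and y_z1.
intensity_rows = [
    [0.0, 1.0],
    [0.5, 0.0],
    [0.2, 0.8],
    [0.0, 0.0],
    [0.0, 0.3],
]
n_fragments = count_nonzero_fragments(intensity_rows, [0, 3], [3, 5])
```

The first precursor (rows 0 to 2) has four non-zero entries and the second (rows 3 to 4) has one.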

calc_fragment_mz_df()

TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.

calc_precursor_isotope(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')

calc_precursor_isotope_info(mp_process_num: int = 8, mp_process_bar=None, mp_batch_size=10000)

Append isotope columns into self.precursor_df. See alphabase.peptide.precursor.calc_precursor_isotope for details.

calc_precursor_isotope_intensity(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')

Calculate and append the isotope intensity columns into self.precursor_df. See alphabase.peptide.precursor.calc_precursor_isotope_intensity for details.

Parameters:

max_isotope (int, optional) – The maximum isotope to calculate.
min_right_most_intensity (float, optional) – The minimum intensity of the right most isotope.
mp_batch_size (int, optional) – The batch size for multiprocessing.
mp_process_num (int, optional) – The number of processes for multiprocessing.

calc_precursor_mz()

Calculate precursor mz for self._precursor_df.

clip_by_precursor_mz_()

Clip self._precursor_df in place by self.min_precursor_mz and self.max_precursor_mz.
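The two steps, computing precursor mz and clipping by an mz window, can be sketched in plain Python. The sketch starts from a given neutral monoisotopic mass rather than from a sequence, and treats both window bounds as inclusive; both choices are assumptions made for this example, and the function names are hypothetical.

```python
from typing import Dict, List

PROTON_MASS = 1.007276466  # mass of a proton in Da


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """m/z of a protonated ion: (M + z * m_proton) / z."""
    return neutral_mass / charge + PROTON_MASS


def clip_by_precursor_mz(
    precursors: List[Dict], mz_min: float = 400.0, mz_max: float = 6000.0
) -> None:
    """Drop entries outside [mz_min, mz_max], modifying the list in place."""
    precursors[:] = [p for p in precursors if mz_min <= p["precursor_mz"] <= mz_max]


precursors = [
    {"name": "light", "neutral_mass": 500.0, "charge": 2},
    {"name": "medium", "neutral_mass": 1000.0, "charge": 2},
    {"name": "heavy", "neutral_mass": 20000.0, "charge": 3},
]
for p in precursors:
    p["precursor_mz"] = mz_from_neutral_mass(p["neutral_mass"], p["charge"])
clip_by_precursor_mz(precursors)
kept = [p["name"] for p in precursors]
```

With the default window of 400 to 6000, only the "medium" entry (about 501.007 mz) survives; "light" falls below the window and "heavy" above it.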

copy()

Return a copy of the spectral library object.

Returns: A copy of the spectral library object.
Return type: SpecLibBase

filter_fragment_number(n_fragments_allowed_column_name='n_fragments_allowed', n_allowed=999)

Filter the top k fragments for each precursor based on a global setting and a precursor-wise column. The smaller of the two values is used as k. This can be used to make sure that target and decoy have the same number of fragments.

Parameters:

n_fragments_allowed_column_name (str, optional, default 'n_fragments_allowed') – The column name in self._precursor_df that contains the number of fragments allowed for each precursor.
n_allowed (int, optional, default 999) – The global setting for the number of fragments allowed for each precursor.
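A minimal sketch of the top-k rule for a single precursor: k is the smaller of the precursor-wise value and the global n_allowed, and only the k most intense fragments are kept. Setting the remaining intensities to zero is an assumption of this sketch, and keep_top_k is a hypothetical helper rather than the library's code.

```python
from typing import List


def keep_top_k(
    intensities: List[float], n_allowed_for_precursor: int, n_allowed: int = 999
) -> List[float]:
    """Keep the k most intense fragments and zero out the rest."""
    k = max(min(n_allowed_for_precursor, n_allowed), 0)
    ranked = sorted(
        range(len(intensities)), key=lambda i: intensities[i], reverse=True
    )
    keep = set(ranked[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(intensities)]


intensities = [0.1, 0.9, 0.4, 0.7]
# The precursor-wise limit (2) is smaller than the global limit (3).
limited_by_precursor = keep_top_k(intensities, n_allowed_for_precursor=2, n_allowed=3)
# The global limit (3) is smaller than the precursor-wise limit (5).
limited_by_global = keep_top_k(intensities, n_allowed_for_precursor=5, n_allowed=3)
```

Giving a target and its decoy the same precursor-wise value yields the same number of retained fragments for both.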

property fragment_intensity_df: DataFrame

The fragment intensity dataframe with fragment types as columns (['b_z1', 'y_z2', …])

property fragment_mz_df: DataFrame

The fragment mz dataframe with fragment types as columns (['b_z1', 'y_z2', …])

hash_precursor_df()

Insert hash codes for peptides and precursors.

key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel']

Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading, others will be saved into library/mod_seq_df instead.

Type: list of str

load_df_from_hdf(hdf_file: str, df_name: str) → DataFrame

Load specific dataset (dataframe) from hdf_file.

Parameters:

hdf_file (str) – The hdf file name
df_name (str) – The dataset/dataframe name in the hdf file

Returns:

Loaded dataframe

Return type:

pd.DataFrame

load_hdf(hdf_file: str, load_mod_seq: bool = False)

Load the hdf library from hdf_file.

Parameters:

hdf_file (str) – Path of the hdf library to load.
load_mod_seq (bool, optional) – Whether to also load mod_seq_df. Defaults to False.

property peptide_df: DataFrame

Peptide dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to precursor_df.

property precursor_df: DataFrame

Precursor dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc., identical to peptide_df.

refine_df()

Sort by nAA and reset the index for faster calculation (or prediction).

remove_unused_fragments()

Remove unused fragments from all available fragment dataframes. Fragment dataframes are updated in place and overwritten.

save_df_to_hdf(hdf_file: str, df_key: str, df: DataFrame, delete_existing=False)

Save a new HDF group or dataset into an existing HDF file.

save_hdf(hdf_file: str)

Save library dataframes into hdf_file. For self.precursor_df, this method will save it into two hdf groups in hdf_file: library/precursor_df and library/mod_seq_df.

library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory:

'precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleavage', 'nAA', ['isotope_m1_mz', 'isotope_m1_intensity'], …

library/mod_seq_df contains all string columns and the other non-essential columns: 'sequence', 'mods', 'mod_sites', ['proteins', 'genes'], … as well as the 'mod_seq_hash' and 'mod_seq_charge_hash' columns, which map back to precursor_df.

Parameters: hdf_file (str) – The hdf file path to save to.
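The column split described for save_hdf can be sketched as a pure function over column names. KEY_NUMERIC below is only a small subset of key_numeric_columns chosen for this example, and split_precursor_columns is a hypothetical helper; the actual writing to HDF is left out.

```python
from typing import Dict, List

# Illustrative subset of key_numeric_columns.
KEY_NUMERIC = ["charge", "nAA", "precursor_mz", "frag_start_idx", "frag_stop_idx"]
# Hash columns are stored in both groups so rows can be mapped back.
HASH_COLUMNS = ["mod_seq_hash", "mod_seq_charge_hash"]


def split_precursor_columns(columns: List[str]) -> Dict[str, List[str]]:
    """Assign each precursor_df column to one or both HDF groups."""
    precursor_cols = [c for c in columns if c in KEY_NUMERIC or c in HASH_COLUMNS]
    mod_seq_cols = [c for c in columns if c not in KEY_NUMERIC]
    return {
        "library/precursor_df": precursor_cols,
        "library/mod_seq_df": mod_seq_cols,
    }


columns = [
    "sequence", "mods", "mod_sites", "charge", "nAA",
    "precursor_mz", "proteins", "mod_seq_hash", "mod_seq_charge_hash",
]
groups = split_precursor_columns(columns)
```

Numeric key columns land in library/precursor_df for fast loading, string and other columns land in library/mod_seq_df, and the two hash columns appear in both groups.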

alphabase.spectral_library.base.annotate_fragments_from_speclib(speclib: SpecLibBase, fragment_speclib: SpecLibBase, verbose=True) → SpecLibBase

Reannotate a SpecLibBase library with fragments from a different SpecLibBase.

Parameters:

speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library that contains the precursors to be annotated. All existing fragment mz values and fragment intensities will be removed.
fragment_speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library that contains the donor precursors whose fragments should be used.

Returns:

The newly annotated spectral library.

Return type:

alphabase.spectral_library.base.SpecLibBase