alphabase.spectral_library.base

Classes:

SpecLibBase([charged_frag_types, ...])

Base spectral library in alphabase and alphapeptdeep.

Functions:

annotate_fragments_from_speclib(speclib, ...)

Reannotate a SpecLibBase library with fragments from a different SpecLibBase.

get_available_columns(df, columns)

Get a list of column names that exist in the given dataframe.

class alphabase.spectral_library.base.SpecLibBase(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min: float = 400, precursor_mz_max: float = 6000, decoy: str = None)

Bases: object

Base spectral library in alphabase and alphapeptdeep.

charged_frag_types

Same as the charged_frag_types parameter of __init__().

Type:

list

min_precursor_mz

Same as the precursor_mz_min parameter of __init__().

Type:

float

max_precursor_mz

Same as the precursor_mz_max parameter of __init__().

Type:

float

decoy

Same as the decoy parameter of __init__().

Type:

str

Methods:

__init__([charged_frag_types, ...])

annotate_fragments_from_speclib(donor_speclib)

Annotate self.precursor_df with fragments from donor_speclib.

append(other[, dfs_to_append, remove_unused_dfs])

Append another SpecLibBase object to the current one in place.

append_decoy_sequence()

Append decoy sequence into precursor_df.

available_dense_fragment_dfs()

Return the available dense fragment dataframes by dynamically checking the attributes of the object.

calc_and_clip_precursor_mz()

Calculate precursor mz for self._precursor_df, then clip self._precursor_df using self.clip_by_precursor_mz_().

calc_fragment_count()

Count the number of non-zero fragments for each precursor. Creates the column 'n_fragments' in self._precursor_df.

calc_fragment_mz_df()

TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.

calc_precursor_isotope([max_isotope, ...])

calc_precursor_isotope_info([...])

Append isotope columns into self.precursor_df.

calc_precursor_isotope_intensity([...])

Calculate and append the isotope intensity columns into self.precursor_df.

calc_precursor_mz()

Calculate precursor mz for self._precursor_df

clip_by_precursor_mz_()

Clip self._precursor_df inplace by self.min_precursor_mz and self.max_precursor_mz

copy()

Return a copy of the spectral library object.

filter_fragment_number([...])

Filter the top k fragments for each precursor based on a global setting and a precursor wise column.

hash_precursor_df()

Insert hash codes for peptides and precursors

load_df_from_hdf(hdf_file, df_name)

Load specific dataset (dataframe) from hdf_file.

load_hdf(hdf_file[, load_mod_seq, ...])

Load the hdf library from hdf_file

refine_df()

Sort by nAA and reset the index for faster calculation (or prediction).

remove_unused_fragments()

Remove unused fragments from all available fragment dataframes.

save_df_to_hdf(hdf_file, df_key, df[, ...])

Save a new HDF group or dataset into an existing HDF file.

save_hdf(hdf_file[, save_mod_seq_in_other_df])

Save library dataframes into hdf_file.

Attributes:

fragment_intensity_df

The fragment intensity dataframe with fragment types as columns (['b_z1', 'y_z2', ...])

fragment_mz_df

The fragment mz dataframe with fragment types as columns (['b_z1', 'y_z2', ...])

key_numeric_columns

Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading, others will be saved into library/mod_seq_df instead.

peptide_df

Peptide dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc, identical to precursor_df.

precursor_df

Precursor dataframe with columns 'sequence', 'mods', 'mod_sites', 'charge', etc, identical to peptide_df.

__init__(charged_frag_types: List[str] = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], precursor_mz_min: float = 400, precursor_mz_max: float = 6000, decoy: str = None)
Parameters:
  • charged_frag_types (List[str], optional) – Fragment types with charge. Defaults to ['b_z1', 'b_z2', 'y_z1', 'y_z2'].

  • precursor_mz_min (float, optional) – Use this to clip precursor df. Defaults to 400.

  • precursor_mz_max (float, optional) – Use this to clip precursor df. Defaults to 6000.

  • decoy (str, optional) – Decoy method; can be "pseudo_reverse" or "diann". Defaults to None.

annotate_fragments_from_speclib(donor_speclib, verbose=True)

Annotate self.precursor_df with fragments from donor_speclib. The donor_speclib must have a fragment_mz_df and can optionally have a fragment_intensity_df. Fragment dataframes are overwritten in place.

Parameters:
  • donor_speclib (SpecLibBase) – The donor library to annotate fragments from.

  • verbose (bool, optional) –

    Print progress, by default True, for example:

    2022-12-16 00:52:08> Speclib with 4 precursors will be reannotated with speclib with 12 precursors and 504 fragments
    2022-12-16 00:52:08> A total of 4 precursors were succesfully annotated, 0 precursors were not matched
    

append(other: SpecLibBase, dfs_to_append: List[str] = ['_precursor_df', '_fragment_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'], remove_unused_dfs: bool = True)

Append another SpecLibBase object to the current one in place. All matching dataframes in the second object will be appended to the current one. Dataframes missing in the current object will be ignored. All matching columns in the second object will be appended to the current one. Columns missing in the current object will be ignored. Dataframes and columns missing in the second object will raise an error.

Parameters:
  • other (SpecLibBase) – Second SpecLibBase object to be appended.

  • dfs_to_append (list, optional) – List of dataframes to be appended. Defaults to ['_precursor_df', '_fragment_df', '_fragment_intensity_df', '_fragment_mz_df', '_fragment_intensity_predicted_df'].

  • remove_unused_dfs (bool, optional) – Remove dataframes from the current library that are not used in the append. This is crucial when calling remove_unused_fragments() after appending a library, in order to keep all fragment dataframes the same size. When set to False, the unused dataframes are kept.

Return type:

None
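
The column rules described for append() can be illustrated without alphabase. The sketch below is a plain-Python stand-in and not the library's implementation: each table is a dict of column lists, and append_table is a hypothetical helper introduced only for this example.

```python
def append_table(current, other):
    """Append the rows of `other` to `current` in place (hypothetical helper).

    Only columns already present in `current` are appended. Extra columns in
    `other` are ignored, and a column missing from `other` raises KeyError.
    """
    missing = [col for col in current if col not in other]
    if missing:
        raise KeyError(f"columns missing in the second table: {missing}")
    for col in current:
        current[col].extend(other[col])


current = {"sequence": ["PEPTIDE"], "charge": [2]}
other = {"sequence": ["ACDEFGK"], "charge": [3], "rt": [10.5]}

# 'rt' exists only in `other`, so it is ignored.
append_table(current, other)
```

The same rule is applied one level up to whole dataframes: those missing in the current object are ignored, and those missing in the second object raise an error.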

append_decoy_sequence()

Append decoy sequence into precursor_df. The decoy method is selected by self.decoy (str).

>>> decoy_lib = decoy_lib_provider.get_decoy_lib(self.decoy, self)
>>> decoy_lib.decoy_sequence()
>>> decoy_lib.append_to_target_lib()
...
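
As an illustration of what a "pseudo_reverse" decoy commonly means, the sketch below reverses a peptide sequence while keeping its C-terminal residue in place. This is the usual definition of pseudo-reversal and only an assumption about the behaviour here. Modifications and the library's decoy provider are not modelled, and pseudo_reverse is a hypothetical helper.

```python
def pseudo_reverse(sequence: str) -> str:
    """Reverse all residues except the last one (hypothetical helper)."""
    return sequence[:-1][::-1] + sequence[-1:]


# The C-terminal 'K' stays in place, the rest of the sequence is reversed.
decoy = pseudo_reverse("PEPTIDEK")
```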

available_dense_fragment_dfs() -> list

Return the available dense fragment dataframes by dynamically checking the attributes of the object. A fragment dataframe is matched with the pattern '_fragment_[attribute_name]_df'.

Returns:

List of available fragment dataframes

Return type:

list
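
A minimal sketch of this kind of attribute matching is shown below. ToyLibrary and dense_fragment_attributes are hypothetical names, and the regular expression is one plausible reading of the pattern '_fragment_[attribute_name]_df'. The library's own check may differ.

```python
import re


class ToyLibrary:
    """Stand-in object for illustration; not alphabase's SpecLibBase."""

    def __init__(self):
        self._precursor_df = "precursors"
        self._fragment_mz_df = "mz"
        self._fragment_intensity_df = "intensity"
        self._fragment_df = "fragments"  # no middle part, so not matched below


def dense_fragment_attributes(obj):
    """Return attribute names of the form '_fragment_<name>_df', sorted."""
    pattern = re.compile(r"^_fragment_\w+_df$")
    return sorted(name for name in vars(obj) if pattern.match(name))


names = dense_fragment_attributes(ToyLibrary())
```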

calc_and_clip_precursor_mz()

Calculate precursor mz for self._precursor_df, then clip self._precursor_df using self.clip_by_precursor_mz_().

calc_fragment_count()

Count the number of non-zero fragments for each precursor. Creates the column 'n_fragments' in self._precursor_df.
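
The counting can be sketched in plain Python. The example assumes that each precursor owns the fragment rows from frag_start_idx up to (but not including) frag_stop_idx, and that the count is taken over intensity values. Both are assumptions for illustration, and count_nonzero_fragments is a hypothetical helper.

```python
def count_nonzero_fragments(intensity_rows, frag_start_idx, frag_stop_idx):
    """Count non-zero intensities in each precursor's block of fragment rows."""
    counts = []
    for start, stop in zip(frag_start_idx, frag_stop_idx):
        block = intensity_rows[start:stop]
        counts.append(sum(1 for row in block for value in row if value != 0))
    return counts


# Five fragment rows with two fragment-type columns each (illustrative values).
intensity_rows = [
    [0.0, 1.0],
    [0.5, 0.0],
    [0.0, 0.0],
    [0.2, 0.3],
    [0.0, 0.9],
]

# Precursor 0 owns rows 0..2, precursor 1 owns rows 3..4.
n_fragments = count_nonzero_fragments(intensity_rows, [0, 3], [3, 5])
```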

calc_fragment_mz_df()

TODO: use multiprocessing here or in the create_fragment_mz_dataframe function.

calc_precursor_isotope(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')

calc_precursor_isotope_info(mp_process_num: int = 8, mp_process_bar=None, mp_batch_size=10000)

Append isotope columns into self.precursor_df. See alphabase.peptide.calc_precursor_isotope for details.

calc_precursor_isotope_intensity(max_isotope=6, min_right_most_intensity=0.001, mp_batch_size=10000, mp_process_num=8, normalize: Literal['mono', 'sum'] = 'sum')

Calculate and append the isotope intensity columns into self.precursor_df. See alphabase.peptide.calc_precursor_isotope_intensity for details.

Parameters:
  • max_isotope (int, optional) – The maximum isotope to calculate.

  • min_right_most_intensity (float, optional) – The minimum intensity of the right most isotope.

  • mp_batch_size (int, optional) – The batch size for multiprocessing.

  • mp_process_num (int, optional) – The number of processes for multiprocessing.

calc_precursor_mz()

Calculate precursor mz for self._precursor_df
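
For a protonated precursor with neutral monoisotopic mass M and charge z, the m/z value is (M + z * m_proton) / z. The sketch below shows only this arithmetic. How the library derives the mass from sequence and modifications is not covered, and precursor_mz is a hypothetical helper.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """Return m/z of a protonated ion with the given neutral mass and charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# A 1000 Da peptide carrying two protons.
mz = precursor_mz(1000.0, 2)
```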

clip_by_precursor_mz_()

Clip self._precursor_df inplace by self.min_precursor_mz and self.max_precursor_mz
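
The effect of clipping can be sketched as a simple range filter. Unlike the method, this stand-in returns a new list instead of modifying data in place, and treating both bounds as inclusive is an assumption. clip_by_mz and the m/z values are illustrative only.

```python
def clip_by_mz(precursors, min_mz=400.0, max_mz=6000.0):
    """Keep only precursors whose m/z lies within [min_mz, max_mz]."""
    return [p for p in precursors if min_mz <= p["precursor_mz"] <= max_mz]


precursors = [
    {"sequence": "AAK", "precursor_mz": 289.2},       # below the window
    {"sequence": "PEPTIDEK", "precursor_mz": 464.7},  # inside the window
]
kept = clip_by_mz(precursors)
```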

copy()

Return a copy of the spectral library object.

Returns:

A copy of the spectral library object.

Return type:

SpecLibBase

filter_fragment_number(n_fragments_allowed_column_name='n_fragments_allowed', n_allowed=999)

Filter the top k fragments for each precursor based on a global setting and a precursor wise column. The smaller one will be used. Can be used to make sure that target and decoy have the same number of fragments.

Parameters:
  • n_fragments_allowed_column_name (str, optional, default 'n_fragments_allowed') – The column name in self._precursor_df that contains the number of fragments allowed for each precursor.

  • n_allowed (int, optional, default 999) – The global setting for the number of fragments allowed for each precursor.
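
The "smaller one will be used" rule can be sketched as follows. Ranking fragments by intensity and zeroing the ones that are filtered out are assumptions made for this illustration, and keep_top_fragments is a hypothetical helper that handles a single precursor.

```python
def keep_top_fragments(intensities, n_allowed_for_precursor, n_allowed=999):
    """Keep the k most intense fragments and set all others to zero.

    k is the smaller of the precursor-wise limit and the global limit.
    """
    k = min(n_allowed_for_precursor, n_allowed)
    ranked = sorted(
        range(len(intensities)), key=lambda i: intensities[i], reverse=True
    )
    keep = set(ranked[:k])
    return [value if i in keep else 0.0 for i, value in enumerate(intensities)]


# Precursor-wise limit 2 is smaller than the global default 999.
filtered = keep_top_fragments([0.1, 0.9, 0.4, 0.7], n_allowed_for_precursor=2)

# Global limit 1 is smaller than the precursor-wise limit 2.
filtered_global = keep_top_fragments([0.1, 0.9, 0.4, 0.7], 2, n_allowed=1)
```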

property fragment_intensity_df: DataFrame

The fragment intensity dataframe with fragment types as columns ([‘b_z1’, ‘y_z2’, …])

property fragment_mz_df: DataFrame

The fragment mz dataframe with fragment types as columns ([‘b_z1’, ‘y_z2’, …])

hash_precursor_df()

Insert hash codes for peptides and precursors

key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'mono_isotope_idx', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'i_0', 'i_1', 'i_2', 'i_3', 'i_4', 'i_5', 'i_6', 'i_7', 'i_8', 'i_9']

Key numeric columns to be saved into library/precursor_df in the hdf file for fast loading, others will be saved into library/mod_seq_df instead.

Type:

list of str

load_df_from_hdf(hdf_file: str, df_name: str) -> DataFrame

Load specific dataset (dataframe) from hdf_file.

Parameters:
  • hdf_file (str) – The hdf file name

  • df_name (str) – The dataset/dataframe name in the hdf file

Returns:

Loaded dataframe

Return type:

pd.DataFrame

load_hdf(hdf_file: str, load_mod_seq: bool = True, support_legacy_mods_format: bool = True, infer_charged_frag_types: bool = True)

Load the hdf library from hdf_file

Parameters:
  • hdf_file (str) – hdf library path to load

  • load_mod_seq (bool, optional) – By default, save_hdf() does not write mod_seq_df, so this parameter has no effect. For performance reasons, however, users can save the subset of columns that are not key numeric columns in mod_seq_df. In that case, set load_mod_seq to False to skip loading mod_seq_df for faster loading. Defaults to True.

  • support_legacy_mods_format (bool, optional) – If True, whitespaces in modifications will be replaced by underscores to match the internal data format. Defaults to True. DeprecationWarning: future versions will have a different default and eventually this flag will be dropped.

  • infer_charged_frag_types (bool, optional) – if True, infer the charged fragment types as defined in the hdf file, defaults to True. This is the default as users most likely don’t know the charged fragment types in the hdf file. If set to False, only charged frag types defined in charged_frag_types will be loaded.

property peptide_df: DataFrame

Peptide dataframe with columns ‘sequence’, ‘mods’, ‘mod_sites’, ‘charge’, etc, identical to precursor_df.

property precursor_df: DataFrame

Precursor dataframe with columns ‘sequence’, ‘mods’, ‘mod_sites’, ‘charge’, etc, identical to peptide_df.

refine_df()

Sort by nAA and reset the index for faster calculation (or prediction).

remove_unused_fragments()

Remove unused fragments from all available fragment dataframes. Fragment dataframes are updated inplace and overwritten.

save_df_to_hdf(hdf_file: str, df_key: str, df: DataFrame, delete_existing=False)

Save a new HDF group or dataset into an existing HDF file.

save_hdf(hdf_file: str, save_mod_seq_in_other_df: bool = False)

Save library dataframes into hdf_file.

Parameters:
  • hdf_file (str) – The hdf file path to save

  • save_mod_seq_in_other_df (bool) –

    If True: save self.precursor_df into two hdf groups in hdf_file,

    library/precursor_df and library/mod_seq_df.

    library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory:

    'precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', ['isotope_mz_m1', 'isotope_intensity_m1'], …

    library/mod_seq_df contains all string columns and the remaining non-essential columns:

    • 'sequence'

    • 'mods'

    • 'mod_sites'

    • 'proteins', 'genes', …: optional columns

    • 'mod_seq_hash': one-to-one map back to precursor_df

    • 'mod_seq_charge_hash': one-to-one map back to precursor_df

    If False:

    All columns of self.precursor_df will be saved into library/precursor_df.

    Defaults to False.
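
The two-group layout used when save_mod_seq_in_other_df is True can be sketched as a partition of column names. This is a plain-Python illustration, not the library's code: split_columns is a hypothetical helper, and the short key list stands in for key_numeric_columns.

```python
LINK_COLUMNS = ("mod_seq_hash", "mod_seq_charge_hash")


def split_columns(columns, key_numeric_columns, link_columns=LINK_COLUMNS):
    """Partition column names into the two hdf groups.

    The link columns go into both groups so that rows can be mapped back.
    """
    key = set(key_numeric_columns)
    precursor_cols = [c for c in columns if c in key or c in link_columns]
    mod_seq_cols = [c for c in columns if c not in key or c in link_columns]
    return precursor_cols, mod_seq_cols


columns = [
    "sequence", "mods", "mod_sites", "charge", "precursor_mz",
    "proteins", "mod_seq_hash", "mod_seq_charge_hash",
]
precursor_cols, mod_seq_cols = split_columns(
    columns, key_numeric_columns=["charge", "precursor_mz", "nAA"]
)
```

In this sketch the numeric columns end up in the group meant for fast loading, while the string columns and optional annotations are kept separately and can be skipped at load time.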

alphabase.spectral_library.base.annotate_fragments_from_speclib(speclib: SpecLibBase, fragment_speclib: SpecLibBase, verbose=True) -> SpecLibBase

Reannotate a SpecLibBase library with fragments from a different SpecLibBase.

Parameters:
  • speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library which contains the precursors to be annotated. All fragment mz values and fragment intensities will be removed.

  • fragment_speclib (alphabase.spectral_library.base.SpecLibBase) – Spectral library which contains the donor precursors whose fragments should be used.

Returns:

newly annotated spectral library

Return type:

alphabase.spectral_library.base.SpecLibBase
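
Reannotation relies on finding, for each precursor, the matching donor precursor. The sketch below illustrates such a lookup in plain Python. The matching key (sequence, mods, mod_sites, charge) is an assumption chosen from the precursor columns listed on this page, and match_precursors is a hypothetical helper, not the library's implementation.

```python
KEY = ("sequence", "mods", "mod_sites", "charge")


def match_precursors(recipient, donor, key=KEY):
    """Return, for each recipient row, the index of its donor row or None."""
    donor_index = {tuple(row[k] for k in key): i for i, row in enumerate(donor)}
    return [donor_index.get(tuple(row[k] for k in key)) for row in recipient]


donor = [
    {"sequence": "PEPTIDEK", "mods": "", "mod_sites": "", "charge": 2},
    {"sequence": "PEPTIDEK", "mods": "", "mod_sites": "", "charge": 3},
]
recipient = [
    {"sequence": "PEPTIDEK", "mods": "", "mod_sites": "", "charge": 3},
    {"sequence": "ACDEFGK", "mods": "", "mod_sites": "", "charge": 2},
]

matches = match_precursors(recipient, donor)
n_unmatched = sum(m is None for m in matches)
```

The number of unmatched rows corresponds to the "precursors were not matched" count printed when verbose is True.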

alphabase.spectral_library.base.get_available_columns(df, columns)

Get a list of column names that exist in the given dataframe.

Parameters:
  • df (pd.DataFrame) – The dataframe to check columns against

  • columns (list) – List of column names to check

Returns:

List of column names that exist in the dataframe

Return type:

list

Examples

>>> df = pd.DataFrame({'a': [1], 'b': [2]})
>>> get_available_columns(df, ['a', 'b', 'c'])
['a', 'b']