alphabase.spectral_library.flat

Classes:

SpecLibFlat([charged_frag_types, ...])

Flatten the spectral library (SpecLibBase) by using parse_base_library().

class alphabase.spectral_library.flat.SpecLibFlat(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)

Bases: SpecLibBase

Flatten the spectral library (SpecLibBase) by using parse_base_library().

custom_fragment_df_columns

The 'mz' and 'intensity' columns are required in fragment_df; the others can be customized. It can include ['type', 'number', 'position', 'charge', 'loss_type'].

Type:

list of str

min_fragment_intensity

Minimal intensity to keep in fragment_df.

Type:

float

keep_top_k_fragments

Top k highest peaks to keep in fragment_df.

Type:

int

Methods:

__init__([charged_frag_types, ...])

Parameter min_fragment_intensity: minimal intensity to keep, by default 0.001.

available_dense_fragment_dfs()

Return the available dense fragment dataframes.

get_full_charged_types(frag_df)

Infer the full set of charged fragment types from the fragment dataframe. Here "full" means a complete set of fragment types for each charge: if the dataframe contains a b_z1 fragment, the set should also contain y_z1, and vice versa.

load_hdf(hdf_file[, load_mod_seq])

Load the hdf library from hdf_file

parse_base_library(library[, ...])

Flatten a library object of SpecLibBase or its inherited class.

remove_unused_fragments()

Remove unused fragments from fragment_df.

save_hdf(hdf_file)

Save library dataframes into hdf_file.

to_SpecLibBase()

Convert the flat library to SpecLibBase object.

Attributes:

fragment_df

The flat fragment dataframe with columns ['mz', 'intensity'] + custom_fragment_df_columns.

key_numeric_columns

SpecLibBase.key_numeric_columns + ['flat_frag_start_idx','flat_frag_stop_idx'].

protein_df

Protein dataframe

__init__(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)
Parameters:
  • min_fragment_intensity (float, optional) – minimal intensity to keep, by default 0.001

  • keep_top_k_fragments (int, optional) – top k highest peaks to keep, by default 1000

  • custom_fragment_df_columns (list, optional) – See custom_fragment_df_columns, defaults to ['type', 'number', 'position', 'charge', 'loss_type']
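
The effect of min_fragment_intensity and keep_top_k_fragments can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper name filter_fragments is hypothetical, and the sketch assumes that the threshold is a strict "greater than" comparison and that the top-k limit is applied per precursor, neither of which is stated above.

```python
def filter_fragments(fragments, min_fragment_intensity=0.001, keep_top_k_fragments=1000):
    """Hypothetical helper: filter the peaks of ONE precursor.

    `fragments` is a list of (mz, intensity) tuples.
    Assumption: peaks must be strictly above the threshold to be kept.
    """
    # Step 1: drop peaks at or below the intensity threshold.
    kept = [peak for peak in fragments if peak[1] > min_fragment_intensity]
    # Step 2: sort by intensity (highest first) and keep at most k peaks.
    kept.sort(key=lambda peak: peak[1], reverse=True)
    return kept[:keep_top_k_fragments]


peaks = [(175.1, 0.0005), (262.2, 0.40), (375.3, 1.00), (488.3, 0.05)]
top2 = filter_fragments(peaks, min_fragment_intensity=0.001, keep_top_k_fragments=2)
defaults = filter_fragments(peaks)
```

With the default arguments only the 0.0005 peak is removed; with keep_top_k_fragments=2 the two most intense peaks remain.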

available_dense_fragment_dfs()

Return the available dense fragment dataframes. This method is inherited from SpecLibBase and will return an empty list for a flat library.

property fragment_df: DataFrame

The flat fragment dataframe with columns ['mz', 'intensity'] + custom_fragment_df_columns.

get_full_charged_types(frag_df: DataFrame) → list

Infer the full set of charged fragment types from the fragment dataframe. Here "full" means a complete set of fragment types for each charge: if the dataframe contains a b_z1 fragment, the set should also contain y_z1, and vice versa.

Parameters:

frag_df (pd.DataFrame) – The fragment dataframe

Returns:

charged_frag_types – The full set of charged fragment types in the form of a list of strings such as ['a_z1', 'b_z1', 'c_z1', 'x_z1', 'y_z1', 'z_z1']

Return type:

list
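
One possible reading of "full" is that each observed ion series is completed with its complementary series at the same charge. The sketch below illustrates that reading only; the helper complete_charged_types and the COMPLEMENT mapping are assumptions made for illustration, not the method's actual logic, and the sketch works on a list of names rather than a dataframe and handles only plain series names without loss types.

```python
# Assumed pairing of complementary ion series (N-terminal <-> C-terminal).
COMPLEMENT = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}


def complete_charged_types(observed):
    """Hypothetical helper: add the complementary series for every observed name.

    `observed` is a list of names such as 'b_z1' (series 'b', charge 1).
    """
    full = set()
    for name in observed:
        # Split on the LAST '_z' so that a name like 'z_z1' parses correctly.
        series, charge = name.rsplit("_z", 1)
        full.add(f"{series}_z{charge}")
        full.add(f"{COMPLEMENT[series]}_z{charge}")
    return sorted(full)


full_types = complete_charged_types(["b_z1", "y_z2"])
```

Here b_z1 brings in y_z1 and y_z2 brings in b_z2, so both charges end up with both series.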

key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'flat_frag_start_idx', 'flat_frag_stop_idx']

SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].

load_hdf(hdf_file: str, load_mod_seq: bool = False)

Load the hdf library from hdf_file

Parameters:
  • hdf_file (str) – hdf library path to load

  • load_mod_seq (bool, optional) – Whether to also load mod_seq_df. Defaults to False.

parse_base_library(library: SpecLibBase, keep_original_frag_dfs: bool = False, copy_precursor_df: bool = False, **kwargs)

Flatten a library object of SpecLibBase or its inherited class. This method generates precursor_df and fragment_df. The fragments of each precursor can be located in fragment_df by the flat_frag_start_idx and flat_frag_stop_idx columns of precursor_df.

Parameters:
  • library (SpecLibBase) – A library object with attributes precursor_df, fragment_mz_df and fragment_intensity_df.

  • keep_original_frag_dfs (bool, default False) – Whether fragment_mz_df and fragment_intensity_df are kept in this library.

  • copy_precursor_df (bool, default False) – If True, make a copy of precursor_df from library; otherwise the flat_frag_start_idx and flat_frag_stop_idx columns will also be appended to the precursor_df of library.
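
The relationship between the dense fragment dataframes and the flat one can be sketched in plain Python, with lists and dicts standing in for dataframes. This is an illustration under stated assumptions, not the library's code: the function flatten_library and its argument names are hypothetical, and the filter is assumed to be a strict comparison against the minimal intensity.

```python
import copy


def flatten_library(precursors, mz_rows, intensity_rows,
                    min_intensity=0.001, copy_precursors=False):
    """Hypothetical sketch of flattening dense fragment rows.

    precursors: list of dicts with 'frag_start_idx' and 'frag_stop_idx',
        which point into the dense rows.
    mz_rows / intensity_rows: list of rows, one value per charged fragment type.
    """
    if copy_precursors:
        # Mirrors copy_precursor_df=True: leave the input precursors untouched.
        precursors = copy.deepcopy(precursors)
    flat = []
    for prec in precursors:
        prec["flat_frag_start_idx"] = len(flat)
        for row in range(prec["frag_start_idx"], prec["frag_stop_idx"]):
            for mz, intensity in zip(mz_rows[row], intensity_rows[row]):
                if intensity > min_intensity:  # assumed strict threshold
                    flat.append({"mz": mz, "intensity": intensity})
        prec["flat_frag_stop_idx"] = len(flat)
    return precursors, flat


precursors = [
    {"frag_start_idx": 0, "frag_stop_idx": 2},
    {"frag_start_idx": 2, "frag_stop_idx": 3},
]
mz_rows = [[100.0, 500.0], [200.0, 400.0], [150.0, 350.0]]
intensity_rows = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.8]]

flat_precursors, flat_fragments = flatten_library(
    precursors, mz_rows, intensity_rows, copy_precursors=True
)
# Fragments of the second precursor, located by its flat index range.
second = flat_fragments[
    flat_precursors[1]["flat_frag_start_idx"]:flat_precursors[1]["flat_frag_stop_idx"]
]
```

Without copy_precursors=True, the two flat index entries would be written into the input precursors themselves, which is the behaviour the copy_precursor_df description refers to.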

property protein_df: DataFrame

Protein dataframe

remove_unused_fragments()

Remove unused fragments from fragment_df. This method is inherited from SpecLibBase and has not been implemented for a flat library.

save_hdf(hdf_file: str)

Save library dataframes into hdf_file. self.precursor_df is saved into two hdf groups in hdf_file: library/precursor_df and library/mod_seq_df.

library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory: ['precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'flat_frag_start_idx', 'flat_frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', 'isotope_mz_m1', 'isotope_intensity_m1', …]

library/mod_seq_df contains all string columns and the other non-essential columns: 'sequence', 'mods', 'mod_sites', ['proteins', 'genes'], …, as well as the 'mod_seq_hash' and 'mod_seq_charge_hash' columns to map back to precursor_df.

Parameters:

hdf_file (str) – the hdf file path to save
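
The column split described above can be pictured with a short sketch that works on column names only. The helper split_precursor_columns is hypothetical and performs no HDF I/O; it only shows how the two hash columns would end up in both groups so that rows can be mapped back.

```python
HASH_COLUMNS = ("mod_seq_hash", "mod_seq_charge_hash")


def split_precursor_columns(columns, key_numeric_columns):
    """Hypothetical helper: assign precursor_df columns to the two hdf groups."""
    # Essential numeric columns plus the hash keys go to library/precursor_df.
    precursor_group = [
        c for c in columns if c in key_numeric_columns or c in HASH_COLUMNS
    ]
    # Everything else, plus the same hash keys, goes to library/mod_seq_df.
    mod_seq_group = [
        c for c in columns if c not in key_numeric_columns or c in HASH_COLUMNS
    ]
    return precursor_group, mod_seq_group


columns = [
    "sequence", "mods", "mod_sites", "charge", "precursor_mz",
    "nAA", "mod_seq_hash", "mod_seq_charge_hash", "proteins",
]
precursor_group, mod_seq_group = split_precursor_columns(
    columns, key_numeric_columns={"charge", "precursor_mz", "nAA"}
)
```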

to_SpecLibBase() → SpecLibBase

Convert the flat library to SpecLibBase object.

Returns:

SpecLibBase

A SpecLibBase object with precursor_df, fragment_mz_df and fragment_intensity_df, and '_additional_fragment_columns_df' if there was more than mz and intensity in the original fragment_df.
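
The reverse direction, from flat fragments back to dense rows, can be sketched as follows. This is an illustration only: rebuild_dense is a hypothetical function, the 'type' column is assumed to hold a one-letter string, 'position' is assumed to be a zero-based row offset within the precursor, and each precursor is assumed to occupy nAA - 1 dense rows.

```python
def rebuild_dense(precursors, fragments, charged_frag_types):
    """Hypothetical sketch: scatter flat fragments into dense mz/intensity rows."""
    column_of = {name: i for i, name in enumerate(charged_frag_types)}
    mz_rows, intensity_rows = [], []
    for prec in precursors:
        n_rows = prec["nAA"] - 1  # assumed: one row per cleavage position
        prec["frag_start_idx"] = len(mz_rows)
        mz_block = [[0.0] * len(column_of) for _ in range(n_rows)]
        intensity_block = [[0.0] * len(column_of) for _ in range(n_rows)]
        start, stop = prec["flat_frag_start_idx"], prec["flat_frag_stop_idx"]
        for frag in fragments[start:stop]:
            # Column is chosen by the charged fragment type, e.g. 'b_z1'.
            col = column_of[f'{frag["type"]}_z{frag["charge"]}']
            mz_block[frag["position"]][col] = frag["mz"]
            intensity_block[frag["position"]][col] = frag["intensity"]
        mz_rows.extend(mz_block)
        intensity_rows.extend(intensity_block)
        prec["frag_stop_idx"] = len(mz_rows)
    return mz_rows, intensity_rows


precursors = [{"nAA": 3, "flat_frag_start_idx": 0, "flat_frag_stop_idx": 2}]
fragments = [
    {"type": "b", "charge": 1, "position": 0, "mz": 100.0, "intensity": 0.5},
    {"type": "y", "charge": 1, "position": 1, "mz": 200.0, "intensity": 1.0},
]
dense_mz, dense_intensity = rebuild_dense(precursors, fragments, ["b_z1", "y_z1"])
```

Cells with no matching flat fragment stay at 0.0, which is how fragments removed during flattening would appear after the round trip in this sketch.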