alphabase.spectral_library.flat

Classes:

SpecLibFlat([charged_frag_types, ...])

Flatten the spectral library (SpecLibBase) by using parse_base_library().

class alphabase.spectral_library.flat.SpecLibFlat(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)

Bases: SpecLibBase

Flatten the spectral library (SpecLibBase) by using parse_base_library().

custom_fragment_df_columns

The 'mz' and 'intensity' columns are required in fragment_df; the others can be customized. It can include ['type', 'number', 'position', 'charge', 'loss_type'].

Type: list of str

min_fragment_intensity

Minimum intensity a fragment must have to be kept in fragment_df.

Type: float

keep_top_k_fragments

Number of most intense peaks to keep in fragment_df.

Type: int
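The effect of min_fragment_intensity and keep_top_k_fragments can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper name `filter_fragments` is hypothetical, and it assumes that peaks at or below the threshold are dropped and that the top-k selection is applied afterwards.

```python
def filter_fragments(intensities, min_fragment_intensity=0.001, keep_top_k_fragments=1000):
    """Return the indices of the peaks that survive both filters, in original order.

    Hypothetical helper for illustration only; not part of alphabase.
    """
    # Assumption: a peak is kept only if it is strictly above the threshold.
    kept = [i for i, x in enumerate(intensities) if x > min_fragment_intensity]
    # Rank the surviving peaks by intensity and keep the k most intense ones.
    kept.sort(key=lambda i: intensities[i], reverse=True)
    kept = kept[:keep_top_k_fragments]
    # Restore the original peak order.
    return sorted(kept)


peaks = [0.0, 0.5, 0.0005, 1.0, 0.2, 0.05]
kept = filter_fragments(peaks, min_fragment_intensity=0.001, keep_top_k_fragments=3)
print(kept)
```

With these example values, the two peaks at or below 0.001 are removed first, and the three most intense of the remaining four peaks are kept.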

Methods:

`__init__`([charged_frag_types, ...])	Initialize the flat spectral library.
`available_dense_fragment_dfs`()	Return the available dense fragment dataframes.
`get_full_charged_types`(frag_df)	Infer the full set of charged fragment types from the fragment dataframe.
`load_hdf`(hdf_file[, load_mod_seq])	Load the hdf library from hdf_file
`parse_base_library`(library[, ...])	Flatten a library object of SpecLibBase or one of its subclasses.
`remove_unused_fragments`()	Remove unused fragments from fragment_df.
`save_hdf`(hdf_file)	Save library dataframes into hdf_file.
`to_SpecLibBase`()	Convert the flat library to SpecLibBase object.

Attributes:

`fragment_df`	The flat fragment dataframe with columns ['mz', 'intensity'] + `custom_fragment_df_columns`.
`key_numeric_columns`	`SpecLibBase.key_numeric_columns` + ['flat_frag_start_idx','flat_frag_stop_idx'].
`protein_df`	Protein dataframe

__init__(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)

Parameters:

min_fragment_intensity (float, optional) – Minimum intensity to keep, by default 0.001.
keep_top_k_fragments (int, optional) – Number of most intense peaks to keep, by default 1000.
custom_fragment_df_columns (list, optional) – See custom_fragment_df_columns. Defaults to ['type', 'number', 'position', 'charge', 'loss_type'].

available_dense_fragment_dfs(): Return the available dense fragment dataframes. This method is inherited from SpecLibBase and returns an empty list for a flat library.

property fragment_df: DataFrame: The flat fragment dataframe with columns ['mz', 'intensity'] + custom_fragment_df_columns.

get_full_charged_types(frag_df: DataFrame) → list

Infer the full set of charged fragment types from the fragment dataframe. "Full" means a complete set of fragment types for each charge: if the dataframe contains b_z1 fragments, the set also includes y_z1, and vice versa.

Parameters: frag_df (pd.DataFrame) – The fragment dataframe
Returns: charged_frag_types – The full set of charged fragment types as a list of strings, such as ['a_z1', 'b_z1', 'c_z1', 'x_z1', 'y_z1', 'z_z1']
Return type: list
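One way to read this completion rule is sketched below in plain Python. This is an illustration, not the method's actual code: the helper name `full_charged_types` and the pairing table `COMPLEMENT` (a with x, b with y, c with z) are assumptions inferred from the description and the example return value, and the sketch works on type strings rather than on a dataframe.

```python
# Assumed pairing of N-terminal and C-terminal ion series (illustrative only).
COMPLEMENT = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}


def full_charged_types(observed):
    """Complete a collection of charged fragment types such as 'b_z1'.

    Hypothetical helper; names without a known partner are kept unchanged.
    """
    full = set()
    for name in observed:
        # Split 'b_z1' into the ion series 'b' and the charge '1'.
        ion, charge = name.rsplit("_z", 1)
        full.add(name)
        partner = COMPLEMENT.get(ion)
        if partner is not None:
            # Add the complementary series at the same charge.
            full.add(f"{partner}_z{charge}")
    return sorted(full)


completed = full_charged_types(["b_z1", "y_z2"])
print(completed)
```

Here an observed b_z1 pulls in y_z1, and an observed y_z2 pulls in b_z2, so every charge ends up with both series.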

key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'flat_frag_start_idx', 'flat_frag_stop_idx']: SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].

load_hdf(hdf_file: str, load_mod_seq: bool = False)

Load the HDF library from hdf_file.

Parameters:

hdf_file (str) – Path of the HDF library to load.
load_mod_seq (bool, optional) – Whether to also load mod_seq_df. Defaults to False.

parse_base_library(library: SpecLibBase, keep_original_frag_dfs: bool = False, copy_precursor_df: bool = False, **kwargs)

Flatten a library object of SpecLibBase or one of its subclasses. This method generates precursor_df and fragment_df. The fragments in fragment_df can be located by flat_frag_start_idx and flat_frag_stop_idx in precursor_df.

Parameters:

library (SpecLibBase) – A library object with attributes precursor_df, fragment_mz_df and fragment_intensity_df.
keep_original_frag_dfs (bool, default False) – Whether fragment_mz_df and fragment_intensity_df are kept in this library.
copy_precursor_df (bool, default False) – If True, make a copy of precursor_df from library; otherwise, the flat_frag_start_idx and flat_frag_stop_idx columns are also added to the precursor_df of the source library.
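The flattening idea can be sketched without the library, using nested lists in place of dataframes. The function `flatten` below is hypothetical and only illustrates how dense per-precursor fragment rows (located by frag_start_idx and frag_stop_idx) become one flat peak list located by flat_frag_start_idx and flat_frag_stop_idx. It assumes a strict intensity threshold and omits the top-k filter and the custom columns.

```python
def flatten(precursors, dense_mz, dense_intensity, min_fragment_intensity=0.001):
    """Flatten dense fragment matrices into flat peak lists (illustrative sketch)."""
    flat_mz, flat_intensity, flat_precursors = [], [], []
    for prec in precursors:
        start = len(flat_mz)
        # Walk the dense rows that belong to this precursor.
        for row in range(prec["frag_start_idx"], prec["frag_stop_idx"]):
            for mz, intensity in zip(dense_mz[row], dense_intensity[row]):
                # Assumption: only peaks strictly above the threshold are kept.
                if intensity > min_fragment_intensity:
                    flat_mz.append(mz)
                    flat_intensity.append(intensity)
        # Copy the precursor record (comparable to copy_precursor_df=True)
        # and store where its peaks sit in the flat lists.
        flat_precursors.append(
            {**prec, "flat_frag_start_idx": start, "flat_frag_stop_idx": len(flat_mz)}
        )
    return flat_precursors, flat_mz, flat_intensity


# Two precursors; each dense row has one value per charged fragment type.
precursors = [
    {"frag_start_idx": 0, "frag_stop_idx": 2},
    {"frag_start_idx": 2, "frag_stop_idx": 3},
]
dense_mz = [[100.0, 300.0], [200.0, 200.0], [150.0, 250.0]]
dense_intensity = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]

flat_precursors, flat_mz, flat_intensity = flatten(precursors, dense_mz, dense_intensity)
print(flat_precursors)
print(flat_mz)
```

Zero-intensity entries of the dense matrices disappear in the flat representation, which is why each precursor needs its own start and stop index into the flat lists.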

property protein_df: DataFrame: Protein dataframe

remove_unused_fragments(): Remove unused fragments from fragment_df. This method is inherited from SpecLibBase and has not been implemented for a flat library.

save_hdf(hdf_file: str)

Save library dataframes into hdf_file. This method saves self.precursor_df into two HDF groups in hdf_file: library/precursor_df and library/mod_seq_df.

library/precursor_df contains all essential numeric columns, which can be loaded faster from the HDF file into memory: ['precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'flat_frag_start_idx', 'flat_frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', 'isotope_mz_m1', 'isotope_intensity_m1', …]

library/mod_seq_df contains all string columns and the other non-essential columns: 'sequence', 'mods', 'mod_sites', ['proteins', 'genes']… as well as the 'mod_seq_hash' and 'mod_seq_charge_hash' columns to map back to precursor_df.

Parameters: hdf_file (str) – Path of the HDF file to save to.
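The two-group layout can be pictured with a small sketch that splits column names the way the description suggests. The helper `split_precursor_columns` is hypothetical, `KEY_NUMERIC` is only a short subset of key_numeric_columns chosen for the example, and the sketch handles column names only, without writing any file.

```python
# Short illustrative subset of key_numeric_columns.
KEY_NUMERIC = ["precursor_mz", "charge", "nAA", "flat_frag_start_idx", "flat_frag_stop_idx"]
# Hash columns are stored in both groups so rows can be mapped back.
HASH_COLUMNS = ["mod_seq_hash", "mod_seq_charge_hash"]


def split_precursor_columns(column_names):
    """Assign precursor_df columns to the two HDF groups (illustrative sketch)."""
    # library/precursor_df: essential numeric columns plus the hash columns.
    numeric_group = [c for c in column_names if c in KEY_NUMERIC or c in HASH_COLUMNS]
    # library/mod_seq_df: everything else, which still includes the hash columns.
    other_group = [c for c in column_names if c not in KEY_NUMERIC]
    return numeric_group, other_group


columns = [
    "sequence", "mods", "mod_sites", "charge", "precursor_mz",
    "mod_seq_hash", "mod_seq_charge_hash",
    "flat_frag_start_idx", "flat_frag_stop_idx",
]
numeric_group, other_group = split_precursor_columns(columns)
print(numeric_group)
print(other_group)
```

Because both groups carry the hash columns, the string columns in library/mod_seq_df can be joined back to the numeric rows when needed.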

to_SpecLibBase() → SpecLibBase

Convert the flat library to a SpecLibBase object.

Returns:

SpecLibBase: A SpecLibBase object with precursor_df, fragment_mz_df and fragment_intensity_df, and '_additional_fragment_columns_df' if the original fragment_df contained columns other than mz and intensity.
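The reverse direction, from flat peaks back to dense matrices, can be sketched in the same list-based style. The function `to_dense` is hypothetical and not the library's code. It assumes that each flat fragment stores its ion series as a single letter in 'type', its charge in 'charge', and its row offset within the precursor in 'position', and that absent peaks are filled with 0.0.

```python
def to_dense(precursors, fragments, charged_frag_types):
    """Rebuild dense m/z and intensity matrices from flat fragments (sketch)."""
    # Map each charged fragment type, e.g. 'b_z1', to a dense column index.
    column_of = {name: j for j, name in enumerate(charged_frag_types)}
    n_rows = max((p["frag_stop_idx"] for p in precursors), default=0)
    dense_mz = [[0.0] * len(column_of) for _ in range(n_rows)]
    dense_intensity = [[0.0] * len(column_of) for _ in range(n_rows)]
    for p in precursors:
        # Slice this precursor's peaks out of the flat fragment list.
        for frag in fragments[p["flat_frag_start_idx"]:p["flat_frag_stop_idx"]]:
            row = p["frag_start_idx"] + frag["position"]
            col = column_of[f"{frag['type']}_z{frag['charge']}"]
            dense_mz[row][col] = frag["mz"]
            dense_intensity[row][col] = frag["intensity"]
    return dense_mz, dense_intensity


precursors = [
    {"frag_start_idx": 0, "frag_stop_idx": 2,
     "flat_frag_start_idx": 0, "flat_frag_stop_idx": 2},
]
fragments = [
    {"mz": 300.0, "intensity": 1.0, "type": "y", "charge": 1, "position": 0},
    {"mz": 200.0, "intensity": 0.5, "type": "b", "charge": 1, "position": 1},
]
dense_mz, dense_intensity = to_dense(precursors, fragments, ["b_z1", "y_z1"])
print(dense_mz)
print(dense_intensity)
```

Any flat column that has no place in the two dense matrices would need separate storage, which matches the role described for '_additional_fragment_columns_df'.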