alphabase.spectral_library.flat
Classes:
- SpecLibFlat: Flatten the spectral library (SpecLibBase) by using parse_base_library().
- class alphabase.spectral_library.flat.SpecLibFlat(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)
Bases: SpecLibBase
Flatten the spectral library (SpecLibBase) by using parse_base_library().
- custom_fragment_df_columns
The 'mz' and 'intensity' columns are required in fragment_df; the other columns can be customized. It can include ['type', 'number', 'position', 'charge', 'loss_type'].
- Type:
list of str
- min_fragment_intensity
Minimum intensity a fragment must have to be kept in fragment_df.
- Type:
float
- keep_top_k_fragments
Number of highest-intensity peaks to keep in fragment_df.
- Type:
int
Methods:
- __init__([charged_frag_types, ...])
- calc_dense_fragments([additional_columns, ...]): Create a hybrid SpecLibFlat which has both flat and dense fragment representations.
- get_full_charged_types(frag_df): Infer the full set of charged fragment types from the fragment dataframe. "Full" means a complete set of fragment types for each charge: if b_z1 is present, y_z1 is included as well, and vice versa.
- load_hdf(hdf_file[, load_mod_seq, ...]): Load the hdf library from hdf_file.
- parse_base_library(library[, ...]): Flatten a library object of SpecLibBase or its inherited class.
- remove_unused_fragments(): Remove unused fragments from fragment_df.
- save_hdf(hdf_file): Save library dataframes into hdf_file.
- to_speclib_base([flat_columns, ...]): Convert the flat library to a new SpecLibBase object with dense fragment matrices.
Attributes:
- fragment_df: The flat fragment dataframe with columns ['mz', 'intensity'] + custom_fragment_df_columns.
- key_numeric_columns: SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].
- protein_df: Protein dataframe.
- __init__(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)
- Parameters:
min_fragment_intensity (float, optional) – minimal intensity to keep, by default 0.001
keep_top_k_fragments (int, optional) – top k highest peaks to keep, by default 1000
custom_fragment_df_columns (list, optional) – See custom_fragment_df_columns, defaults to ['type', 'number', 'position', 'charge', 'loss_type']
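As a rough illustration of what the two filtering parameters mean, the sketch below applies an intensity cutoff to the peaks of a single precursor and then keeps the k most intense ones. It is plain Python, not alphabase code: the helper name `filter_fragments` and the tuple layout are hypothetical, and both the order of the two steps and the inclusive cutoff are assumptions.

```python
def filter_fragments(fragments, min_fragment_intensity=0.001, keep_top_k_fragments=1000):
    """Filter one precursor's (mz, intensity) pairs.

    Hypothetical helper: drops peaks below the intensity cutoff, then keeps
    the `keep_top_k_fragments` most intense of the remaining peaks.
    """
    # Keep only peaks at or above the cutoff (inclusive cutoff is an assumption).
    kept = [f for f in fragments if f[1] >= min_fragment_intensity]
    # Sort by intensity, highest first, and truncate to the top k.
    kept.sort(key=lambda f: f[1], reverse=True)
    return kept[:keep_top_k_fragments]


# Toy peaks for one precursor: (mz, intensity).
peaks = [(175.12, 1.0), (262.15, 0.0005), (375.23, 0.4), (488.32, 0.05)]
filtered = filter_fragments(peaks, min_fragment_intensity=0.001, keep_top_k_fragments=2)
```

With these toy values the peak at 262.15 falls below the cutoff, and of the remaining three only the two most intense survive.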
- calc_dense_fragments(additional_columns: list | None = None, charged_frag_types: list | None = None) → None
Create a hybrid SpecLibFlat which has both flat and dense fragment representations. Converts the flat fragment representation to dense matrices and stores them in the object.
Creates fragment_mz_df (using calculated m/z values) and fragment_intensity_df by default. For each additional column specified (e.g., ‘intensity’), creates a corresponding _fragment_<column>_df matrix. Including ‘mz’ in additional_columns will use observed rather than calculated m/z values.
Fragment types can be specified explicitly or inherited from self.charged_frag_types. Only fragments matching these types will be included in the dense matrices. Each fragment type (e.g., ‘b_z1’, ‘y_z2’) becomes a column in the resulting dense matrices.
Updates the precursor_df with new frag_start_idx and frag_stop_idx columns for the dense representation.
- Parameters:
additional_columns (Union[list, None], optional) – Additional fragment columns to convert to dense format, defaults to [‘intensity’]
charged_frag_types (Union[list, None], optional) – Fragment types to include in dense format, defaults to self.charged_frag_types
- Returns:
Modifies the SpecLibFlat object in place
- Return type:
None
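The flat-to-dense conversion described above can be pictured with a small pure-Python sketch. It is not the library's implementation: the function `to_dense`, the list-of-dicts layout, the use of plain letters for the fragment type, and the choice of nAA - 1 rows per precursor are all assumptions made for illustration.

```python
def to_dense(precursors, flat_fragments, charged_frag_types, column="intensity"):
    """Scatter one flat fragment column into a dense row-per-position matrix.

    Hypothetical helper. Columns follow `charged_frag_types`; fragments whose
    charged type is not requested are skipped. Missing cells stay at 0.0.
    """
    col_index = {name: i for i, name in enumerate(charged_frag_types)}
    dense = []
    for prec in precursors:
        n_rows = prec["nAA"] - 1  # one row per fragmentation position (assumption)
        block = [[0.0] * len(charged_frag_types) for _ in range(n_rows)]
        start, stop = prec["flat_frag_start_idx"], prec["flat_frag_stop_idx"]
        for frag in flat_fragments[start:stop]:
            name = f"{frag['type']}_z{frag['charge']}"
            if name in col_index:
                block[frag["position"]][col_index[name]] = frag[column]
        # Record where this precursor's rows live in the dense matrix.
        prec["frag_start_idx"] = len(dense)
        dense.extend(block)
        prec["frag_stop_idx"] = len(dense)
    return dense


precursors = [{"nAA": 3, "flat_frag_start_idx": 0, "flat_frag_stop_idx": 3}]
flat_fragments = [
    {"type": "b", "charge": 1, "position": 0, "intensity": 0.2},
    {"type": "y", "charge": 1, "position": 1, "intensity": 1.0},
    {"type": "y", "charge": 2, "position": 0, "intensity": 0.4},  # y_z2 not requested
]
dense_intensity = to_dense(precursors, flat_fragments, ["b_z1", "y_z1"])
```

The y_z2 fragment is dropped because only b_z1 and y_z1 were requested, which mirrors the statement that only fragments matching the given types end up in the dense matrices.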
- property fragment_df: DataFrame
The flat fragment dataframe with columns ['mz', 'intensity'] + custom_fragment_df_columns.
- get_full_charged_types(frag_df: DataFrame) → list
Infer the full set of charged fragment types from the fragment dataframe. "Full" means a complete set of fragment types for each charge: if the dataframe contains a b_z1 fragment, y_z1 is included as well, and vice versa.
- Parameters:
frag_df (pd.DataFrame) – The fragment dataframe
- Returns:
charged_frag_types – The full set of charged fragment types in the form of a list of strings such as [‘a_z1’,’b_z1’,’c_z1’,’x_z1’,’y_z1’,’z_z1’]
- Return type:
list
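One possible reading of "full set", sketched in plain Python: collect the charges that occur and emit every backbone ion series for each of them, matching the shape of the example list in the Returns entry. This is an interpretation, not the library's code. The function `full_charged_types` is hypothetical, it takes type names rather than a dataframe, and it ignores loss types.

```python
# The six series named in the example return value above.
BACKBONE_SERIES = ["a", "b", "c", "x", "y", "z"]


def full_charged_types(observed_types):
    """Expand observed names like 'b_z1' to all series for each observed charge.

    Hypothetical helper illustrating one interpretation of "full set".
    """
    # The charge is the integer after the last '_z' in each name.
    charges = sorted({int(name.rsplit("_z", 1)[1]) for name in observed_types})
    return [f"{series}_z{charge}" for charge in charges for series in BACKBONE_SERIES]


result = full_charged_types(["b_z1", "y_z2"])
```

Here charges 1 and 2 are observed, so the result contains twelve names, six per charge.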
- key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'mono_isotope_idx', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'i_0', 'i_1', 'i_2', 'i_3', 'i_4', 'i_5', 'i_6', 'i_7', 'i_8', 'i_9', 'flat_frag_start_idx', 'flat_frag_stop_idx']
SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].
- load_hdf(hdf_file: str, load_mod_seq: bool = False, infer_charged_frag_types: bool = True)
Load the hdf library from hdf_file
- Parameters:
hdf_file (str) – hdf library path to load
load_mod_seq (bool, optional) – Whether to also load mod_seq_df. Defaults to False.
infer_charged_frag_types (bool, optional) – If True, infer the charged fragment types from the hdf file. Defaults to True, because users most likely do not know which charged fragment types the hdf file contains. If set to False, only the charged fragment types defined in SpecLibBase.charged_frag_types will be loaded.
- parse_base_library(library: SpecLibBase, keep_original_frag_dfs: bool = False, copy_precursor_df: bool = False, **kwargs)
Flatten a library object of SpecLibBase or its inherited class. This method will generate precursor_df and fragment_df. The fragments in fragment_df can be located by flat_frag_start_idx and flat_frag_stop_idx in precursor_df.
- Parameters:
library (SpecLibBase) – A library object with attributes precursor_df, fragment_mz_df and fragment_intensity_df.
keep_original_frag_dfs (bool, default False) – Whether fragment_mz_df and fragment_intensity_df are kept in this library.
copy_precursor_df (bool, default False) – If True, make a copy of precursor_df from library; otherwise the flat_frag_start_idx and flat_frag_stop_idx columns will also be appended to the precursor_df of library.
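To make the relationship between the dense matrices and the flat fragment table concrete, here is a minimal pure-Python sketch of the flattening idea. It is an illustration under stated assumptions rather than alphabase code: the function `flatten`, the nested-list matrices, the `charged_frag_type` key, and the rule of dropping cells below `min_fragment_intensity` are all hypothetical, and the m/z values are toy numbers.

```python
charged_frag_types = ["b_z1", "y_z1"]

# Dense matrices: one row per fragmentation position, one column per charged type.
fragment_mz = [
    [115.05, 304.16],  # precursor 0, position 0
    [228.13, 191.08],  # precursor 0, position 1
    [72.04, 147.11],   # precursor 1, position 0
]
fragment_intensity = [
    [0.0, 1.0],
    [0.3, 0.0],
    [0.8, 0.5],
]
# Each precursor owns rows [frag_start_idx, frag_stop_idx) of the dense matrices.
precursors = [
    {"frag_start_idx": 0, "frag_stop_idx": 2},
    {"frag_start_idx": 2, "frag_stop_idx": 3},
]


def flatten(precursors, fragment_mz, fragment_intensity, min_fragment_intensity=0.001):
    """Collect non-negligible dense cells into one flat list of fragments."""
    flat_fragments = []
    for prec in precursors:
        prec["flat_frag_start_idx"] = len(flat_fragments)
        for row in range(prec["frag_start_idx"], prec["frag_stop_idx"]):
            for col, frag_type in enumerate(charged_frag_types):
                intensity = fragment_intensity[row][col]
                if intensity >= min_fragment_intensity:
                    flat_fragments.append({
                        "mz": fragment_mz[row][col],
                        "intensity": intensity,
                        "charged_frag_type": frag_type,
                    })
        prec["flat_frag_stop_idx"] = len(flat_fragments)
    return flat_fragments


flat_fragments = flatten(precursors, fragment_mz, fragment_intensity)
```

Afterwards, the fragments of a precursor are the slice `flat_fragments[flat_frag_start_idx:flat_frag_stop_idx]`. The sketch writes the two index columns into the input precursor records, which loosely corresponds to the copy_precursor_df=False case.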
- property protein_df: DataFrame
Protein dataframe
- remove_unused_fragments()
Remove unused fragments from fragment_df. This method is inherited from SpecLibBase and has not been implemented for a flat library.
- save_hdf(hdf_file: str)
Save library dataframes into hdf_file. self.precursor_df is saved into two hdf groups in hdf_file: library/precursor_df and library/mod_seq_df.
library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory: ['precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'flat_frag_start_idx', 'flat_frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', 'isotope_mz_m1', 'isotope_intensity_m1', …]
library/mod_seq_df contains all string columns and the other non-essential columns: 'sequence', 'mods', 'mod_sites', ['proteins', 'genes']…, as well as the 'mod_seq_hash' and 'mod_seq_charge_hash' columns to map back to precursor_df.
- Parameters:
hdf_file (str) – the hdf file path to save
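The two-group layout can be pictured as a column partition in which the hash columns appear on both sides so rows can be matched up again. The sketch below is a simplification in plain Python and not the library's code: `split_precursor_columns` is a hypothetical helper, it splits by value type where the description above refers to a set of essential numeric columns, and the hash and m/z values are made up.

```python
def split_precursor_columns(columns, key_columns=("mod_seq_hash", "mod_seq_charge_hash")):
    """Partition a {column name: values} mapping into two groups.

    Hypothetical helper: numeric columns go to the first group, everything
    else to the second, and the hash key columns are present in both.
    """
    numeric, other = {}, {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            numeric[name] = values
        else:
            other[name] = values
    # Duplicate the key columns so rows of the two groups can be matched again.
    for key in key_columns:
        if key in numeric:
            other[key] = numeric[key]
    return numeric, other


precursor_columns = {
    "sequence": ["PEPTIDE", "ACDEFGHIK"],
    "mods": ["", "Carbamidomethyl@C"],
    "charge": [2, 3],
    "precursor_mz": [400.69, 356.17],
    "mod_seq_hash": [1111, 2222],
    "mod_seq_charge_hash": [1113, 2225],
}
numeric_group, string_group = split_precursor_columns(precursor_columns)
```

In this toy example `numeric_group` plays the role of library/precursor_df and `string_group` the role of library/mod_seq_df.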
- to_speclib_base(flat_columns: list | None = None, charged_frag_types: list | None = None) → SpecLibBase
Convert the flat library to a new SpecLibBase object with dense fragment matrices.
Creates a new SpecLibBase containing fragment_mz_df (using calculated m/z values). Flat columns like ‘intensity’ are transformed into dense matrices as fragment_intensity_df. For all columns specified in flat_columns, a corresponding _fragment_<column>_df matrix is created and assigned to the new SpecLibBase object.
Warning
If the column 'mz' is added to flat_columns, it will override the calculated m/z values in fragment_mz_df. To avoid this and keep the observed m/z values alongside the calculated ones, rename the flat mz column to 'mz_observed' before calling to_speclib_base.
Fragment types can be specified explicitly or inherited from self.charged_frag_types. Only fragments matching these types will be included in the dense matrices. Each fragment type (e.g., ‘b_z1’, ‘y_z2’) becomes a column in the resulting dense matrices.
The precursor_df is copied and updated with new dense fragment indices, removing any flat-specific columns (flat_frag_start_idx, flat_frag_stop_idx).
- Parameters:
flat_columns (Union[list, None], optional) – Fragment columns from the flat representation to convert to dense format, defaults to [‘intensity’]
charged_frag_types (Union[list, None], optional) – Fragment types to include in dense format, defaults to self.charged_frag_types
- Returns:
A new SpecLibBase object with dense fragment representations
- Return type: