alphabase.spectral_library.flat
Classes:
SpecLibFlat: Flatten the spectral library (SpecLibBase) by using parse_base_library().
- class alphabase.spectral_library.flat.SpecLibFlat(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)
Bases: SpecLibBase
Flatten the spectral library (SpecLibBase) by using parse_base_library().
- custom_fragment_df_columns
The 'mz' and 'intensity' columns are required in fragment_df; the other columns can be customized. It can include ['type', 'number', 'position', 'charge', 'loss_type'].
Type: list of str
- min_fragment_intensity
Minimal intensity to keep in fragment_df.
Type: float
- keep_top_k_fragments
Top k highest peaks to keep in fragment_df.
Type: int
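The effect of min_fragment_intensity and keep_top_k_fragments can be pictured with a small standalone sketch. It does not use alphabase; the helper name filter_fragments is hypothetical, and both the strict comparison and the choice to apply the filters to one precursor's peaks at a time are assumptions made for illustration.

```python
def filter_fragments(intensities, min_fragment_intensity=0.001,
                     keep_top_k_fragments=1000):
    """Return indices of the peaks that survive both filters, in original order.

    Hypothetical helper: illustrates the two thresholds, not alphabase's code.
    """
    # Threshold filter: keep peaks above the minimal intensity
    # (strict '>' is an assumption).
    kept = [i for i, v in enumerate(intensities) if v > min_fragment_intensity]
    # Top-k filter: rank the survivors by intensity, highest first.
    kept.sort(key=lambda i: intensities[i], reverse=True)
    # Restore the original peak order for the k strongest survivors.
    return sorted(kept[:keep_top_k_fragments])


intensities = [0.0, 0.5, 0.0005, 1.0, 0.2, 0.05]
print(filter_fragments(intensities, 0.001, 3))  # [1, 3, 4]
```

With the defaults (0.001 and 1000), the top-k limit rarely matters and the threshold does most of the work.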
Methods:
__init__([charged_frag_types, ...])
available_dense_fragment_dfs(): Return the available dense fragment dataframes.
get_full_charged_types(frag_df): Infer the full set of charged fragment types from the fragment dataframe.
load_hdf(hdf_file[, load_mod_seq]): Load the hdf library from hdf_file.
parse_base_library(library[, ...]): Flatten a library object of SpecLibBase or its inherited class.
remove_unused_fragments(): Remove unused fragments from fragment_df.
save_hdf(hdf_file): Save library dataframes into hdf_file.
to_SpecLibBase(): Convert the flat library to SpecLibBase object.
Attributes:
fragment_df: The flat fragment dataframe with columns (['mz', 'intensity'] + custom_fragment_df_columns).
key_numeric_columns: SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].
protein_df: Protein dataframe.
- __init__(charged_frag_types: list = ['b_z1', 'b_z2', 'y_z1', 'y_z2'], min_fragment_intensity: float = 0.001, keep_top_k_fragments: int = 1000, custom_fragment_df_columns: list = ['type', 'number', 'position', 'charge', 'loss_type'], **kwargs)
- Parameters:
min_fragment_intensity (float, optional) – minimal intensity to keep, by default 0.001
keep_top_k_fragments (int, optional) – top k highest peaks to keep, by default 1000
custom_fragment_df_columns (list, optional) – See custom_fragment_df_columns, defaults to ['type', 'number', 'position', 'charge', 'loss_type']
- available_dense_fragment_dfs()
Return the available dense fragment dataframes. This method is inherited from SpecLibBase and will return an empty list for a flat library.
- property fragment_df: DataFrame
The flat fragment dataframe with columns (['mz', 'intensity'] + custom_fragment_df_columns).
- get_full_charged_types(frag_df: DataFrame) → list
Infer the full set of charged fragment types from the fragment dataframe. "Full" means a complete set of fragment types for each charge: if the dataframe contains a b_z1 fragment, y_z1 should also be included, and vice versa.
- Parameters:
frag_df (pd.DataFrame) – The fragment dataframe
- Returns:
charged_frag_types – The full set of charged fragment types in the form of a list of strings such as ['a_z1', 'b_z1', 'c_z1', 'x_z1', 'y_z1', 'z_z1']
- Return type:
list
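One possible reading of the "full set" rule is sketched below in plain Python. The a/x, b/y, c/z pairing table and the helper name full_charged_types are assumptions for illustration; loss types are ignored, and the actual method may resolve the set differently.

```python
# Assumed pairing of N-terminal and C-terminal ion series (illustrative only).
COMPLEMENT = {"a": "x", "b": "y", "c": "z", "x": "a", "y": "b", "z": "c"}


def full_charged_types(observed):
    """Expand observed (type, charge) pairs into a complete, sorted list.

    Hypothetical helper, not the alphabase implementation.
    """
    full = set()
    for frag_type, charge in observed:
        # Keep the observed charged type itself.
        full.add(f"{frag_type}_z{charge}")
        # Add the complementary series at the same charge, if one is known.
        partner = COMPLEMENT.get(frag_type)
        if partner is not None:
            full.add(f"{partner}_z{charge}")
    return sorted(full)


print(full_charged_types([("b", 1), ("y", 2)]))
# ['b_z1', 'b_z2', 'y_z1', 'y_z2']
```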
- key_numeric_columns: list = ['ccs_pred', 'charge', 'decoy', 'frag_stop_idx', 'frag_start_idx', 'isotope_m1_intensity', 'isotope_m1_mz', 'isotope_apex_mz', 'isotope_apex_intensity', 'isotope_apex_offset', 'isotope_right_most_mz', 'isotope_right_most_intensity', 'isotope_right_most_offset', 'miss_cleavage', 'mobility_pred', 'mobility', 'nAA', 'precursor_mz', 'rt_pred', 'rt_norm_pred', 'rt', 'labeling_channel', 'flat_frag_start_idx', 'flat_frag_stop_idx']
SpecLibBase.key_numeric_columns + ['flat_frag_start_idx', 'flat_frag_stop_idx'].
- load_hdf(hdf_file: str, load_mod_seq: bool = False)
Load the hdf library from hdf_file.
- Parameters:
hdf_file (str) – hdf library path to load
load_mod_seq (bool, optional) – whether to also load mod_seq_df. Defaults to False.
- parse_base_library(library: SpecLibBase, keep_original_frag_dfs: bool = False, copy_precursor_df: bool = False, **kwargs)
Flatten a library object of SpecLibBase or its inherited class. This method will generate precursor_df and fragment_df. The fragments in fragment_df can be located by flat_frag_start_idx and flat_frag_stop_idx in precursor_df.
- Parameters:
library (SpecLibBase) – A library object with attributes precursor_df, fragment_mz_df and fragment_intensity_df.
keep_original_frag_dfs (bool, default False) – Whether fragment_mz_df and fragment_intensity_df are kept in this library.
copy_precursor_df (bool, default False) – If True, make a copy of precursor_df from library; otherwise the flat_frag_start_idx and flat_frag_stop_idx columns will also be appended to the library's precursor_df.
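The idea of flattening can be sketched without alphabase, using plain lists in place of dataframes. Everything here is an illustrative assumption: the helper name flatten_library, treating non-positive intensities as empty cells, and using the row offset within a precursor as 'position'. The point is only how each precursor gets a start/stop range into the flat fragment list.

```python
def flatten_library(dense_mz, dense_intensity, frag_ranges, charged_frag_types):
    """Flatten dense per-precursor fragment rows into one flat list.

    dense_mz / dense_intensity: list of rows, one column per charged fragment type.
    frag_ranges: (frag_start_idx, frag_stop_idx) per precursor into the dense rows.
    Returns (flat_fragments, flat_ranges); hypothetical helper, not alphabase code.
    """
    flat_fragments = []
    flat_ranges = []
    for start, stop in frag_ranges:
        flat_start = len(flat_fragments)
        for row in range(start, stop):
            for col, name in enumerate(charged_frag_types):
                intensity = dense_intensity[row][col]
                if intensity <= 0.0:  # assumed marker for "no fragment here"
                    continue
                frag_type, charge = name.split("_z")  # e.g. 'b_z1' -> 'b', '1'
                flat_fragments.append({
                    "mz": dense_mz[row][col],
                    "intensity": intensity,
                    "type": frag_type,
                    "charge": int(charge),
                    "position": row - start,  # row offset within the precursor
                })
        # Analogue of flat_frag_start_idx / flat_frag_stop_idx.
        flat_ranges.append((flat_start, len(flat_fragments)))
    return flat_fragments, flat_ranges


dense_mz = [[100.0, 300.0], [200.0, 150.0], [120.0, 250.0]]
dense_intensity = [[0.0, 1.0], [0.5, 0.0], [0.3, 0.7]]
flat_fragments, flat_ranges = flatten_library(
    dense_mz, dense_intensity, [(0, 2), (2, 3)], ["b_z1", "y_z1"]
)
print(flat_ranges)  # [(0, 2), (2, 4)]
# Fragments of the second precursor are the slice given by its flat range.
start, stop = flat_ranges[1]
print([f["mz"] for f in flat_fragments[start:stop]])  # [120.0, 250.0]
```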
- property protein_df: DataFrame
Protein dataframe
- remove_unused_fragments()
Remove unused fragments from fragment_df. This method is inherited from SpecLibBase and has not been implemented for a flat library.
- save_hdf(hdf_file: str)
Save library dataframes into hdf_file. For self.precursor_df, this method will save it into two hdf groups in hdf_file: library/precursor_df and library/mod_seq_df.
library/precursor_df contains all essential numeric columns, which can be loaded faster from the hdf file into memory: ['precursor_mz', 'charge', 'mod_seq_hash', 'mod_seq_charge_hash', 'frag_start_idx', 'frag_stop_idx', 'flat_frag_start_idx', 'flat_frag_stop_idx', 'decoy', 'rt_pred', 'ccs_pred', 'mobility_pred', 'miss_cleave', 'nAA', 'isotope_mz_m1', 'isotope_intensity_m1', …]
library/mod_seq_df contains all string columns and the other non-essential columns: 'sequence', 'mods', 'mod_sites', ['proteins', 'genes'], … as well as the 'mod_seq_hash' and 'mod_seq_charge_hash' columns to map back to precursor_df.
- Parameters:
hdf_file (str) – the hdf file path to save
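The two-group layout described above can be illustrated by splitting a list of column names. This is a sketch of the idea only: the helper split_precursor_columns and the short column lists are hypothetical, and no HDF file is written.

```python
def split_precursor_columns(precursor_columns, key_numeric_columns,
                            link_columns=("mod_seq_hash", "mod_seq_charge_hash")):
    """Split precursor columns into a numeric group and a mod_seq group.

    Hypothetical helper: the hash columns go into both groups so that rows
    of the second group can be mapped back to the first.
    """
    numeric_group = [
        c for c in precursor_columns
        if c in key_numeric_columns or c in link_columns
    ]
    mod_seq_group = [
        c for c in precursor_columns
        if c not in key_numeric_columns or c in link_columns
    ]
    return numeric_group, mod_seq_group


columns = ["sequence", "mods", "mod_sites", "charge", "precursor_mz", "nAA",
           "mod_seq_hash", "mod_seq_charge_hash",
           "flat_frag_start_idx", "flat_frag_stop_idx"]
key_numeric = {"charge", "precursor_mz", "nAA",
               "flat_frag_start_idx", "flat_frag_stop_idx"}
numeric_group, mod_seq_group = split_precursor_columns(columns, key_numeric)
print(numeric_group)
print(mod_seq_group)
```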
- to_SpecLibBase() → SpecLibBase
Convert the flat library to SpecLibBase object.
- Returns:
A SpecLibBase object with precursor_df, fragment_mz_df and fragment_intensity_df, and '_additional_fragment_columns_df' if there was more than mz and intensity in the original fragment_df.
- Return type:
SpecLibBase
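Going from a flat fragment list back to dense per-precursor rows can be sketched as follows, again with plain lists rather than dataframes. The helper flat_to_dense, the 'position' field used as row offset, and the explicit n_rows_per_precursor argument are all assumptions for illustration. Cells without a flat fragment stay at 0.0, so anything removed during flattening is not recovered.

```python
def flat_to_dense(flat_fragments, flat_ranges, n_rows_per_precursor,
                  charged_frag_types):
    """Rebuild dense m/z and intensity rows from a flat fragment list.

    Hypothetical helper, not the alphabase implementation.
    """
    col_of = {name: i for i, name in enumerate(charged_frag_types)}
    n_cols = len(charged_frag_types)
    dense_mz, dense_intensity, frag_ranges = [], [], []
    for (flat_start, flat_stop), n_rows in zip(flat_ranges, n_rows_per_precursor):
        start = len(dense_mz)
        # Allocate empty rows for this precursor.
        for _ in range(n_rows):
            dense_mz.append([0.0] * n_cols)
            dense_intensity.append([0.0] * n_cols)
        # Place each flat fragment into its (row, column) cell.
        for frag in flat_fragments[flat_start:flat_stop]:
            row = start + frag["position"]
            col = col_of[f"{frag['type']}_z{frag['charge']}"]
            dense_mz[row][col] = frag["mz"]
            dense_intensity[row][col] = frag["intensity"]
        frag_ranges.append((start, len(dense_mz)))
    return dense_mz, dense_intensity, frag_ranges


flat_fragments = [
    {"mz": 300.0, "intensity": 1.0, "type": "y", "charge": 1, "position": 0},
    {"mz": 200.0, "intensity": 0.5, "type": "b", "charge": 1, "position": 1},
    {"mz": 120.0, "intensity": 0.3, "type": "b", "charge": 1, "position": 0},
    {"mz": 250.0, "intensity": 0.7, "type": "y", "charge": 1, "position": 0},
]
dense_mz, dense_intensity, frag_ranges = flat_to_dense(
    flat_fragments, [(0, 2), (2, 4)], [2, 1], ["b_z1", "y_z1"]
)
print(dense_intensity)  # [[0.0, 1.0], [0.5, 0.0], [0.3, 0.7]]
print(frag_ranges)      # [(0, 2), (2, 3)]
```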