alphabase.psm_reader.psm_reader
See examples in the psm_reader notebook.
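Classes:
- PSMReaderBase(*[, column_mapping, ...])
- PSMReaderProvider()
Functions:
- keep_modifications(mod_str, mod_set) – Check if modifications of mod_str are in mod_set.
- translate_other_modification(mod_str, mod_dict) – Translate modifications of mod_str to the AlphaBase format mapped by mod_dict.
Data:
- psm_reader_provider – A factory PSMReaderProvider object to register and get readers for different PSM types.
- psm_reader_yaml – See psm_reader.yaml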
- class alphabase.psm_reader.psm_reader.PSMReaderBase(*, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, rt_unit: str = 'minute', **kwargs)
Bases: object
Methods:
- __init__(*[, column_mapping, ...]) – The Base class for all PSMReaders.
- add_modification_mapping(modification_mapping) – Append additional modification mappings for the search engine.
- filter_psm_by_modifications([include_mod_set]) – Only keeps peptides with modifications in include_mod_set.
- import_file(_file) – The main entry function of PSM readers; it imports the file in the steps listed under import_file() below.
- import_files(file_list)
- load(_file) – Wrapper for import_file()
- norm_rt()
- set_modification_mapping(modification_mapping)
Attributes:
- psm_df
- __init__(*, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, rt_unit: str = 'minute', **kwargs)
The base class for all PSM readers. A subclass for a particular search engine format mainly needs to redefine column_mapping and modification_mapping.
- Parameters:
column_mapping (dict, optional) – A dict that maps AlphaBase's column names to those of another search engine. Each key is an AlphaBase column name; each value is either a single column name (str) or a list of candidate column names in the other engine's result. If None, the dict is initialized by self._init_column_mapping. For example:
```python
column_mapping = {
    'sequence': 'NakedSequence',  # str
    'charge': 'Charge',  # str
    'proteins': ['Proteins', 'UniprotIDs'],  # list, this reader will automatically detect all of them.
}
```
Defaults to None.
modification_mapping (dict, optional) – A dict that maps AlphaBase's modifications to another engine's. If None, it is initialized with the default modification mapping for each search engine (see psm_reader_yaml). Each value can be either a str or a list, for example:
```python
modification_mapping = {
    'Oxidation@M': 'Oxidation (M)',  # str
    'Phospho@S': ['S(Phospho (STY))', 'S(ph)', 'pS'],  # list, this reader will automatically detect all of them.
}
```
Defaults to None.
fdr (float, optional) – FDR level to keep PSMs. Defaults to 0.01.
keep_decoy (bool, optional) – Whether to keep decoy PSMs in self.psm_df. Defaults to False.
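As a minimal sketch of how a mapping that mixes str and list values might be resolved against a result file's header, the hypothetical helper below (resolve_column_mapping is not an alphabase function) assumes that, for a list value, the first candidate present in the file wins, and that AlphaBase columns with no match are skipped.

```python
from typing import Dict, List, Optional, Union


def resolve_column_mapping(
    column_mapping: Dict[str, Union[str, List[str]]],
    available_columns: List[str],
) -> Dict[str, str]:
    """Hypothetical helper: pick the source column for each AlphaBase column.

    Assumption: for list values the first candidate found in the file wins;
    AlphaBase columns without any matching source column are skipped.
    """
    available = set(available_columns)
    resolved: Dict[str, str] = {}
    for alphabase_col, source in column_mapping.items():
        # Normalize a single column name to a one-element candidate list.
        candidates = [source] if isinstance(source, str) else list(source)
        match: Optional[str] = next((c for c in candidates if c in available), None)
        if match is not None:
            resolved[alphabase_col] = match
    return resolved


column_mapping = {
    "sequence": "NakedSequence",
    "charge": "Charge",
    "proteins": ["Proteins", "UniprotIDs"],
}
print(resolve_column_mapping(column_mapping, ["NakedSequence", "Charge", "UniprotIDs"]))
# {'sequence': 'NakedSequence', 'charge': 'Charge', 'proteins': 'UniprotIDs'}
```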
- column_mapping
Same dict structure as the column_mapping parameter.
- Type:
dict
- modification_mapping
Same dict structure as the modification_mapping parameter. Use self.set_modification_mapping(new_mapping) to update it.
- Type:
dict
- _psm_df
The PSM DataFrame loaded from the search engine result.
- Type:
pd.DataFrame
- psm_df
The getter of self._psm_df.
- Type:
pd.DataFrame
- keep_fdr
Only PSMs with FDR <= keep_fdr are kept in self._psm_df.
- Type:
float
- keep_decoy
Whether decoy PSMs are kept in self.psm_df.
- Type:
bool
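The combined effect of keep_fdr and keep_decoy can be sketched with plain dicts instead of a DataFrame. The filter_psms helper is hypothetical, and it assumes the decoy column is already encoded as 0 (target) or 1 (decoy).

```python
from typing import Dict, List


def filter_psms(psms: List[Dict], keep_fdr: float = 0.01, keep_decoy: bool = False) -> List[Dict]:
    """Hypothetical sketch: apply an FDR cutoff and optional decoy removal.

    Assumes each PSM dict has an "fdr" float and a "decoy" flag (0 = target, 1 = decoy).
    """
    kept = [psm for psm in psms if psm["fdr"] <= keep_fdr]
    if not keep_decoy:
        kept = [psm for psm in kept if psm["decoy"] == 0]
    return kept


psms = [
    {"sequence": "PEPTIDE", "fdr": 0.001, "decoy": 0},
    {"sequence": "EDITPEP", "fdr": 0.005, "decoy": 1},
    {"sequence": "PEPTIDER", "fdr": 0.05, "decoy": 0},
]
print([p["sequence"] for p in filter_psms(psms)])                   # ['PEPTIDE']
print([p["sequence"] for p in filter_psms(psms, keep_decoy=True)])  # ['PEPTIDE', 'EDITPEP']
```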
- _min_max_rt_norm
If True, the ‘rt_norm’ values in self._psm_df are min-max normalized as rt_norm = (self.psm_df.rt - rt_min) / (rt_max - rt_min). This is useful for iRT values, which can be negative. Defaults to False.
- Type:
bool
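The min-max formula above can be checked with a short stdlib-only sketch. min_max_rt_norm is a hypothetical name, and returning 0.0 when all retention times are equal is an assumption made here to avoid dividing by zero.

```python
from typing import List


def min_max_rt_norm(rts: List[float]) -> List[float]:
    """Hypothetical sketch of rt_norm = (rt - rt_min) / (rt_max - rt_min)."""
    rt_min, rt_max = min(rts), max(rts)
    span = rt_max - rt_min
    if span == 0:
        # Assumption: identical RTs all map to 0.0 instead of dividing by zero.
        return [0.0 for _ in rts]
    return [(rt - rt_min) / span for rt in rts]


# iRT-like values can be negative; after normalization they fall in [0, 1].
print(min_max_rt_norm([-20.0, 0.0, 30.0, 80.0]))  # [0.0, 0.2, 0.5, 1.0]
```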
- add_modification_mapping(modification_mapping: dict)
Append additional modification mappings for the search engine.
- Parameters:
modification_mapping (dict) – Each key is a modification name in AlphaBase format; each value can be a str or a list, for example:
```python
# reader: an instance of a PSMReaderBase subclass
reader.add_modification_mapping({
    'Dimethyl@K': ['K(Dimethyl)'],  # list
    'Dimethyl@Any N-term': '_(Dimethyl)',  # str
})
```
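One plausible way to merge such a mapping is sketched below, assuming the stored mapping keeps list values and that a reverse lookup (engine name to AlphaBase name) is rebuilt afterwards. Both helpers, merge_modification_mapping and reverse_mapping, are hypothetical and do not reflect alphabase's internal data layout.

```python
from typing import Dict, List, Union

ModMapping = Dict[str, Union[str, List[str]]]


def merge_modification_mapping(current: Dict[str, List[str]], extra: ModMapping) -> Dict[str, List[str]]:
    """Hypothetical sketch: append extra engine names under each AlphaBase mod name."""
    merged = {mod: list(names) for mod, names in current.items()}
    for alphabase_mod, other in extra.items():
        names = [other] if isinstance(other, str) else list(other)
        bucket = merged.setdefault(alphabase_mod, [])
        for name in names:
            if name not in bucket:  # avoid duplicate aliases
                bucket.append(name)
    return merged


def reverse_mapping(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Build an engine-name -> AlphaBase-name lookup from the merged mapping."""
    return {other: alphabase_mod for alphabase_mod, names in mapping.items() for other in names}


merged = merge_modification_mapping(
    {"Oxidation@M": ["M(ox)"]},
    {"Dimethyl@K": ["K(Dimethyl)"], "Dimethyl@Any N-term": "_(Dimethyl)"},
)
print(reverse_mapping(merged)["_(Dimethyl)"])  # Dimethyl@Any N-term
```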
- filter_psm_by_modifications(include_mod_set={'Acetyl@Protein N-term', 'Oxidation@M', 'Phospho@S', 'Phospho@T', 'Phospho@Y'})
Only keeps peptides with modifications in include_mod_set.
- import_file(_file: str) → DataFrame
This is the main entry function of PSM readers. It imports the file in the following steps:
```python
origin_df = self._load_file(_file)
self._translate_columns(origin_df)
self._translate_decoy(origin_df)
self._translate_score(origin_df)
self._load_modifications(origin_df)
self._translate_modifications()
self._post_process(origin_df)
```
- Parameters:
_file (str) – File path or file stream (io).
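The step order listed above follows a template-method pattern. The hypothetical MiniPSMReader below mirrors that order on tab-separated text; it holds rows as dicts rather than a DataFrame, assumes decoys are marked with '+' (as in a MaxQuant Reverse column), and leaves the two modification steps as placeholders. It is a sketch, not alphabase's implementation.

```python
import csv
import io
from typing import Dict, List, TextIO, Union


class MiniPSMReader:
    """Hypothetical reader that follows the step order of import_file().

    Not alphabase code: rows are dicts instead of a DataFrame, and the two
    modification steps are placeholders.
    """

    def __init__(self, column_mapping: Dict[str, str], fdr: float = 0.01, keep_decoy: bool = False):
        self.column_mapping = column_mapping
        self.keep_fdr = fdr
        self.keep_decoy = keep_decoy
        self._psm_rows: List[Dict] = []

    def import_file(self, _file: Union[str, TextIO]) -> List[Dict]:
        origin_rows = self._load_file(_file)
        self._translate_columns(origin_rows)
        self._translate_decoy(origin_rows)
        self._translate_score(origin_rows)
        self._load_modifications(origin_rows)
        self._translate_modifications()
        self._post_process(origin_rows)
        return self._psm_rows

    def _load_file(self, _file: Union[str, TextIO]) -> List[Dict[str, str]]:
        # Accept a file path or an already-open text stream (tab-separated).
        if isinstance(_file, str):
            with open(_file, newline="") as fh:
                return list(csv.DictReader(fh, delimiter="\t"))
        return list(csv.DictReader(_file, delimiter="\t"))

    def _translate_columns(self, origin_rows: List[Dict[str, str]]) -> None:
        # Rename engine columns to AlphaBase names; absent columns are skipped.
        self._psm_rows = [
            {ab: row[other] for ab, other in self.column_mapping.items() if other in row}
            for row in origin_rows
        ]

    def _translate_decoy(self, origin_rows: List[Dict[str, str]]) -> None:
        # Assumption: decoys are flagged with "+", as in a MaxQuant Reverse column.
        for psm in self._psm_rows:
            psm["decoy"] = 1 if psm.get("decoy") == "+" else 0

    def _translate_score(self, origin_rows: List[Dict[str, str]]) -> None:
        # Convert score and FDR text fields to floats.
        for psm in self._psm_rows:
            psm["score"] = float(psm["score"])
            psm["fdr"] = float(psm["fdr"])

    def _load_modifications(self, origin_rows: List[Dict[str, str]]) -> None:
        pass  # placeholder: parse engine-specific modification strings

    def _translate_modifications(self) -> None:
        pass  # placeholder: map engine modification names to AlphaBase names

    def _post_process(self, origin_rows: List[Dict[str, str]]) -> None:
        # Keep PSMs within the FDR cutoff; drop decoys unless keep_decoy is set.
        self._psm_rows = [
            psm for psm in self._psm_rows
            if psm["fdr"] <= self.keep_fdr and (self.keep_decoy or psm["decoy"] == 0)
        ]


tsv = (
    "Sequence\tCharge\tReverse\tScore\tQ\n"
    "PEPTIDE\t2\t\t95.1\t0.001\n"
    "EDITPEP\t3\t+\t40.2\t0.002\n"
)
mapping = {"sequence": "Sequence", "charge": "Charge", "decoy": "Reverse", "score": "Score", "fdr": "Q"}
rows = MiniPSMReader(mapping).import_file(io.StringIO(tsv))
print([r["sequence"] for r in rows])  # ['PEPTIDE']
```

Because each step is a separate method, a reader for another format could override only the steps whose input differs.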
- property psm_df: DataFrame
- class alphabase.psm_reader.psm_reader.PSMReaderProvider
Bases: object
Methods:
- __init__()
- get_reader(reader_type, *[, column_mapping, ...])
- get_reader_by_yaml(yaml_dict)
- register_reader(reader_type, reader_class)
- get_reader(reader_type: str, *, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, **kwargs) → PSMReaderBase
- get_reader_by_yaml(yaml_dict: dict) → PSMReaderBase
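PSMReaderProvider acts as a registry-based factory. The sketch below shows that pattern with hypothetical stand-in classes (ReaderBase, ReaderProvider, MaxQuantLikeReader). Matching reader types case-insensitively, and forwarding every key of a psm_reader_yaml entry as a constructor keyword argument in get_reader_by_yaml, are assumptions rather than confirmed alphabase behavior.

```python
import copy
from typing import Any, Dict, Type


class ReaderBase:
    """Hypothetical stand-in for PSMReaderBase; only stores its arguments."""

    def __init__(self, *, column_mapping=None, modification_mapping=None,
                 fdr=0.01, keep_decoy=False, **kwargs):
        self.column_mapping = column_mapping
        self.modification_mapping = modification_mapping
        self.fdr = fdr
        self.keep_decoy = keep_decoy
        self.extra = kwargs  # e.g. rt_unit, fixed_C57 from a YAML entry


class ReaderProvider:
    """Minimal registry/factory in the spirit of PSMReaderProvider."""

    def __init__(self) -> None:
        self.reader_dict: Dict[str, Type[ReaderBase]] = {}

    def register_reader(self, reader_type: str, reader_class: Type[ReaderBase]) -> None:
        # Assumption: reader types are matched case-insensitively.
        self.reader_dict[reader_type.lower()] = reader_class

    def get_reader(self, reader_type: str, **kwargs: Any) -> ReaderBase:
        return self.reader_dict[reader_type.lower()](**kwargs)

    def get_reader_by_yaml(self, yaml_dict: Dict[str, Any]) -> ReaderBase:
        # Assumption: the dict holds "reader_type" plus constructor keyword
        # arguments; copy it so the caller's config is never mutated.
        return self.get_reader(**copy.deepcopy(yaml_dict))


class MaxQuantLikeReader(ReaderBase):
    """Hypothetical engine-specific subclass."""


provider = ReaderProvider()
provider.register_reader("maxquant", MaxQuantLikeReader)
reader = provider.get_reader_by_yaml({"reader_type": "maxquant", "rt_unit": "minute"})
print(type(reader).__name__, reader.extra)  # MaxQuantLikeReader {'rt_unit': 'minute'}
```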
- alphabase.psm_reader.psm_reader.keep_modifications(mod_str: str, mod_set: set) → str
Check if modifications of mod_str are in mod_set.
- Parameters:
mod_str (str) – Modification list in str format, separated by ‘;’, e.g. Oxidation@M;Phospho@S.
mod_set (set) – Set of modifications to check against.
- Returns:
The original mod_str if all modifications are in mod_set, else pd.NA.
- Return type:
str
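The documented behavior of keep_modifications can be sketched with the standard library alone. In this hypothetical re-implementation None stands in for pd.NA, and returning an empty mod_str (an unmodified peptide) unchanged is an assumption. Dropping the None results from a list of mod strings is one plausible way to get the filtering that filter_psm_by_modifications describes.

```python
from typing import Optional, Set


def keep_modifications(mod_str: str, mod_set: Set[str]) -> Optional[str]:
    """Return mod_str if every ';'-separated mod is in mod_set, else None.

    None stands in for pd.NA. Assumption: an empty mod_str (unmodified
    peptide) is returned unchanged.
    """
    if not mod_str:
        return mod_str
    for mod in mod_str.split(";"):
        if mod not in mod_set:
            return None
    return mod_str


include = {"Oxidation@M", "Phospho@S"}
mods = ["Oxidation@M;Phospho@S", "Oxidation@M;Carbamidomethyl@C", ""]
print([keep_modifications(m, include) for m in mods])
# ['Oxidation@M;Phospho@S', None, '']

# Dropping the None results keeps only PSMs whose mods are all in the set.
print([m for m in mods if keep_modifications(m, include) is not None])
# ['Oxidation@M;Phospho@S', '']
```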
- alphabase.psm_reader.psm_reader.psm_reader_provider = <alphabase.psm_reader.psm_reader.PSMReaderProvider object>
A factory PSMReaderProvider object to register and get readers for different PSM types.
- alphabase.psm_reader.psm_reader.psm_reader_yaml = {'alphapept': {'column_mapping': {'charge': 'charge', 'decoy': 'decoy', 'fdr': 'q_value', 'mobility': 'mobility', 'precursor_mz': 'mz', 'query_id': 'query_idx', 'raw_name': 'raw_name', 'rt': 'rt', 'scan_num': 'scan_no', 'score': 'score', 'spec_idx': 'raw_idx'}, 'modification_mapping': {'Acetyl@Protein N-term': 'a', 'Carbamidomethyl@C': 'cC', 'Oxidation@M': 'oxM', 'Phospho@S': 'pS', 'Phospho@T': 'pT', 'Phospho@Y': 'pY'}, 'reader_type': 'alphapept', 'rt_unit': 'minute'}, 'diann': {'column_mapping': {'ccs': 'CCS', 'charge': 'Precursor.Charge', 'fdr': 'Q.Value', 'genes': 'Genes', 'mobility': ['IM', 'IonMobility'], 'proteins': 'Protein.Names', 'raw_name': 'Run', 'rt': 'RT', 'rt_start': 'RT.Start', 'rt_stop': 'RT.Stop', 'scan_num': 'MS2.Scan', 'score': 'CScore', 'sequence': 'Stripped.Sequence', 'uniprot_ids': 'Protein.Ids'}, 'fixed_C57': False, 'modification_mapping': 'maxquant', 'reader_type': 'diann', 'rt_unit': 'minute'}, 'library_reader_base': {'column_mapping': {'ccs': 'CCS', 'charge': 'PrecursorCharge', 'fragment_charge': ['FragmentCharge', 'FragmentIonCharge', 'ProductCharge', 'ProductIonCharge'], 'fragment_intensity': ['LibraryIntensity', 'RelativeIntensity', 'RelativeFragmentIntensity', 'RelativeFragmentIonIntensity'], 'fragment_loss_type': ['FragmentLossType', 'FragmentIonLossType', 'ProductLossType', 'ProductIonLossType'], 'fragment_mz': ['ProductMz'], 'fragment_series': ['FragmentSeriesNumber', 'FragmentNumber'], 'fragment_type': ['FragmentType', 'FragmentIonType', 'ProductType', 'ProductIonType'], 'genes': ['GeneName', 'Genes', 'Gene'], 'mobility': ['Mobility', 'IonMobility', 'PrecursorIonMobility'], 'modified_sequence': ['ModifiedPeptideSequence', 'ModifiedPeptide'], 'precursor_mz': 'PrecursorMz', 'proteins': ['ProteinId', 'ProteinID', 'ProteinName', 'Protein Name'], 'raw_name': 'ReferenceRun', 'rt': ['RT', 'iRT', 'Tr_recalibrated', 'RetentionTime', 'NormalizedRetentionTime'], 'sequence': 
['PeptideSequence', 'StrippedPeptide'], 'uniprot_ids': ['UniProtIds', 'UniProtID', 'UniprotId']}, 'csv_sep': '\t', 'fixed_C57': False, 'mod_seq_columns': ['ModifiedPeptideSequence', 'ModifiedPeptide', 'ModifiedSequence', 'FullUniModPeptideName', 'LabeledSequence', 'FullUniModPeptideName'], 'modification_mapping': 'maxquant', 'reader_type': 'library_reader_base', 'rt_unit': 'irt'}, 'maxquant': {'column_mapping': {'ccs': 'CCS', 'charge': 'Charge', 'decoy': 'Reverse', 'genes': ['Gene Names', 'Gene names'], 'intensity': 'Intensity', 'mobility': ['Mobility', 'IonMobility', 'K0', '1/K0'], 'precursor_mz': 'm/z', 'proteins': 'Proteins', 'raw_name': 'Raw file', 'rt': 'Retention time', 'scan_num': ['Scan number', 'MS/MS scan number', 'Scan index'], 'score': 'Score', 'sequence': 'Sequence'}, 'fixed_C57': True, 'modification_mapping': {'Acetyl@Protein N-term': ['_(Acetyl (Protein N-term))', '_(ac)'], 'Carbamidomethyl@C': ['C(Carbamidomethyl (C))', 'C(Carbamidomethyl)'], 'Deamidated@N': ['N(Deamidation (NQ))', 'N(de)'], 'Deamidated@Q': ['Q(Deamidation (NQ))', 'Q(de)'], 'Dimethyl@Any N-term': ['(Dimethyl)'], 'Dimethyl@K': ['K(Dimethyl)'], 'Dimethyl@R': ['R(Dimethyl)'], 'GlyGly@K': ['K(GlyGly (K))', 'K(gl)'], 'Oxidation@M': ['M(Oxidation)', 'M(Oxidation (M))', 'M(ox)'], 'Phospho@S': ['S(Phospho (S))', 'S(Phospho (ST))', 'S(Phospho (STY))', 'S(ph)', 'pS'], 'Phospho@T': ['T(Phospho (T))', 'T(Phospho (ST))', 'T(Phospho (STY))', 'T(ph)', 'pT'], 'Phospho@Y': ['Y(Phospho (Y))', 'Y(Phospho (STY))', 'Y(ph)', 'pY']}, 'reader_type': 'maxquant', 'rt_unit': 'minute'}, 'msfragger_pepxml': {'column_mapping': {'charge': 'assumed_charge', 'fdr': 'expect', 'mobility': 'ion_mobility', 'proteins': 'protein', 'query_id': 'spectrum', 'raw_name': 'raw_name', 'rt': 'retention_time_sec', 'scan_num': 'start_scan', 'score': 'expect', 'sequence': 'peptide'}, 'mass_mapped_mods': ['Oxidation@M', 'Carbamidomethyl@C', 'Phospho@S', 'GlyGly@K', 'Cysteinyl@C', 'Acetyl@Any N-term', 'Glu->pyro-Glu@E^Any N-term', 
'Gln->pyro-Glu@Q^Any N-term', 'Dimethyl@K', 'Methyl@E'], 'mod_mass_tol': 0.1, 'modification_mapping': {'': ''}, 'reader_type': 'msfragger_pepxml', 'rt_unit': 'second'}, 'pfind': {'column_mapping': {'charge': 'Charge', 'decoy': ['Target/Decoy', 'Targe/Decoy'], 'fdr': 'Q-value', 'proteins': 'Proteins', 'query_id': 'File_Name', 'raw_name': 'raw_name', 'rt': 'RT', 'scan_num': 'Scan_No', 'score': 'Final_Score', 'sequence': 'Sequence', 'uniprot_ids': 'Proteins'}, 'modification_mapping': {'': ''}, 'reader_type': 'pfind', 'rt_unit': 'minute'}, 'sage': {'column_mapping': {'charge': 'charge', 'decoy': 'is_decoy', 'fdr': 'spectrum_q', 'mobility': 'mobility', 'modified_sequence': 'peptide', 'peptide_fdr': 'peptide_q', 'protein_fdr': 'protein_q', 'proteins': 'proteins', 'raw_name': 'filename', 'rt': 'rt', 'scannr': 'scannr', 'score': 'sage_discriminant_score', 'sequence': 'stripped_peptide'}, 'reader_type': 'sage', 'rt_unit': 'minute'}, 'spectronaut': {'column_mapping': {'ccs': 'CCS', 'charge': 'PrecursorCharge', 'genes': ['Genes', 'Gene', 'GeneName', 'GeneNames'], 'mobility': ['Mobility', 'IonMobility', 'PrecursorIonMobility'], 'precursor_mz': 'PrecursorMz', 'proteins': ['Protein Name', 'ProteinId', 'ProteinID', 'ProteinName', 'ProteinGroup', 'ProteinGroups'], 'raw_name': 'ReferenceRun', 'rt': ['RT', 'iRT', 'Tr_recalibrated', 'RetentionTime', 'NormalizedRetentionTime'], 'sequence': ['StrippedPeptide', 'PeptideSequence'], 'uniprot_ids': ['UniProtIds', 'UniProtID', 'UniprotId']}, 'fixed_C57': False, 'mod_seq_columns': ['ModifiedPeptide', 'ModifiedSequence', 'FullUniModPeptideName', 'ModifiedPeptideSequence', 'LabeledSequence', 'FullUniModPeptideName'], 'modification_mapping': 'maxquant', 'reader_type': 'spectronaut', 'rt_unit': 'irt'}, 'spectronaut_report': {'column_mapping': {'charge': 'charge', 'genes': 'PG.Genes', 'proteins': ['PG.ProteinNames', 'PG.ProteinGroups'], 'raw_name': 'R.FileName', 'rt': ['EG.ApexRT', 'EG.MeanApexRT'], 'uniprot_ids': 'PG.UniProtIds'}, 'fixed_C57': 
False, 'modification_mapping': 'maxquant', 'reader_type': 'spectronaut_report', 'rt_unit': 'minute'}}
See psm_reader.yaml
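In psm_reader_yaml, some readers (e.g. 'diann', 'spectronaut') set modification_mapping to the string 'maxquant' instead of a dict. Assuming that string names another reader whose mapping is reused, a lookup could resolve it as sketched below. The excerpt copies only a few entries from the dict above, and resolve_modification_mapping is a hypothetical helper.

```python
from typing import Any, Dict

# Excerpt of psm_reader_yaml: only a few entries copied from the dict above.
yaml_excerpt: Dict[str, Dict[str, Any]] = {
    "maxquant": {
        "modification_mapping": {
            "Oxidation@M": ["M(Oxidation)", "M(Oxidation (M))", "M(ox)"],
            "Phospho@Y": ["Y(Phospho (Y))", "Y(Phospho (STY))", "Y(ph)", "pY"],
        },
        "reader_type": "maxquant",
        "rt_unit": "minute",
    },
    "diann": {"modification_mapping": "maxquant", "reader_type": "diann", "rt_unit": "minute"},
}


def resolve_modification_mapping(config: Dict[str, Dict[str, Any]], reader_type: str) -> Dict[str, Any]:
    """Hypothetical helper: return a reader's modification_mapping as a dict.

    Assumption: a str value names another reader whose mapping is reused.
    """
    mapping = config[reader_type]["modification_mapping"]
    if isinstance(mapping, str):
        mapping = config[mapping]["modification_mapping"]
    return mapping


print(resolve_modification_mapping(yaml_excerpt, "diann")["Oxidation@M"])
# ['M(Oxidation)', 'M(Oxidation (M))', 'M(ox)']
```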
- alphabase.psm_reader.psm_reader.translate_other_modification(mod_str: str, mod_dict: dict) → str
Translate modifications of mod_str to the AlphaBase format mapped by mod_dict.
- Parameters:
mod_str (str) – Modification list in str format, separated by ‘;’, e.g. ModA;ModB.
mod_dict (dict) – Dict that translates other engines' modification names to AlphaBase names, e.g. for pFind, keys=[‘Phospho[S]’, ‘Oxidation[M]’], values=[‘Phospho@S’, ‘Oxidation@M’].
- Returns:
New modifications in AlphaBase format, separated by ‘;’. If any modification is not in mod_dict, pd.NA is returned.
- Return type:
str
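A minimal sketch of the documented translation, using the pFind example from the mod_dict description. It is stdlib-only, so None stands in for pd.NA, and treating an empty mod_str as an unmodified peptide (returning '') is an assumption.

```python
from typing import Dict, Optional


def translate_other_modification(mod_str: str, mod_dict: Dict[str, str]) -> Optional[str]:
    """Translate ';'-separated engine mod names into AlphaBase names.

    None stands in for pd.NA when any mod is missing from mod_dict.
    Assumption: an empty mod_str (unmodified peptide) yields "".
    """
    if not mod_str:
        return ""
    translated = []
    for mod in mod_str.split(";"):
        if mod not in mod_dict:
            return None
        translated.append(mod_dict[mod])
    return ";".join(translated)


# The pFind example from the mod_dict description above.
pfind_mods = {"Phospho[S]": "Phospho@S", "Oxidation[M]": "Oxidation@M"}
print(translate_other_modification("Oxidation[M];Phospho[S]", pfind_mods))  # Oxidation@M;Phospho@S
print(translate_other_modification("Oxidation[M];Acetyl[K]", pfind_mods))   # None
```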