alphabase.psm_reader.psm_reader

See examples in psm_reader notebook.

Classes:

PSMReaderBase(*[, column_mapping, ...])

PSMReaderProvider()

Functions:

keep_modifications(mod_str, mod_set)

Check if modifications of mod_str are in mod_set.

translate_other_modification(mod_str, mod_dict)

Translate modifications of mod_str to the AlphaBase format mapped by mod_dict.

Data:

psm_reader_provider

A factory PSMReaderProvider object to register and get readers for different PSM types.

psm_reader_yaml

See psm_reader.yaml

class alphabase.psm_reader.psm_reader.PSMReaderBase(*, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, rt_unit: str = 'minute', **kwargs)

Bases: object

Methods:

__init__(*[, column_mapping, ...])

The Base class for all PSMReaders.

add_modification_mapping(modification_mapping)

Append additional modification mappings for the search engine.

filter_psm_by_modifications([include_mod_set])

Only keeps peptides whose modifications are all in include_mod_set.

import_file(_file)

This is the main entry function of PSM readers. It imports the file with the following steps:

    origin_df = self._load_file(_file)
    self._translate_columns(origin_df)
    self._translate_decoy(origin_df)
    self._translate_score(origin_df)
    self._load_modifications(origin_df)
    self._translate_modifications()
    self._post_process(origin_df)

import_files(file_list)

load(_file)

Wrapper for import_file()

norm_rt()

normalize_rt()

normalize_rt_by_raw_name()

set_modification_mapping(modification_mapping)

Attributes:

__init__(*, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, rt_unit: str = 'minute', **kwargs)

The Base class for all PSMReaders. Sub-classes for a specific search engine format mainly need to re-define column_mapping and modification_mapping.

Parameters:
  • column_mapping (dict, optional) – A dict that maps AlphaBase column names to those of another search engine. Each key is an AlphaBase column name; each value is either a column name (str) or a list of candidate column names in the other engine's result. If it is None, this dict will be initialized by self._init_column_mapping. For example:

        column_mapping = {
            'sequence': 'NakedSequence',  # str
            'charge': 'Charge',  # str
            'proteins': ['Proteins', 'UniprotIDs'],  # list, this reader will automatically detect all of them.
        }

    Defaults to None.

  • modification_mapping (dict, optional) – A dict that maps AlphaBase modification names to those of another search engine. If it is None, this dict will be initialized with the default modification mapping for each search engine (see psm_reader_yaml). Each value can be either a str or a list, for example:

        modification_mapping = {
            'Oxidation@M': 'Oxidation (M)',  # str
            'Phospho@S': ['S(Phospho (STY))', 'S(ph)', 'pS'],  # list, this reader will automatically detect all of them.
        }

    Defaults to None.

  • fdr (float, optional) – FDR level to keep PSMs. Defaults to 0.01.

  • keep_decoy (bool, optional) – Whether to keep decoy PSMs in self.psm_df. Defaults to False.
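The str-or-list convention of column_mapping can be illustrated with a small standalone sketch. The helper resolve_column_mapping below is hypothetical and not part of alphabase; it assumes that, for a list value, the first candidate found among the search engine's columns is the one used, which may differ from what the reader actually does.

```python
from typing import Dict, List, Union


def resolve_column_mapping(
    column_mapping: Dict[str, Union[str, List[str]]],
    available_columns: List[str],
) -> Dict[str, str]:
    """Map each AlphaBase column name to a source column that is present.

    Hypothetical helper: assumes the first matching candidate wins.
    """
    available = set(available_columns)
    resolved = {}
    for alphabase_name, candidates in column_mapping.items():
        # Treat a single str as a one-element candidate list.
        if isinstance(candidates, str):
            candidates = [candidates]
        for candidate in candidates:
            if candidate in available:
                resolved[alphabase_name] = candidate
                break  # keep the first match, ignore the rest
    return resolved


column_mapping = {
    "sequence": "NakedSequence",
    "charge": "Charge",
    "proteins": ["Proteins", "UniprotIDs"],
}
# Column names of an imaginary search engine result table.
resolved = resolve_column_mapping(
    column_mapping, ["NakedSequence", "Charge", "UniprotIDs"]
)
```

Here 'proteins' resolves to 'UniprotIDs' because 'Proteins' is absent from the imaginary table.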

column_mapping#

Same dict structure as the column_mapping parameter.

Type:

dict

modification_mapping#

Same dict structure as the modification_mapping parameter. It must be updated through self.set_modification_mapping(new_mapping).

Type:

dict

_psm_df#

The PSM DataFrame loaded from the search engine result.

Type:

pd.DataFrame

psm_df#

The getter of self._psm_df.

Type:

pd.DataFrame

keep_fdr#

Only PSMs with FDR <= keep_fdr are kept in self._psm_df.

Type:

float

keep_decoy#

Whether to keep decoy PSMs in self.psm_df.

Type:

bool
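The combined effect of keep_fdr and keep_decoy can be sketched on plain Python rows. This is only an illustration under assumptions: the reader operates on a pandas DataFrame, whereas filter_psms and the row dicts below are hypothetical, and the example sequences and values are made up.

```python
def filter_psms(psms, keep_fdr=0.01, keep_decoy=False):
    """Keep rows with fdr <= keep_fdr; drop decoys unless keep_decoy is True."""
    kept = []
    for psm in psms:
        if psm["fdr"] > keep_fdr:
            continue  # fails the FDR threshold
        if psm["decoy"] and not keep_decoy:
            continue  # decoy hit, and decoys are not requested
        kept.append(psm)
    return kept


# Made-up rows using the AlphaBase column names 'fdr' and 'decoy'.
psms = [
    {"sequence": "PEPTIDEK", "fdr": 0.001, "decoy": 0},
    {"sequence": "KEDITPEP", "fdr": 0.005, "decoy": 1},
    {"sequence": "ACDEFGHK", "fdr": 0.05, "decoy": 0},
]
targets_only = filter_psms(psms)
with_decoys = filter_psms(psms, keep_decoy=True)
```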

_min_max_rt_norm#

If True, the 'rt_norm' values in self._psm_df are min-max normalized: rt_norm = (self.psm_df.rt - rt_min) / (rt_max - rt_min). This is useful for iRT values, which can be negative. Defaults to False.

Type:

bool
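The min-max formula can be checked with a short standalone sketch. The function min_max_normalize is hypothetical and works on a plain list rather than a DataFrame column; returning zeros for a constant input is an assumption made here to avoid division by zero, not documented library behavior.

```python
def min_max_normalize(rt_values):
    """Scale values to [0, 1] with (rt - rt_min) / (rt_max - rt_min)."""
    rt_min, rt_max = min(rt_values), max(rt_values)
    span = rt_max - rt_min
    if span == 0:
        # Assumption: a constant input maps to zeros instead of dividing by zero.
        return [0.0 for _ in rt_values]
    return [(rt - rt_min) / span for rt in rt_values]


# Made-up iRT values, including a negative one.
irt = [-20.0, 0.0, 30.0, 80.0]
rt_norm = min_max_normalize(irt)
```

The negative iRT value maps to 0 and the largest value maps to 1, so all normalized values fall in [0, 1].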

add_modification_mapping(modification_mapping: dict)

Append additional modification mappings for the search engine.

Parameters:

modification_mapping (dict) – Each key is a modification name in AlphaBase format; each value can be a str or a list, for example:

    add_modification_mapping({
        'Dimethyl@K': ['K(Dimethyl)'],  # list
        'Dimethyl@Any N-term': '_(Dimethyl)',  # str
    })
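Because the mapping is keyed by AlphaBase names while search engine output contains the other engine's names, a lookup in the opposite direction is needed when translating. The sketch below shows one plausible way to build such a reverse dict; build_reverse_mapping is a hypothetical helper, and whether alphabase does this the same way internally is an assumption.

```python
def build_reverse_mapping(modification_mapping):
    """Invert {alphabase_mod: str | list[str]} into {other_mod: alphabase_mod}."""
    reverse = {}
    for alphabase_mod, other_mods in modification_mapping.items():
        # Treat a single str as a one-element list.
        if isinstance(other_mods, str):
            other_mods = [other_mods]
        for other_mod in other_mods:
            reverse[other_mod] = alphabase_mod
    return reverse


mapping = {
    "Oxidation@M": "Oxidation (M)",
    "Phospho@S": ["S(Phospho (STY))", "S(ph)", "pS"],
}
# Append additional mappings, mirroring the add_modification_mapping example.
mapping.update({
    "Dimethyl@K": ["K(Dimethyl)"],
    "Dimethyl@Any N-term": "_(Dimethyl)",
})
reverse = build_reverse_mapping(mapping)
```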

filter_psm_by_modifications(include_mod_set={'Acetyl@Protein N-term', 'Oxidation@M', 'Phospho@S', 'Phospho@T', 'Phospho@Y'})

Only keeps peptides whose modifications are all in include_mod_set.

import_file(_file: str) → DataFrame

This is the main entry function of PSM readers. It imports the file with the following steps:

    origin_df = self._load_file(_file)
    self._translate_columns(origin_df)
    self._translate_decoy(origin_df)
    self._translate_score(origin_df)
    self._load_modifications(origin_df)
    self._translate_modifications()
    self._post_process(origin_df)

Parameters:

_file (str) – file path or file stream (io).
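The fixed step order of import_file follows a template-method style, where sub-classes override individual steps while the sequence stays the same. The toy class below only records the call order to make that sequence visible; ToyReader and its placeholder dict are hypothetical and do not reproduce what the real steps do.

```python
class ToyReader:
    """Toy stand-in that records the order in which the import steps run."""

    def __init__(self):
        self.calls = []
        self.psm_df = None

    def _load_file(self, _file):
        self.calls.append("_load_file")
        return {"source": _file}  # placeholder for the loaded table

    def _translate_columns(self, origin_df):
        self.calls.append("_translate_columns")
        self.psm_df = dict(origin_df)  # placeholder for the translated table

    def _translate_decoy(self, origin_df):
        self.calls.append("_translate_decoy")

    def _translate_score(self, origin_df):
        self.calls.append("_translate_score")

    def _load_modifications(self, origin_df):
        self.calls.append("_load_modifications")

    def _translate_modifications(self):
        self.calls.append("_translate_modifications")

    def _post_process(self, origin_df):
        self.calls.append("_post_process")

    def import_file(self, _file):
        # Same step sequence as listed in the documentation above.
        origin_df = self._load_file(_file)
        self._translate_columns(origin_df)
        self._translate_decoy(origin_df)
        self._translate_score(origin_df)
        self._load_modifications(origin_df)
        self._translate_modifications()
        self._post_process(origin_df)
        return self.psm_df


reader = ToyReader()
result = reader.import_file("msms.txt")  # made-up file name
```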

import_files(file_list: list)

load(_file) → DataFrame

Wrapper for import_file()

norm_rt()

normalize_rt()

normalize_rt_by_raw_name()

property psm_df: DataFrame

set_modification_mapping(modification_mapping: dict)

class alphabase.psm_reader.psm_reader.PSMReaderProvider

Bases: object

Methods:

__init__()

get_reader(reader_type, *[, column_mapping, ...])

get_reader_by_yaml(yaml_dict)

register_reader(reader_type, reader_class)

__init__()

get_reader(reader_type: str, *, column_mapping: dict = None, modification_mapping: dict = None, fdr=0.01, keep_decoy=False, **kwargs) → PSMReaderBase

get_reader_by_yaml(yaml_dict: dict) → PSMReaderBase

register_reader(reader_type, reader_class)
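
The register/get pattern of a reader factory can be sketched as a small registry. All class names below (ToyReaderBase, ToyMaxQuantReader, ToyReaderProvider) are hypothetical stand-ins, and details such as how the real provider normalizes reader_type or handles unknown types are not covered here.

```python
class ToyReaderBase:
    """Toy base class accepting the keyword-only options of a reader."""

    def __init__(self, *, fdr=0.01, keep_decoy=False, **kwargs):
        self.keep_fdr = fdr
        self.keep_decoy = keep_decoy


class ToyMaxQuantReader(ToyReaderBase):
    """Toy sub-class standing in for an engine-specific reader."""


class ToyReaderProvider:
    """Toy factory: maps a reader_type string to a reader class."""

    def __init__(self):
        self.reader_dict = {}

    def register_reader(self, reader_type, reader_class):
        self.reader_dict[reader_type] = reader_class

    def get_reader(self, reader_type, **kwargs):
        # Look up the registered class and instantiate it with the options.
        return self.reader_dict[reader_type](**kwargs)


provider = ToyReaderProvider()
provider.register_reader("maxquant", ToyMaxQuantReader)
reader = provider.get_reader("maxquant", fdr=0.05)
```

Registering classes rather than instances means every get_reader call returns a fresh reader with its own options.
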
alphabase.psm_reader.psm_reader.keep_modifications(mod_str: str, mod_set: set) → str

Check if modifications of mod_str are in mod_set.

Parameters:
  • mod_str (str) – mod list in str format, separated by ';', e.g. Oxidation@M;Phospho@S.

  • mod_set (set) – mod set to check

Returns:

The original mod_str if all modifications are in mod_set, else pd.NA.

Return type:

str
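
The documented behavior can be mirrored in a standalone sketch. It is not the library's source: keep_modifications_sketch is a hypothetical name, None stands in for pd.NA so the example needs no pandas, and passing an empty mod_str through unchanged is an assumption.

```python
MISSING = None  # stand-in for pd.NA, to keep the sketch free of pandas


def keep_modifications_sketch(mod_str, mod_set):
    """Return mod_str if every ';'-separated modification is in mod_set."""
    if not mod_str:
        # Assumption: an unmodified peptide (empty string) always passes.
        return mod_str
    for mod in mod_str.split(";"):
        if mod not in mod_set:
            return MISSING
    return mod_str


allowed = {"Oxidation@M", "Phospho@S"}
kept = keep_modifications_sketch("Oxidation@M;Phospho@S", allowed)
dropped = keep_modifications_sketch("Oxidation@M;GlyGly@K", allowed)
```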

alphabase.psm_reader.psm_reader.psm_reader_provider = <alphabase.psm_reader.psm_reader.PSMReaderProvider object>

A factory PSMReaderProvider object to register and get readers for different PSM types.

alphabase.psm_reader.psm_reader.psm_reader_yaml = {
    'alphapept': {
        'column_mapping': {'charge': 'charge', 'decoy': 'decoy', 'fdr': 'q_value', 'mobility': 'mobility', 'precursor_mz': 'mz', 'query_id': 'query_idx', 'raw_name': 'raw_name', 'rt': 'rt', 'scan_num': 'scan_no', 'score': 'score', 'spec_idx': 'raw_idx'},
        'modification_mapping': {'Acetyl@Protein N-term': 'a', 'Carbamidomethyl@C': 'cC', 'Oxidation@M': 'oxM', 'Phospho@S': 'pS', 'Phospho@T': 'pT', 'Phospho@Y': 'pY'},
        'reader_type': 'alphapept',
        'rt_unit': 'minute'},
    'diann': {
        'column_mapping': {'ccs': 'CCS', 'charge': 'Precursor.Charge', 'fdr': 'Q.Value', 'genes': 'Genes', 'mobility': ['IM', 'IonMobility'], 'proteins': 'Protein.Names', 'raw_name': 'Run', 'rt': 'RT', 'rt_start': 'RT.Start', 'rt_stop': 'RT.Stop', 'scan_num': 'MS2.Scan', 'score': 'CScore', 'sequence': 'Stripped.Sequence', 'uniprot_ids': 'Protein.Ids'},
        'fixed_C57': False,
        'modification_mapping': 'maxquant',
        'reader_type': 'diann',
        'rt_unit': 'minute'},
    'library_reader_base': {
        'column_mapping': {
            'ccs': 'CCS',
            'charge': 'PrecursorCharge',
            'fragment_charge': ['FragmentCharge', 'FragmentIonCharge', 'ProductCharge', 'ProductIonCharge'],
            'fragment_intensity': ['LibraryIntensity', 'RelativeIntensity', 'RelativeFragmentIntensity', 'RelativeFragmentIonIntensity'],
            'fragment_loss_type': ['FragmentLossType', 'FragmentIonLossType', 'ProductLossType', 'ProductIonLossType'],
            'fragment_mz': ['ProductMz'],
            'fragment_series': ['FragmentSeriesNumber', 'FragmentNumber'],
            'fragment_type': ['FragmentType', 'FragmentIonType', 'ProductType', 'ProductIonType'],
            'genes': ['GeneName', 'Genes', 'Gene'],
            'mobility': ['Mobility', 'IonMobility', 'PrecursorIonMobility'],
            'modified_sequence': ['ModifiedPeptideSequence', 'ModifiedPeptide'],
            'precursor_mz': 'PrecursorMz',
            'proteins': ['ProteinId', 'ProteinID', 'ProteinName', 'Protein Name'],
            'raw_name': 'ReferenceRun',
            'rt': ['RT', 'iRT', 'Tr_recalibrated', 'RetentionTime', 'NormalizedRetentionTime'],
            'sequence': ['PeptideSequence', 'StrippedPeptide'],
            'uniprot_ids': ['UniProtIds', 'UniProtID', 'UniprotId']},
        'csv_sep': '\t',
        'fixed_C57': False,
        'mod_seq_columns': ['ModifiedPeptideSequence', 'ModifiedPeptide', 'ModifiedSequence', 'FullUniModPeptideName', 'LabeledSequence', 'FullUniModPeptideName'],
        'modification_mapping': 'maxquant',
        'reader_type': 'library_reader_base',
        'rt_unit': 'irt'},
    'maxquant': {
        'column_mapping': {'ccs': 'CCS', 'charge': 'Charge', 'decoy': 'Reverse', 'genes': ['Gene Names', 'Gene names'], 'intensity': 'Intensity', 'mobility': ['Mobility', 'IonMobility', 'K0', '1/K0'], 'precursor_mz': 'm/z', 'proteins': 'Proteins', 'raw_name': 'Raw file', 'rt': 'Retention time', 'scan_num': ['Scan number', 'MS/MS scan number', 'Scan index'], 'score': 'Score', 'sequence': 'Sequence'},
        'fixed_C57': True,
        'modification_mapping': {
            'Acetyl@Protein N-term': ['_(Acetyl (Protein N-term))', '_(ac)'],
            'Carbamidomethyl@C': ['C(Carbamidomethyl (C))', 'C(Carbamidomethyl)'],
            'Deamidated@N': ['N(Deamidation (NQ))', 'N(de)'],
            'Deamidated@Q': ['Q(Deamidation (NQ))', 'Q(de)'],
            'Dimethyl@Any N-term': ['(Dimethyl)'],
            'Dimethyl@K': ['K(Dimethyl)'],
            'Dimethyl@R': ['R(Dimethyl)'],
            'GlyGly@K': ['K(GlyGly (K))', 'K(gl)'],
            'Oxidation@M': ['M(Oxidation)', 'M(Oxidation (M))', 'M(ox)'],
            'Phospho@S': ['S(Phospho (S))', 'S(Phospho (ST))', 'S(Phospho (STY))', 'S(ph)', 'pS'],
            'Phospho@T': ['T(Phospho (T))', 'T(Phospho (ST))', 'T(Phospho (STY))', 'T(ph)', 'pT'],
            'Phospho@Y': ['Y(Phospho (Y))', 'Y(Phospho (STY))', 'Y(ph)', 'pY']},
        'reader_type': 'maxquant',
        'rt_unit': 'minute'},
    'msfragger_pepxml': {
        'column_mapping': {'charge': 'assumed_charge', 'fdr': 'expect', 'mobility': 'ion_mobility', 'proteins': 'protein', 'query_id': 'spectrum', 'raw_name': 'raw_name', 'rt': 'retention_time_sec', 'scan_num': 'start_scan', 'score': 'expect', 'sequence': 'peptide'},
        'mass_mapped_mods': ['Oxidation@M', 'Carbamidomethyl@C', 'Phospho@S', 'GlyGly@K', 'Cysteinyl@C', 'Acetyl@Any N-term', 'Glu->pyro-Glu@E^Any N-term', 'Gln->pyro-Glu@Q^Any N-term', 'Dimethyl@K', 'Methyl@E'],
        'mod_mass_tol': 0.1,
        'modification_mapping': {'': ''},
        'reader_type': 'msfragger_pepxml',
        'rt_unit': 'second'},
    'pfind': {
        'column_mapping': {'charge': 'Charge', 'decoy': ['Target/Decoy', 'Targe/Decoy'], 'fdr': 'Q-value', 'proteins': 'Proteins', 'query_id': 'File_Name', 'raw_name': 'raw_name', 'rt': 'RT', 'scan_num': 'Scan_No', 'score': 'Final_Score', 'sequence': 'Sequence', 'uniprot_ids': 'Proteins'},
        'modification_mapping': {'': ''},
        'reader_type': 'pfind',
        'rt_unit': 'minute'},
    'sage': {
        'column_mapping': {'charge': 'charge', 'decoy': 'is_decoy', 'fdr': 'spectrum_q', 'mobility': 'mobility', 'modified_sequence': 'peptide', 'peptide_fdr': 'peptide_q', 'protein_fdr': 'protein_q', 'proteins': 'proteins', 'raw_name': 'filename', 'rt': 'rt', 'scannr': 'scannr', 'score': 'sage_discriminant_score', 'sequence': 'stripped_peptide'},
        'reader_type': 'sage',
        'rt_unit': 'minute'},
    'spectronaut': {
        'column_mapping': {
            'ccs': 'CCS',
            'charge': 'PrecursorCharge',
            'genes': ['Genes', 'Gene', 'GeneName', 'GeneNames'],
            'mobility': ['Mobility', 'IonMobility', 'PrecursorIonMobility'],
            'precursor_mz': 'PrecursorMz',
            'proteins': ['Protein Name', 'ProteinId', 'ProteinID', 'ProteinName', 'ProteinGroup', 'ProteinGroups'],
            'raw_name': 'ReferenceRun',
            'rt': ['RT', 'iRT', 'Tr_recalibrated', 'RetentionTime', 'NormalizedRetentionTime'],
            'sequence': ['StrippedPeptide', 'PeptideSequence'],
            'uniprot_ids': ['UniProtIds', 'UniProtID', 'UniprotId']},
        'fixed_C57': False,
        'mod_seq_columns': ['ModifiedPeptide', 'ModifiedSequence', 'FullUniModPeptideName', 'ModifiedPeptideSequence', 'LabeledSequence', 'FullUniModPeptideName'],
        'modification_mapping': 'maxquant',
        'reader_type': 'spectronaut',
        'rt_unit': 'irt'},
    'spectronaut_report': {
        'column_mapping': {'charge': 'charge', 'genes': 'PG.Genes', 'proteins': ['PG.ProteinNames', 'PG.ProteinGroups'], 'raw_name': 'R.FileName', 'rt': ['EG.ApexRT', 'EG.MeanApexRT'], 'uniprot_ids': 'PG.UniProtIds'},
        'fixed_C57': False,
        'modification_mapping': 'maxquant',
        'reader_type': 'spectronaut_report',
        'rt_unit': 'minute'}}

See psm_reader.yaml

alphabase.psm_reader.psm_reader.translate_other_modification(mod_str: str, mod_dict: dict) → str

Translate modifications of mod_str to the AlphaBase format mapped by mod_dict.

Parameters:
  • mod_str (str) – mod list in str format, separated by ';', e.g. ModA;ModB

  • mod_dict (dict) – dict that translates modifications from another format to AlphaBase, e.g. for pFind, key=['Phospho[S]','Oxidation[M]'], value=['Phospho@S','Oxidation@M']

Returns:

New mods in AlphaBase format, separated by ';'. If any modification is not in mod_dict, returns pd.NA.

Return type:

str
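
The translation rule can be sketched with a plain dict lookup, using the pFind-style names from the parameter description. This is an illustration, not the library's code: translate_other_modification_sketch is a hypothetical name, None stands in for pd.NA, and returning an empty string for an empty mod_str is an assumption.

```python
MISSING = None  # stand-in for pd.NA, to keep the sketch free of pandas


def translate_other_modification_sketch(mod_str, mod_dict):
    """Translate ';'-separated modifications through mod_dict."""
    if not mod_str:
        return ""  # assumption: nothing to translate for unmodified peptides
    translated = []
    for mod in mod_str.split(";"):
        if mod not in mod_dict:
            return MISSING  # one unknown modification invalidates the whole string
        translated.append(mod_dict[mod])
    return ";".join(translated)


# Keys are the other engine's names, values are AlphaBase names.
pfind_to_alphabase = {
    "Phospho[S]": "Phospho@S",
    "Oxidation[M]": "Oxidation@M",
}
ok = translate_other_modification_sketch("Oxidation[M];Phospho[S]", pfind_to_alphabase)
unknown = translate_other_modification_sketch("Oxidation[M];Foo[K]", pfind_to_alphabase)
```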