alphabase.pg_reader.mztab_pg_reader
MZTab protein group reader.
Classes:
MZTabPGReader | Reader for MZTab search engine output.
- class alphabase.pg_reader.mztab_pg_reader.MZTabPGReader(*, column_mapping: dict[str, str] | None = None, measurement_regex: str | Literal['assay', 'study_variable'] | None = 'assay')
Bases: PGReaderBase
Reader for MZTab search engine output.
MZTab is a standardized tab-delimited format for reporting proteomics and metabolomics results. The format organizes data into distinct sections: metadata (MTD), protein groups (PRH/PRT), peptides (PEH/PEP), PSMs (PSH/PSM), and small molecules (SMH/SML), with each section identified by specific three-letter prefixes. This reader extracts protein-level quantification data from the PRT lines, which contain protein abundances across samples or study variables.
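The prefix-based layout described above can be illustrated with a small standard-library sketch. It is not the reader's actual implementation: the helper names `split_sections` and `protein_table` are hypothetical, and the embedded fragment is a hand-written toy example rather than real search engine output.

```python
import csv
import io
from collections import defaultdict

# Tiny hand-written mzTab fragment (illustrative values, not real data).
MZTAB_TEXT = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tassay[1]-sample_ref\tsample[1]",
    "PRH\taccession\tprotein_abundance_assay[1]\tprotein_abundance_assay[2]",
    "PRT\tP12345\t1000.0\t2000.0",
    "PRT\tQ67890\t500.0\tnull",
])


def split_sections(text):
    """Group tab-separated mzTab rows by their three-letter line prefix."""
    sections = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:  # skip blank lines
            sections[row[0]].append(row[1:])
    return sections


def protein_table(sections):
    """Pair the PRH header row with every PRT data row."""
    header = sections["PRH"][0]
    return [dict(zip(header, row)) for row in sections["PRT"]]


sections = split_sections(MZTAB_TEXT)
proteins = protein_table(sections)
for protein in proteins:
    print(protein["accession"], protein["protein_abundance_assay[1]"])
```

Values stay as strings here, including mzTab's `null` placeholder for missing entries; converting them to numbers is left out to keep the sketch short.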
Example:
By default, the reader returns the raw intensities from the razor method. Additional protein features are stored in the dataframe index; samples are stored as columns.
    from alphabase.pg_reader import MZTabPGReader

    # Get raw intensities; `path` points to the mzTab file to read
    reader = MZTabPGReader()
    results = reader.import_file(path)
References:
Griss, J. et al. The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience. Molecular & Cellular Proteomics 13, 2765-2775 (2014).
Official MZTab Repository: https://github.com/HUPO-PSI/mzTab.git
Official documentation: https://hupo-psi.github.io/mzTab/
Methods:
__init__(*[, column_mapping, measurement_regex]) | Read protein group (PG) matrices into the standardized alphabase format.
- __init__(*, column_mapping: dict[str, str] | None = None, measurement_regex: str | Literal['assay', 'study_variable'] | None = 'assay')
Read protein group (PG) matrices into the standardized alphabase format.
- Parameters:
column_mapping – A dictionary mapping alphabase columns (keys) to the corresponding columns in the other search engine (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.
measurement_regex – Regular expression that identifies the correct measurement type. Only relevant if the PG matrix contains multiple measurement types. For example, alphapept returns the raw protein intensity per sample in column A and the LFQ-corrected value in A_LFQ. If None, all columns are used.
- column_mapping
Dictionary mapping alphabase columns (keys) to the corresponding columns in the other search engine (values); see Parameters.
- measurement_regex
Regular expression that matches the quantity of interest for all samples.
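To make the role of measurement_regex concrete, the following sketch filters column names with Python's `re` module, using the `A` / `A_LFQ` naming from the parameter description. The helper `select_measurements`, the column list, and the choice of `re.search` are assumptions made for illustration; the source does not state how alphabase applies the pattern internally.

```python
import re

# Hypothetical column names from a PG matrix with two measurement types.
columns = ["protein", "A", "B", "A_LFQ", "B_LFQ"]
feature_columns = {"protein"}  # non-sample columns, excluded from matching


def select_measurements(columns, measurement_regex=None):
    """Return sample columns whose names match the regex (all if None)."""
    candidates = [c for c in columns if c not in feature_columns]
    if measurement_regex is None:
        return candidates
    pattern = re.compile(measurement_regex)
    return [c for c in candidates if pattern.search(c)]


# Keep only the LFQ-corrected columns.
lfq_columns = select_measurements(columns, r"_LFQ$")
# Keep only columns that do NOT end in _LFQ (negative lookahead).
raw_columns = select_measurements(columns, r"^(?!.*_LFQ$).+$")
# No regex: every sample column is kept.
all_columns = select_measurements(columns)

print(lfq_columns, raw_columns, all_columns)
```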
Notes
- Standardizes protein group reports to a protein group dataframe (features x samples) in wide format. The dataframe contains at least:
sample (run) identifier: pg_reader.keys.PGCols.SAMPLE_NAME as column index
protein group identifier: pg_reader.keys.PGCols.protein as index
protein group intensity: pg_reader.keys.PGCols.INTENSITY as values
Additional feature-level metadata might be available in the index.
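The wide (features x samples) layout from the notes can be pictured with a plain-Python stand-in for the dataframe: protein groups act as the row index, samples as the columns, and intensities as the values. The records and the `to_wide` helper are hypothetical and only show the shape of the result, not how the reader builds it.

```python
# Hypothetical long-format records: (protein group, sample, intensity).
records = [
    ("P12345", "sample_1", 1000.0),
    ("P12345", "sample_2", 2000.0),
    ("Q67890", "sample_1", 500.0),
]


def to_wide(records):
    """Pivot long records into {protein: {sample: intensity}}; gaps are None."""
    samples = sorted({sample for _, sample, _ in records})
    wide = {}
    for protein, sample, intensity in records:
        # Create a row with one (initially empty) cell per sample on first sight.
        row = wide.setdefault(protein, dict.fromkeys(samples))
        row[sample] = intensity
    return samples, wide


samples, wide = to_wide(records)
print("protein", samples)
for protein, row in wide.items():
    print(protein, [row[s] for s in samples])
```

Each row holds one value per sample, so a protein group that was not quantified in a sample shows up as an explicit gap rather than a missing row.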