alphabase.pg_reader.mztab_pg_reader

MZTab protein group reader.

Classes:

MZTabPGReader(*[, column_mapping, ...])

Reader for MZTab search engine output.

class alphabase.pg_reader.mztab_pg_reader.MZTabPGReader(*, column_mapping: dict[str, str] | None = None, measurement_regex: str | Literal['assay', 'study_variable'] | None = 'assay')

Bases: PGReaderBase

Reader for MZTab search engine output.

MZTab is a standardized tab-delimited format for reporting proteomics and metabolomics results. The format organizes data into distinct sections: metadata (MTD), protein groups (PRH/PRT), peptides (PEH/PEP), PSMs (PSH/PSM), and small molecules (SMH/SML), with each section identified by specific three-letter prefixes. This reader extracts protein-level quantification data from the PRT lines, which contain protein abundances across samples or study variables.
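To illustrate the section layout described above, the following minimal sketch separates the protein section of an mzTab file by line prefix using only the standard library. It is not the reader's implementation: the sample text is made up, and the helper name `read_protein_section` is hypothetical.

```python
import csv
import io

# Tiny made-up mzTab excerpt (tab-delimited); all values are illustrative only.
MZTAB_TEXT = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tassay[1]-sample_ref\tsample[1]",
    "PRH\taccession\tdescription\tprotein_abundance_assay[1]\tprotein_abundance_assay[2]",
    "PRT\tP12345\tExample protein A\t1000.0\t1500.0",
    "PRT\tQ67890\tExample protein B\t200.0\tnull",
    "PSH\tsequence\taccession",
    "PSM\tPEPTIDEK\tP12345",
])


def read_protein_section(text):
    """Collect PRT rows as dicts keyed by the PRH header columns (hypothetical helper)."""
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines between sections
        prefix, values = fields[0], fields[1:]
        if prefix == "PRH":
            header = values  # column names for the protein section
        elif prefix == "PRT":
            if header is None:
                raise ValueError("PRT line found before PRH header")
            rows.append(dict(zip(header, values)))
        # MTD, PSH/PSM, and other prefixes are ignored in this sketch
    return rows


proteins = read_protein_section(MZTAB_TEXT)
for row in proteins:
    print(row["accession"], row["protein_abundance_assay[1]"])
```

Only lines starting with PRH or PRT contribute to the result; the abundance columns are kept as strings here, so missing values such as `null` would still need converting.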

Example:

By default, the reader returns the raw intensities from the razor method. Additional protein features are stored in the dataframe index; samples are stored as columns.

from alphabase.pg_reader import MZTabPGReader

# Placeholder path to an mzTab file; replace with your own
path = "path/to/results.mzTab"

# Get raw intensities
reader = MZTabPGReader()
results = reader.import_file(path)

References:

Griss, J. et al. The mzTab Data Exchange Format: Communicating Mass-spectrometry-based Proteomics and Metabolomics Experimental Results to a Wider Audience. Molecular & Cellular Proteomics 13, 2765-2775 (2014).
Official MZTab Repository: https://github.com/HUPO-PSI/mzTab.git
Official documentation: https://hupo-psi.github.io/mzTab/

Methods:

__init__(*[, column_mapping, measurement_regex])

Read protein group (PG) matrices into the standardized alphabase format.

__init__(*, column_mapping: dict[str, str] | None = None, measurement_regex: str | Literal['assay', 'study_variable'] | None = 'assay')

Read protein group (PG) matrices into the standardized alphabase format.

Parameters:

column_mapping – A dictionary mapping alphabase columns (keys) to the corresponding columns in the search engine output (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.
measurement_regex – Regular expression that identifies the correct measurement type. Only relevant if the PG matrix contains multiple measurement types. For example, alphapept returns the raw protein intensity per sample in column A and the LFQ-corrected value in A_LFQ. If None, all columns are used. As the signature shows, this reader also accepts the literals 'assay' and 'study_variable', with 'assay' as the default.
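As a rough illustration of the column_mapping direction (standardized names as keys, source columns as values), the sketch below inverts such a dictionary to rename a source header. The key and column names are hypothetical placeholders, not the values shipped in pg_reader.yaml, and the renaming logic is an assumption about how such a mapping could be applied.

```python
# Hypothetical mapping: standardized names (keys) -> source column names (values).
column_mapping = {
    "proteins": "accession",
    "description": "description",
}

# Hypothetical header of a source protein table.
source_header = ["accession", "description", "protein_abundance_assay[1]"]

# Invert the mapping so that source columns can be translated to standardized names.
source_to_standard = {source: standard for standard, source in column_mapping.items()}

# Columns without an entry in the mapping keep their original name.
renamed_header = [source_to_standard.get(column, column) for column in source_header]
print(renamed_header)
```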

column_mapping: Dictionary structure mapping alphabase columns (keys) to the corresponding columns in the other search engine (values), see parameters.

measurement_regex: Regular expression that matches the quantity of interest for all samples.
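The effect of a measurement regex can be sketched with the alphapept-style column names mentioned in the parameter description. The helper `select_columns` is hypothetical, and the use of `re.search` is an assumption; the reader's actual matching logic may differ.

```python
import re

# Column names following the alphapept example: raw intensity in "A", LFQ value in "A_LFQ".
columns = ["A", "A_LFQ", "B", "B_LFQ"]


def select_columns(columns, measurement_regex=None):
    """Return the columns matching the regex, or all columns if it is None (hypothetical helper)."""
    if measurement_regex is None:
        return list(columns)
    pattern = re.compile(measurement_regex)
    return [column for column in columns if pattern.search(column)]


lfq_columns = select_columns(columns, r"_LFQ$")             # LFQ-corrected values only
raw_columns = select_columns(columns, r"^(?!.*_LFQ$).+$")   # everything not ending in _LFQ
all_columns = select_columns(columns)                       # None keeps every column

print(lfq_columns, raw_columns, all_columns)
```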

Notes

Standardizes protein group reports to a protein group dataframe (features x samples) in wide format. The dataframe contains at least:

sample (run) identifier: pg_reader.keys.PGCols.SAMPLE_NAME as column index
protein group identifier: pg_reader.keys.PGCols.protein as index
protein group intensity: pg_reader.keys.PGCols.INTENSITY as values

Additional feature-level metadata might be available in the index.
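The wide layout described in these notes can be pictured with a small pandas sketch: protein identifiers and extra feature metadata in the index, samples as columns, intensities as values. The column labels and data are made up for illustration and are not the actual PGCols values.

```python
import pandas as pd

# Made-up long-format table: one row per (sample, protein) pair.
long_table = pd.DataFrame(
    {
        "sample": ["run_1", "run_2", "run_1", "run_2"],
        "protein": ["P12345", "P12345", "Q67890", "Q67890"],
        "description": ["Example protein A"] * 2 + ["Example protein B"] * 2,
        "intensity": [1000.0, 1500.0, 200.0, 250.0],
    }
)

# Pivot to wide format: features (rows) x samples (columns).
# The protein identifier and its metadata both go into the row index.
wide = long_table.pivot(
    index=["protein", "description"],
    columns="sample",
    values="intensity",
)

print(wide)
```

Keeping feature-level metadata in the index leaves the values purely numeric, which is convenient for downstream matrix operations.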