alphabase.pg_reader.base
The base class for all PG readers and the provider for all PG readers.
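Classes:
PGReaderBase | Base class for all protein group (PG) readers.
PGReaderProvider | A factory class to register and get readers for different protein group report types.
Data:
pg_reader_provider | A factory PGReaderProvider object to register and get readers for different protein group report types.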
- class alphabase.pg_reader.pg_reader.PGReaderBase(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | None = None)
Bases: object
Base class for all protein group (PG) readers.
Supports reading of protein groups of common types:
Type 1 — Minimal: A basic features x samples matrix. Only intensity values are stored, with sample names as columns and protein groups as the index. Example: AlphaDIA.
Type 2 — Multiple Intensity Fields: A wide matrix where each sample may appear multiple times with different quantification types (e.g., SampleA_LFQ, SampleB_raw). Intensity columns are typically identifiable using regular expressions. Only intensity fields are included. Example: AlphaPept.
Type 3 — Feature Metadata: A features x samples matrix with one intensity value per sample, plus additional feature-level metadata columns (e.g., gene names, descriptions). Example: DIA-NN.
Type 4 — Combined: A composite structure including both multiple intensity fields (Type 2) and feature-level metadata (Type 3). Examples: Spectronaut, MZTab, MaxQuant.
Methods:
__init__(*[, column_mapping, measurement_regex]) | Read protein group (PG) matrices into the standardized alphabase format.
add_column_mapping(column_mapping) | Add additional column mappings for the search engine.
get_preconfigured_regex() | Get all predefined regular expressions for this reader class as configured in alphabase.constants.pg_reader_yaml.
import_file(file_path) | Import a protein group (PG) matrix and process it to the alphabase convention.
- __init__(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | None = None)
Read protein group (PG) matrices into the standardized alphabase format.
- Parameters:
column_mapping – A dictionary mapping alphabase columns (keys) to the corresponding columns in the other search engine (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.
measurement_regex – Regular expression that identifies the desired measurement type. Only relevant if the PG matrix contains multiple measurement types. For example, alphapept returns the raw protein intensity per sample in column A and the LFQ-corrected value in A_LFQ. If None, all columns are used.
- column_mapping¶
Dictionary structure mapping alphabase columns (keys) to the corresponding columns in the other search engine (values), see parameters.
- measurement_regex¶
Regular expression that matches the quantity of interest for all samples.
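The effect of measurement_regex on a Type 2 matrix can be illustrated with plain pandas. This is a minimal sketch, not the reader's actual implementation: the data, the pattern `_LFQ$`, and the step that strips the suffix from the column names are all assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical Type 2 matrix: every sample has a raw column and an LFQ column.
pg_matrix = pd.DataFrame(
    {
        "A": [1.0, 2.0],
        "A_LFQ": [1.1, 2.1],
        "B": [3.0, 4.0],
        "B_LFQ": [3.1, 4.1],
    },
    index=pd.Index(["P12345", "Q67890"], name="protein"),
)

# Assumed pattern for this sketch: keep only columns that end in "_LFQ".
measurement_regex = r"_LFQ$"

# Select the columns whose names match the pattern.
lfq_columns = [col for col in pg_matrix.columns if re.search(measurement_regex, col)]
lfq_matrix = pg_matrix[lfq_columns]

# Strip the measurement suffix so the column names are plain sample names.
# (Whether the real reader renames columns is not stated in this page.)
lfq_matrix = lfq_matrix.rename(columns=lambda col: re.sub(measurement_regex, "", col))

print(lfq_matrix)
```

With a pattern that matches every column, or with no filtering at all, the matrix would keep both measurement types per sample, which is why the parameter only matters for reports with multiple intensity fields.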
Notes
- Standardizes protein group reports to a protein group dataframe (features x samples) in wide format. The dataframe contains at least:
sample (run) identifier: pg_reader.keys.PGCols.SAMPLE_NAME as column index
protein group identifier: pg_reader.keys.PGCols.protein as index
protein group intensity: pg_reader.keys.PGCols.INTENSITY as values
Additional feature-level metadata might be available in the index.
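The layout described in these notes can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the library's code: the report, its column names, the standardized names "protein" and "gene", and the assumption that each mapping value is a single string are all hypothetical.

```python
import pandas as pd

# Hypothetical search-engine report (Type 3): metadata columns plus one
# intensity column per sample. All column names here are made up.
report = pd.DataFrame(
    {
        "Protein.Group": ["P12345", "Q67890"],
        "Genes": ["GENE1", "GENE2"],
        "sample_1": [10.0, 30.0],
        "sample_2": [20.0, 40.0],
    }
)

# Keys are standardized names, values are the report's column names,
# mirroring the direction described for column_mapping.
column_mapping = {"protein": "Protein.Group", "gene": "Genes"}

# DataFrame.rename expects {old: new}, so invert the mapping first.
rename_map = {source: target for target, source in column_mapping.items()}
standardized = report.rename(columns=rename_map)

# Move the feature metadata into the index; the remaining columns are samples.
pg_matrix = standardized.set_index(list(column_mapping.keys()))
pg_matrix.columns.name = "sample_name"

print(pg_matrix)
```

The result is a features x samples matrix whose values are intensities, with the protein group identifier and any extra metadata carried in a MultiIndex.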
- add_column_mapping(column_mapping: Dict) -> None
Add additional column mappings for the search engine.
- classmethod get_preconfigured_regex() -> dict[str, str]
Get all predefined regular expressions for this reader class as configured in alphabase.constants.pg_reader_yaml.
- import_file(file_path: str) -> DataFrame
Import a protein group (PG) matrix and process it to the alphabase convention.
Loads the protein group matrix, standardizes feature metadata columns, and filters for the desired measurement type.
- Parameters:
file_path (str) – Absolute path to the file containing protein group data.
- Returns:
Protein group matrix with feature metadata as index.
- Return type:
pd.DataFrame
- class alphabase.pg_reader.pg_reader.PGReaderProvider
Bases: object
A factory class to register and get readers for different protein group report types.
Methods:
__init__() | Initialize PGReaderProvider.
get_reader(reader_type, *[, column_mapping]) | Get a reader by reader_type.
get_reader_by_yaml(yaml_dict) | Get a reader by a yaml dict.
register_reader(reader_type, reader_class) | Register a reader by reader_type.
- get_reader(reader_type: str, *, column_mapping: dict | None = None, **kwargs) -> PGReaderBase
Get a reader by reader_type.
- get_reader_by_yaml(yaml_dict: dict) -> PGReaderBase
Get a reader by a yaml dict.
- register_reader(reader_type: str, reader_class: Type[PGReaderBase]) -> None
Register a reader by reader_type.
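The register/get pattern documented for this class can be sketched in a few lines. The code below is a generic registry-style factory written for illustration only; ReaderBase, ExampleReader, ReaderProvider, the "example" reader type, and the lower-casing of keys are hypothetical and do not reproduce alphabase's implementation.

```python
from typing import Dict, Optional, Type


class ReaderBase:
    """Hypothetical stand-in for a reader base class."""

    def __init__(self, *, column_mapping: Optional[dict] = None) -> None:
        self.column_mapping = column_mapping or {}


class ExampleReader(ReaderBase):
    """Hypothetical reader for an imaginary report type."""


class ReaderProvider:
    """Minimal registry that maps a reader_type string to a reader class."""

    def __init__(self) -> None:
        self._readers: Dict[str, Type[ReaderBase]] = {}

    def register_reader(self, reader_type: str, reader_class: Type[ReaderBase]) -> None:
        # Normalizing to lower case is a choice made for this sketch only.
        self._readers[reader_type.lower()] = reader_class

    def get_reader(
        self, reader_type: str, *, column_mapping: Optional[dict] = None, **kwargs
    ) -> ReaderBase:
        try:
            reader_class = self._readers[reader_type.lower()]
        except KeyError:
            raise KeyError(f"Unknown reader type: {reader_type!r}") from None
        # Instantiate the registered class with the caller's options.
        return reader_class(column_mapping=column_mapping, **kwargs)


provider = ReaderProvider()
provider.register_reader("example", ExampleReader)
reader = provider.get_reader("example", column_mapping={"protein": "Protein.Group"})
```

Keeping the lookup behind a single provider object means callers only need a reader_type string, and new report formats can be supported by registering one more class.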
- alphabase.pg_reader.pg_reader.pg_reader_provider = <alphabase.pg_reader.pg_reader.PGReaderProvider object>
A factory PGReaderProvider object to register and get readers for different protein group report types.