alphabase.pg_reader.base

Base class for all protein group (PG) readers, plus the provider used to register and retrieve them.

Classes:

PGReaderBase(*[, column_mapping, ...])

Base class for all protein group (PG) readers.

PGReaderProvider()

A factory class to register and get readers for different protein group report types.

Data:

pg_reader_provider

A factory PGReaderProvider object to register and get readers for different protein group report types.

class alphabase.pg_reader.pg_reader.PGReaderBase(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | None = None)

Bases: object

Base class for all protein group (PG) readers.

Supports reading of protein groups of common types:

  • Type 1 — Minimal: A basic features x samples matrix. Only intensity values are stored, with sample names as columns and protein groups as the index. Example: AlphaDIA.

  • Type 2 — Multiple Intensity Fields: A wide matrix where each sample may appear multiple times with different quantification types (e.g., SampleA_LFQ, SampleA_raw). Intensity columns are typically identifiable using regular expressions. Only intensity fields are included. Example: AlphaPept.

  • Type 3 — Feature Metadata: A features x samples matrix with one intensity value per sample, plus additional feature-level metadata columns (e.g., gene names, descriptions). Example: DIA-NN.

  • Type 4 — Combined: A composite structure including both multiple intensity fields (Type 2) and feature-level metadata (Type 3). Examples: Spectronaut, mzTab, MaxQuant.

Methods:

__init__(*[, column_mapping, measurement_regex])

Read protein group (PG) matrices into the standardized alphabase format.

add_column_mapping(column_mapping)

Add additional column mappings for the search engine.

get_preconfigured_regex()

Get all predefined regular expressions for this reader class as configured in alphabase.constants.pg_reader_yaml.

import_file(file_path)

Import a protein group (PG) matrix and process it to the alphabase convention.

__init__(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | None = None)

Read protein group (PG) matrices into the standardized alphabase format.

Parameters:
  • column_mapping – A dictionary mapping alphabase columns (keys) to the corresponding columns in the search engine output (values). If None, the mapping is loaded from the column_mapping key of the respective search engine in pg_reader.yaml.

  • measurement_regex – Regular expression that identifies the correct measurement type. Only relevant if the PG matrix contains multiple measurement types. For example, AlphaPept reports the raw protein intensity per sample in column A and the LFQ-corrected value in A_LFQ. If None, all columns are used.
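The column selection controlled by measurement_regex can be sketched with the standard library `re` module. This is a minimal illustration, not alphabase's implementation. The helper name `select_measurement_columns` is hypothetical, and the sketch assumes the regex is applied with `re.search` to each intensity column name, with every column kept when the regex is None.

```python
import re
from typing import Optional


def select_measurement_columns(columns: list[str], measurement_regex: Optional[str]) -> list[str]:
    """Hypothetical helper: keep the intensity columns whose names match the regex.

    Assumes re.search semantics; alphabase may match column names differently.
    """
    if measurement_regex is None:
        # No regex given: every intensity column is kept.
        return list(columns)
    pattern = re.compile(measurement_regex)
    return [col for col in columns if pattern.search(col)]


# AlphaPept-style layout: raw intensity in "A", LFQ-corrected value in "A_LFQ".
columns = ["A", "A_LFQ", "B", "B_LFQ"]
print(select_measurement_columns(columns, r"_LFQ$"))    # ['A_LFQ', 'B_LFQ']
print(select_measurement_columns(columns, r"^[^_]+$"))  # ['A', 'B']
print(select_measurement_columns(columns, None))        # ['A', 'A_LFQ', 'B', 'B_LFQ']
```

Anchoring the pattern (here with `$`) keeps it from matching sample names that merely contain the suffix somewhere in the middle.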

column_mapping

Dictionary mapping alphabase columns (keys) to the corresponding search engine columns (values); see Parameters.

measurement_regex

Regular expression that matches the quantity of interest for all samples.

Notes

Standardizes protein group reports to a protein group dataframe (features x samples) in wide format. Contains at least

Additional feature-level metadata might be available in the index.
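The standardized layout described in these notes is a wide features × samples table with feature-level metadata in the index. It can be sketched without pandas as a dictionary keyed by metadata tuples. Everything in this sketch is hypothetical and for illustration only. That includes the function `to_wide_matrix`, the alphabase column names `proteins` and `genes`, and the search-engine column names `Protein IDs` and `Gene names`, which are placeholders. The sketch does show one consequence of the documented `column_mapping` direction (alphabase names as keys, search-engine names as values): incoming columns can only be renamed once the mapping is inverted.

```python
def to_wide_matrix(
    rows: list[dict[str, str]],
    column_mapping: dict[str, str],
    sample_columns: list[str],
) -> tuple[tuple[str, ...], dict[tuple[str, ...], dict[str, float]]]:
    """Hypothetical sketch: rename metadata columns and key each row by them.

    rows: one dict per protein group, as read from a report.
    column_mapping: {alphabase_name: search_engine_name}, the PGReaderBase direction.
    sample_columns: intensity columns to keep (e.g. after regex filtering).
    """
    # Invert the mapping: incoming (search-engine) name -> alphabase name.
    rename = {engine: alpha for alpha, engine in column_mapping.items()}
    index_names = tuple(column_mapping)  # alphabase names, in mapping order
    matrix: dict[tuple[str, ...], dict[str, float]] = {}
    for row in rows:
        # Columns missing from the mapping (e.g. samples) keep their names.
        renamed = {rename.get(key, key): value for key, value in row.items()}
        key = tuple(renamed[name] for name in index_names)
        matrix[key] = {sample: float(renamed[sample]) for sample in sample_columns}
    return index_names, matrix


rows = [
    {"Protein IDs": "P1", "Gene names": "G1", "S1": "10.0", "S2": "12.5"},
    {"Protein IDs": "P2", "Gene names": "G2", "S1": "3.0", "S2": "4.0"},
]
mapping = {"proteins": "Protein IDs", "genes": "Gene names"}
names, matrix = to_wide_matrix(rows, mapping, ["S1", "S2"])
print(names)                 # ('proteins', 'genes')
print(matrix[("P1", "G1")])  # {'S1': 10.0, 'S2': 12.5}
```

With pandas, the same result would typically come from `DataFrame.rename` with the inverted mapping, followed by `DataFrame.set_index` on the metadata columns.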

add_column_mapping(column_mapping: Dict) → None

Add additional column mappings for the search engine.

classmethod get_preconfigured_regex() → dict[str, str]

Get all predefined regular expressions for this reader class as configured in alphabase.constants.pg_reader_yaml.

import_file(file_path: str) → DataFrame

Import a protein group (PG) matrix and process it to the alphabase convention.

Loads the protein group matrix, standardizes feature metadata columns, and filters for the desired measurement type.

Parameters:

file_path (str) – Absolute path to the file containing protein group data

Returns:

Protein group matrix with feature metadata as index

Return type:

pd.DataFrame

class alphabase.pg_reader.pg_reader.PGReaderProvider

Bases: object

A factory class to register and get readers for different protein group report types.

Methods:

__init__()

Initialize PGReaderProvider.

get_reader(reader_type, *[, column_mapping])

Get a reader by reader_type.

get_reader_by_yaml(yaml_dict)

Get a reader from a YAML configuration dictionary.

register_reader(reader_type, reader_class)

Register a reader by reader_type.

__init__()

Initialize PGReaderProvider.

get_reader(reader_type: str, *, column_mapping: dict | None = None, **kwargs) → PGReaderBase

Get a reader by reader_type.

get_reader_by_yaml(yaml_dict: dict) → PGReaderBase

Get a reader from a YAML configuration dictionary.

register_reader(reader_type: str, reader_class: Type[PGReaderBase]) → None

Register a reader by reader_type.
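The provider follows a registry (factory) pattern. Reader classes are registered under a string key and instantiated on request. The following is a minimal, self-contained sketch of that pattern, not alphabase's code. `ToyReaderBase`, `ToyAlphaPeptReader`, `ToyReaderProvider`, the `reader_dict` attribute and the `"toy_alphapept"` key are all hypothetical. The `get_reader_by_yaml` body additionally assumes, as an illustration only, that the configuration dict names the reader under a `reader_type` key and that the remaining entries are constructor keyword arguments.

```python
from typing import Any, Optional, Type


class ToyReaderBase:
    """Hypothetical stand-in for PGReaderBase; it only stores its options."""

    def __init__(self, *, column_mapping: Optional[dict[str, Any]] = None,
                 measurement_regex: Optional[str] = None) -> None:
        self.column_mapping = column_mapping or {}
        self.measurement_regex = measurement_regex


class ToyAlphaPeptReader(ToyReaderBase):
    """Hypothetical reader subclass for one report type."""


class ToyReaderProvider:
    """Registry mapping a reader_type string to a reader class."""

    def __init__(self) -> None:
        self.reader_dict: dict[str, Type[ToyReaderBase]] = {}

    def register_reader(self, reader_type: str, reader_class: Type[ToyReaderBase]) -> None:
        self.reader_dict[reader_type] = reader_class

    def get_reader(self, reader_type: str, *, column_mapping: Optional[dict] = None,
                   **kwargs: Any) -> ToyReaderBase:
        if reader_type not in self.reader_dict:
            raise KeyError(f"No reader registered for {reader_type!r}")
        # Each call builds a fresh reader instance with the requested options.
        return self.reader_dict[reader_type](column_mapping=column_mapping, **kwargs)

    def get_reader_by_yaml(self, yaml_dict: dict) -> ToyReaderBase:
        # Assumed layout: {"reader_type": ..., <other constructor kwargs>}.
        options = dict(yaml_dict)  # copy so the caller's dict is not mutated
        reader_type = options.pop("reader_type")
        return self.get_reader(reader_type, **options)


# Module-level shared instance, analogous to the documented pg_reader_provider object.
provider = ToyReaderProvider()
provider.register_reader("toy_alphapept", ToyAlphaPeptReader)

reader = provider.get_reader("toy_alphapept", measurement_regex=r"_LFQ$")
print(type(reader).__name__, reader.measurement_regex)  # ToyAlphaPeptReader _LFQ$
reader2 = provider.get_reader_by_yaml({"reader_type": "toy_alphapept", "measurement_regex": None})
print(type(reader2).__name__)  # ToyAlphaPeptReader
```

Keeping the registry on an instance lets tests build isolated providers. A single module-level instance gives library code one shared place to register readers.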

alphabase.pg_reader.pg_reader.pg_reader_provider = <alphabase.pg_reader.pg_reader.PGReaderProvider object>

A factory PGReaderProvider object to register and get readers for different protein group report types.