alphabase.pg_reader.alphapept_pg_reader

AlphaPept protein group reader.

Classes:

AlphaPeptPGReader(*[, column_mapping, ...])

Reader for protein group matrices from the alphapept search engine.

class alphabase.pg_reader.alphapept_pg_reader.AlphaPeptPGReader(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | Literal['raw', 'lfq'] | None = 'raw')

Bases: PGReaderBase

Reader for protein group matrices from the alphapept search engine.

By default, the reader reads raw intensities from the protein group matrix. Passing a suitable regular expression instead extracts the LFQ-corrected intensities.

Notes:

AlphaPept protein group matrices contain both raw intensities and LFQ-corrected intensities. The LFQ-corrected intensities are marked by an _LFQ suffix.

To read AlphaPept .hdf output, install the package with the optional HDF dependencies: pip install "alphabase[hdf]".

Example:

Get example data

import os
import tempfile
from alphabase.tools.data_downloader import DataShareDownloader
from alphabase.pg_reader import AlphaPeptPGReader


# Download to temporary directory
URL = "https://datashare.biochem.mpg.de/s/6G6KHJqwcRPQiOO"
download_dir = tempfile.mkdtemp()

download_path = DataShareDownloader(url=URL, output_dir=download_dir).download()

By default, the reader returns the raw intensities. Additional protein features are stored in the dataframe index; samples are stored as columns.

# Get raw intensities
reader = AlphaPeptPGReader()
results = reader.import_file(download_path)
results.index.names
> FrozenList(['proteins', 'uniprot_ids', 'ensembl_ids', 'source_db', 'is_decoy'])
results.columns
> Index(['A', 'B'], dtype='object')

To read the LFQ values, pass the preconfigured key lfq to the reader. The key stands for a regular expression that selects the LFQ columns from the protein group table.

# Get LFQ-corrected intensities
reader = AlphaPeptPGReader(measurement_regex="lfq")
results = reader.import_file(download_path)
results.index.names
> FrozenList(['proteins', 'uniprot_ids', 'ensembl_ids', 'source_db', 'is_decoy'])
results.columns
> Index(['A_LFQ', 'B_LFQ'], dtype='object')

To check out all preconfigured regular expressions, use the get_preconfigured_regex method:

AlphaPeptPGReader.get_preconfigured_regex()
> {'raw': '^.*(?<!_LFQ)$', 'lfq': '_LFQ$'}
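
As a minimal sketch of what these two patterns select, the snippet below applies them to the column names from the examples above using Python's re module. It assumes the patterns are matched against column names with search semantics; it illustrates the regular expressions only and is not the reader's actual implementation.

```python
import re

# Patterns as returned by get_preconfigured_regex() above
PATTERNS = {"raw": r"^.*(?<!_LFQ)$", "lfq": r"_LFQ$"}

# Column names taken from the examples above
columns = ["A", "B", "A_LFQ", "B_LFQ"]

# Assumption: a column is selected if the pattern is found in its name
raw_columns = [c for c in columns if re.search(PATTERNS["raw"], c)]
lfq_columns = [c for c in columns if re.search(PATTERNS["lfq"], c)]

print(raw_columns)
print(lfq_columns)
```

The raw pattern relies on a negative lookbehind to reject names ending in _LFQ, so the two patterns never select the same column.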

Methods:

__init__(*[, column_mapping, measurement_regex])

Initialize AlphaPept protein group matrix reader.

__init__(*, column_mapping: dict[str, Any] | None = None, measurement_regex: str | Literal['raw', 'lfq'] | None = 'raw')

Initialize AlphaPept protein group matrix reader.

Parameters:
  • column_mapping – Dictionary mapping alphabase column names (keys) to AlphaPept column names (values). If None, uses default mapping from configuration file.

  • measurement_regex

    Pattern to select quantity columns:

    • "raw" (default): Raw intensities (excludes _LFQ columns)

    • "lfq": LFQ-corrected intensities (_LFQ suffix)

    • str: Custom regular expression pattern

    • None: All quantity columns

    See class documentation for usage examples and get_preconfigured_regex() for available patterns.
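
The four accepted forms of measurement_regex can be sketched with a small standalone helper. The function resolve_columns and the sample column names are hypothetical and only illustrate the documented semantics; how the reader resolves the argument internally is not shown here.

```python
import re

# Preset patterns as documented by get_preconfigured_regex()
PRESETS = {"raw": r"^.*(?<!_LFQ)$", "lfq": r"_LFQ$"}


def resolve_columns(columns, measurement_regex="raw"):
    """Hypothetical helper mirroring the documented parameter semantics."""
    if measurement_regex is None:
        # None: keep all quantity columns
        return list(columns)
    # Preset key ("raw" or "lfq"), otherwise treat the value as a custom regex
    pattern = PRESETS.get(measurement_regex, measurement_regex)
    return [c for c in columns if re.search(pattern, c)]


# Assumed sample quantity columns, following the examples in the class docs
quantity_columns = ["A", "B", "A_LFQ", "B_LFQ"]

default_cols = resolve_columns(quantity_columns)         # "raw" preset
lfq_cols = resolve_columns(quantity_columns, "lfq")      # "lfq" preset
custom_cols = resolve_columns(quantity_columns, r"^A")   # custom: sample A only
all_cols = resolve_columns(quantity_columns, None)       # no filtering
```

With the real reader, the same values would be passed as AlphaPeptPGReader(measurement_regex=...).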