{ "cells": [ { "cell_type": "markdown", "id": "d48b9454", "metadata": {}, "source": [ "# Protein Group readers" ] }, { "cell_type": "code", "execution_count": 1, "id": "3812811e", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "%reload_ext autoreload \n", "%autoreload 2 " ] }, { "cell_type": "code", "execution_count": 2, "id": "c0510071", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/lucas-diedrich/Documents/Projects/alphaX/alphabase/alphabase/alphabase/tools/data_downloader.py:4: DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13\n", " import cgi\n", "/Users/lucas-diedrich/Documents/Projects/alphaX/alphabase/alphabase/alphabase/tools/data_downloader.py:18: ImportWarning: Dependency 'progressbar' not installed. Download progress will not be displayed.\n", " warnings.warn(\n" ] } ], "source": [ "# Helper packages\n", "import io\n", "from copy import copy\n", "from typing import Literal, Optional\n", "\n", "import anndata as ad\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# alphabase\n", "from alphabase.pg_reader import pg_reader_provider\n", "from alphabase.tools.data_downloader import DataShareDownloader" ] }, { "cell_type": "markdown", "id": "e17fe6eb", "metadata": {}, "source": [ "## Background \n", "\n", "The `alphabase.pg_reader` module provides a unifying interface **to read protein group (PG) tables** from different search engines and file formats. It is designed to be easy to use, and to provide a consistent output format in the form of `pandas.DataFrame`s, regardless of the input file format.\n", "\n", "### Introduction to protein group matrices\n", "\n", "Protein group matrices are the primary output for protein-level quantification in proteomics workflows. After search engines identify peptide spectrum matches (PSMs, see [PSM-reader tutorial](../nbs/psm_readers.ipynb)), they aggregate peptide-level evidence to infer protein-level abundances. 
These protein group tables represent a structured matrix that maps protein groups (features) to samples (observations), with estimated intensity values as entries.\n", "\n", "\n", "A minimal protein group table could look something like this:\n", "\n", "| proteins | sample_1 | sample_2 | sample_3 |\n", "|----------|----------|----------|----------|\n", "| P12345 | 1000.5 | 892.3 | 1150.7 |\n", "| Q67890 | 2500.1 | 2780.9 | 2340.2 |\n", "\n", "\n", "\n", "> 💡 Since some identified peptide sequences can match multiple proteins (such as isoforms or homologues), proteomics search engines typically handle this ambiguity by grouping these proteins into *protein groups* as features.\n", "\n", "\n", "In this example, protein P12345 has quantified intensities of 1000.5, 892.3, and 1150.7 in samples 1, 2, and 3 respectively.\n", "\n", "### Search engine outputs\n", "\n", "In reality, protein group tables are significantly more complex than this, as they contain additional feature-level information about the proteins (e.g., gene names, descriptions, alternative quantification methods), and the quantification (e.g., different intensity types like raw, LFQ quantification, iBAQ). This additional information can be valuable for downstream analyses, but also makes protein group tables a lot more difficult to work with, as the exact names and formats may differ between search engines, versions, and file formats.\n", "\n", "#### Unifying properties \n", "\n", "`alphabase` aligns the column names to a unified vocabulary, facilitating cross-engine comparisons. We can categorize protein group tables into several common types:\n", "\n", "**Type 1 — Minimal**: A basic features × samples matrix. Only intensity values are stored, with sample names as columns and protein groups as the index. 
*Example*: AlphaDIA.\n", "\n", "**Type 2 — Multiple Intensity Fields**: A wide matrix where each sample may appear multiple times with different quantification types (e.g., `SampleA_LFQ`, `SampleA_raw`). *Example*: AlphaPept.\n", "\n", "**Type 3 — Feature Metadata**: A features × samples matrix with one intensity value per sample, plus additional feature-level metadata columns (e.g., gene names, descriptions). *Example*: DIA-NN.\n", "\n", "**Type 4 — Combined**: A composite structure including both multiple intensity fields (Type 2) and feature-level metadata (Type 3). *Examples*: Spectronaut, mzTab, MaxQuant.\n" ] }, { "cell_type": "markdown", "id": "4547ae8c", "metadata": {}, "source": [ "## Code | Read and parse protein group tables\n", "\n", "The alphabase `pg_reader` module enables users to parse protein group reports from the most common search engines into a dataframe with a single line of code via its `alphabase.pg_reader.pg_reader_provider` factory.\n", "\n", "\n", "All readers return a standardized pandas DataFrame with:\n", "- **Features as index**: Protein identifiers and metadata in the `pandas.DataFrame.index`\n", "- **Samples as columns**: Sample/run identifiers as the column index\n", "- **Intensity values**: Protein quantification data as `pandas.DataFrame.values`\n", "\n", "\n", "\n", "The readers **support different quantification methods** by matching regular expression patterns against the column names of the output tables, and they **map selected metadata columns to standardized names**.\n", "\n", "\n", "The unified alphabase format enables seamless comparison and analysis across different search engines, facilitating:\n", "- Method comparison studies\n", "- Data integration workflows\n", "- Standardized downstream analysis pipelines" ] }, { "cell_type": "markdown", "id": "a433a22a", "metadata": {}, "source": [ "### Available readers \n", "\n", "\n", "`alphabase.pg_reader.pg_reader_provider` has registered reader classes for the most common proteomics search 
engines. A list of implemented readers can be accessed via its `reader_dict` property:" ] }, { "cell_type": "code", "execution_count": 3, "id": "d7eeeefd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Registered readers in alphabase:\n", "\t- alphadia\n", "\t- alphapept\n", "\t- diann\n", "\t- fragpipe\n", "\t- maxquant\n", "\t- mztab\n", "\t- spectronaut\n" ] } ], "source": [ "all_registered_readers = pg_reader_provider.reader_dict.keys()\n", "\n", "# Display all registered readers\n", "sep = \"\\n\\t- \"\n", "print(\"Registered readers in alphabase:\", sep.join(sorted(all_registered_readers)), sep=sep)" ] }, { "cell_type": "markdown", "id": "d8352b39", "metadata": {}, "source": [ "### Interact with the reader provider" ] }, { "cell_type": "code", "execution_count": null, "id": "48b68899", "metadata": {}, "outputs": [], "source": [ "def get_pg_matrix_example(output_dir: Optional[str] = None, search_engine: Literal[\"alphadia\", \"alphapept\", \"spectronaut\"] = \"alphadia\") -> str:\n", "    \"\"\"Get example data for the tutorial\n", "\n", "    The function downloads example data and stores it in `output_dir` or,\n", "    if no directory is given, in the system's temporary directory.\n", "\n", "    Parameters\n", "    ----------\n", "    output_dir\n", "        Output directory. If `None`, the system's temporary directory is used\n", "    search_engine\n", "        Search engine for which example data is downloaded\n", "\n", "    Returns\n", "    -------\n", "    File location\n", "    \"\"\"\n", "    EXAMPLE_URLS = {\n", "        \"alphadia\": \"https://datashare.biochem.mpg.de/s/4AtCZassaUzRR8K\",\n", "        \"alphapept\": \"https://datashare.biochem.mpg.de/s/6G6KHJqwcRPQiOO\",\n", "        \"spectronaut\": \"https://datashare.biochem.mpg.de/s/2u7U03wvmQDVT4y\",\n", "    }\n", "\n", "    if search_engine not in EXAMPLE_URLS:\n", "        raise KeyError(f\"{search_engine} not found, select one of {', '.join(EXAMPLE_URLS.keys())}\")\n", "\n", "    if output_dir is None:\n", "        # `tempfile.tempdir` may still be `None`; `gettempdir()` always resolves the directory\n", "        from tempfile import gettempdir\n", "\n", "        output_dir = gettempdir()\n", "\n", "    downloader = DataShareDownloader(url=EXAMPLE_URLS[search_engine], output_dir=output_dir)\n", "\n", "    return downloader.download()" ] }, { "cell_type": "markdown", "id": "dded4cee", "metadata": {}, "source": [ "### Example 1 - AlphaDIA\n", "\n", "We demonstrate how to interact with protein group tables via alphabase, using a minimal example output of the AlphaDIA search engine. \n", "\n", "First, let's get some minimal example data for the AlphaDIA output. The example data represents a DIA run of 6 HeLa samples on the Orbitrap Astral. \n", "\n", "You can see that the output data contains the feature names in the column `pg` and the computed protein group intensities per sample in the remaining columns.\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "7f4bc10f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/var/folders/py/838_q5nd6594y27wbrpkhl3h0000gn/T/alphadia1.10.4__pg_matrix.tsv already exists (0.8597145080566406 MB)\n" ] }, { "data": { "text/html": [ "
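<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>pg</th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_03</th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_02</th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_01</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_03</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_02</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_01</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>A0A024RBG1</td><td>5.597816e+05</td><td>6.285112e+05</td><td>0.000000e+00</td><td>3.153867e+05</td><td>2.753702e+05</td><td>4.505648e+05</td></tr>\n",
"    <tr><th>1</th><td>A0A024RBG1;Q9NZJ9</td><td>1.331061e+06</td><td>1.400360e+06</td><td>1.551987e+06</td><td>1.606095e+06</td><td>1.464152e+06</td><td>1.397026e+06</td></tr>\n",
"    <tr><th>2</th><td>A0A075B759;A0A075B767;P62937</td><td>2.024742e+08</td><td>8.552202e+06</td><td>1.837425e+08</td><td>1.674874e+08</td><td>1.768245e+08</td><td>1.595220e+08</td></tr>\n",
"    <tr><th>3</th><td>A0A096LP01</td><td>6.355092e+05</td><td>4.589410e+05</td><td>4.184495e+05</td><td>4.032932e+05</td><td>2.317467e+05</td><td>2.731363e+05</td></tr>\n",
"    <tr><th>4</th><td>A0A096LP49</td><td>1.777069e+05</td><td>1.387537e+05</td><td>2.513601e+05</td><td>1.296699e+05</td><td>1.276095e+05</td><td>1.623200e+05</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>9359</th><td>Q9Y6X3</td><td>3.898963e+05</td><td>4.353048e+05</td><td>4.150456e+05</td><td>5.069992e+05</td><td>4.195746e+05</td><td>3.675962e+05</td></tr>\n",
"    <tr><th>9360</th><td>Q9Y6X6</td><td>1.869312e+05</td><td>0.000000e+00</td><td>0.000000e+00</td><td>2.304623e+05</td><td>2.421623e+05</td><td>0.000000e+00</td></tr>\n",
"    <tr><th>9361</th><td>Q9Y6X9</td><td>3.362758e+06</td><td>3.395221e+06</td><td>3.541975e+06</td><td>2.704210e+06</td><td>3.141519e+06</td><td>2.995787e+06</td></tr>\n",
"    <tr><th>9362</th><td>Q9Y6Y0</td><td>5.924220e+06</td><td>6.183842e+06</td><td>6.190598e+06</td><td>6.025724e+06</td><td>5.920595e+06</td><td>6.754984e+06</td></tr>\n",
"    <tr><th>9363</th><td>Q9Y6Y8</td><td>1.416146e+07</td><td>1.424916e+07</td><td>1.342342e+07</td><td>1.345135e+07</td><td>1.406395e+07</td><td>1.349913e+07</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>9364 rows × 7 columns</p>\n",
"</div>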
" ], "text/plain": [ " pg \\\n", "0 A0A024RBG1 \n", "1 A0A024RBG1;Q9NZJ9 \n", "2 A0A075B759;A0A075B767;P62937 \n", "3 A0A096LP01 \n", "4 A0A096LP49 \n", "... ... \n", "9359 Q9Y6X3 \n", "9360 Q9Y6X6 \n", "9361 Q9Y6X9 \n", "9362 Q9Y6Y0 \n", "9363 Q9Y6Y8 \n", "\n", " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_03 \\\n", "0 5.597816e+05 \n", "1 1.331061e+06 \n", "2 2.024742e+08 \n", "3 6.355092e+05 \n", "4 1.777069e+05 \n", "... ... \n", "9359 3.898963e+05 \n", "9360 1.869312e+05 \n", "9361 3.362758e+06 \n", "9362 5.924220e+06 \n", "9363 1.416146e+07 \n", "\n", " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_02 \\\n", "0 6.285112e+05 \n", "1 1.400360e+06 \n", "2 8.552202e+06 \n", "3 4.589410e+05 \n", "4 1.387537e+05 \n", "... ... \n", "9359 4.353048e+05 \n", "9360 0.000000e+00 \n", "9361 3.395221e+06 \n", "9362 6.183842e+06 \n", "9363 1.424916e+07 \n", "\n", " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_01 \\\n", "0 0.000000e+00 \n", "1 1.551987e+06 \n", "2 1.837425e+08 \n", "3 4.184495e+05 \n", "4 2.513601e+05 \n", "... ... \n", "9359 4.150456e+05 \n", "9360 0.000000e+00 \n", "9361 3.541975e+06 \n", "9362 6.190598e+06 \n", "9363 1.342342e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_03 \\\n", "0 3.153867e+05 \n", "1 1.606095e+06 \n", "2 1.674874e+08 \n", "3 4.032932e+05 \n", "4 1.296699e+05 \n", "... ... \n", "9359 5.069992e+05 \n", "9360 2.304623e+05 \n", "9361 2.704210e+06 \n", "9362 6.025724e+06 \n", "9363 1.345135e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_02 \\\n", "0 2.753702e+05 \n", "1 1.464152e+06 \n", "2 1.768245e+08 \n", "3 2.317467e+05 \n", "4 1.276095e+05 \n", "... ... 
\n", "9359 4.195746e+05 \n", "9360 2.421623e+05 \n", "9361 3.141519e+06 \n", "9362 5.920595e+06 \n", "9363 1.406395e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_01 \n", "0 4.505648e+05 \n", "1 1.397026e+06 \n", "2 1.595220e+08 \n", "3 2.731363e+05 \n", "4 1.623200e+05 \n", "... ... \n", "9359 3.675962e+05 \n", "9360 0.000000e+00 \n", "9361 2.995787e+06 \n", "9362 6.754984e+06 \n", "9363 1.349913e+07 \n", "\n", "[9364 rows x 7 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alphadia_example_path = get_pg_matrix_example(search_engine=\"alphadia\")\n", "\n", "# Parse with pandas for visualization purposes\n", "pd.read_csv(alphadia_example_path, sep=\"\\t\")" ] }, { "cell_type": "markdown", "id": "a0ee852c", "metadata": {}, "source": [ "Then use the `pg_reader_provider.get_reader` method to get the AlphaDIA protein group reader. Use its `import_file` method to read the file; the result is returned directly as a `pandas.DataFrame`. \n", "\n", "Note how the dataframe values only contain the actual measurements and how the `pg` column was mapped to the standardized name `uniprot_ids`." ] }, { "cell_type": "code", "execution_count": 6, "id": "18fb6522", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
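<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_03</th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_02</th><th>20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_01</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_03</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_02</th><th>20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_01</th></tr>\n",
"    <tr><th>uniprot_ids</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>A0A024RBG1</th><td>5.597816e+05</td><td>6.285112e+05</td><td>0.000000e+00</td><td>3.153867e+05</td><td>2.753702e+05</td><td>4.505648e+05</td></tr>\n",
"    <tr><th>A0A024RBG1;Q9NZJ9</th><td>1.331061e+06</td><td>1.400360e+06</td><td>1.551987e+06</td><td>1.606095e+06</td><td>1.464152e+06</td><td>1.397026e+06</td></tr>\n",
"    <tr><th>A0A075B759;A0A075B767;P62937</th><td>2.024742e+08</td><td>8.552202e+06</td><td>1.837425e+08</td><td>1.674874e+08</td><td>1.768245e+08</td><td>1.595220e+08</td></tr>\n",
"    <tr><th>A0A096LP01</th><td>6.355092e+05</td><td>4.589410e+05</td><td>4.184495e+05</td><td>4.032932e+05</td><td>2.317467e+05</td><td>2.731363e+05</td></tr>\n",
"    <tr><th>A0A096LP49</th><td>1.777069e+05</td><td>1.387537e+05</td><td>2.513601e+05</td><td>1.296699e+05</td><td>1.276095e+05</td><td>1.623200e+05</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>Q9Y6X3</th><td>3.898963e+05</td><td>4.353048e+05</td><td>4.150456e+05</td><td>5.069992e+05</td><td>4.195746e+05</td><td>3.675962e+05</td></tr>\n",
"    <tr><th>Q9Y6X6</th><td>1.869312e+05</td><td>0.000000e+00</td><td>0.000000e+00</td><td>2.304623e+05</td><td>2.421623e+05</td><td>0.000000e+00</td></tr>\n",
"    <tr><th>Q9Y6X9</th><td>3.362758e+06</td><td>3.395221e+06</td><td>3.541975e+06</td><td>2.704210e+06</td><td>3.141519e+06</td><td>2.995787e+06</td></tr>\n",
"    <tr><th>Q9Y6Y0</th><td>5.924220e+06</td><td>6.183842e+06</td><td>6.190598e+06</td><td>6.025724e+06</td><td>5.920595e+06</td><td>6.754984e+06</td></tr>\n",
"    <tr><th>Q9Y6Y8</th><td>1.416146e+07</td><td>1.424916e+07</td><td>1.342342e+07</td><td>1.345135e+07</td><td>1.406395e+07</td><td>1.349913e+07</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>9364 rows × 6 columns</p>\n",
"</div>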
" ], "text/plain": [ " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_03 \\\n", "uniprot_ids \n", "A0A024RBG1 5.597816e+05 \n", "A0A024RBG1;Q9NZJ9 1.331061e+06 \n", "A0A075B759;A0A075B767;P62937 2.024742e+08 \n", "A0A096LP01 6.355092e+05 \n", "A0A096LP49 1.777069e+05 \n", "... ... \n", "Q9Y6X3 3.898963e+05 \n", "Q9Y6X6 1.869312e+05 \n", "Q9Y6X9 3.362758e+06 \n", "Q9Y6Y0 5.924220e+06 \n", "Q9Y6Y8 1.416146e+07 \n", "\n", " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_02 \\\n", "uniprot_ids \n", "A0A024RBG1 6.285112e+05 \n", "A0A024RBG1;Q9NZJ9 1.400360e+06 \n", "A0A075B759;A0A075B767;P62937 8.552202e+06 \n", "A0A096LP01 4.589410e+05 \n", "A0A096LP49 1.387537e+05 \n", "... ... \n", "Q9Y6X3 4.353048e+05 \n", "Q9Y6X6 0.000000e+00 \n", "Q9Y6X9 3.395221e+06 \n", "Q9Y6Y0 6.183842e+06 \n", "Q9Y6Y8 1.424916e+07 \n", "\n", " 20231024_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_before_01 \\\n", "uniprot_ids \n", "A0A024RBG1 0.000000e+00 \n", "A0A024RBG1;Q9NZJ9 1.551987e+06 \n", "A0A075B759;A0A075B767;P62937 1.837425e+08 \n", "A0A096LP01 4.184495e+05 \n", "A0A096LP49 2.513601e+05 \n", "... ... \n", "Q9Y6X3 4.150456e+05 \n", "Q9Y6X6 0.000000e+00 \n", "Q9Y6X9 3.541975e+06 \n", "Q9Y6Y0 6.190598e+06 \n", "Q9Y6Y8 1.342342e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_03 \\\n", "uniprot_ids \n", "A0A024RBG1 3.153867e+05 \n", "A0A024RBG1;Q9NZJ9 1.606095e+06 \n", "A0A075B759;A0A075B767;P62937 1.674874e+08 \n", "A0A096LP01 4.032932e+05 \n", "A0A096LP49 1.296699e+05 \n", "... ... \n", "Q9Y6X3 5.069992e+05 \n", "Q9Y6X6 2.304623e+05 \n", "Q9Y6X9 2.704210e+06 \n", "Q9Y6Y0 6.025724e+06 \n", "Q9Y6Y8 1.345135e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_02 \\\n", "uniprot_ids \n", "A0A024RBG1 2.753702e+05 \n", "A0A024RBG1;Q9NZJ9 1.464152e+06 \n", "A0A075B759;A0A075B767;P62937 1.768245e+08 \n", "A0A096LP01 2.317467e+05 \n", "A0A096LP49 1.276095e+05 \n", "... ... 
\n", "Q9Y6X3 4.195746e+05 \n", "Q9Y6X6 2.421623e+05 \n", "Q9Y6X9 3.141519e+06 \n", "Q9Y6Y0 5.920595e+06 \n", "Q9Y6Y8 1.406395e+07 \n", "\n", " 20231023_OA3_TiHe_ADIAMA_HeLa_200ng_Evo01_21min_F-40_iO_after_01 \n", "uniprot_ids \n", "A0A024RBG1 4.505648e+05 \n", "A0A024RBG1;Q9NZJ9 1.397026e+06 \n", "A0A075B759;A0A075B767;P62937 1.595220e+08 \n", "A0A096LP01 2.731363e+05 \n", "A0A096LP49 1.623200e+05 \n", "... ... \n", "Q9Y6X3 3.675962e+05 \n", "Q9Y6X6 0.000000e+00 \n", "Q9Y6X9 2.995787e+06 \n", "Q9Y6Y0 6.754984e+06 \n", "Q9Y6Y8 1.349913e+07 \n", "\n", "[9364 rows x 6 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alphadia_reader = pg_reader_provider.get_reader('alphadia')\n", "\n", "# Import the file or a bytestream\n", "alphadia_report = alphadia_reader.import_file(alphadia_example_path)\n", "\n", "# Display the result\n", "alphadia_report" ] }, { "cell_type": "markdown", "id": "fc3d5d90", "metadata": {}, "source": [ "### Example 2 - AlphaPept with different quantification methods\n", "\n", "AlphaPept is a DDA search engine that reports multiple quantification types (raw intensities, LFQ) in its protein group report. We can use the reader to extract these different types of measurements by specifying the `measurement_regex` parameter.\n", "\n", "AlphaPept reports can be in either `.hdf` or `.tsv` format. The readers in `pg_reader` support all common data formats out of the box: text-based formats like `.tsv` and `.csv`, and binary formats like `.parquet` and `.hdf` (the latter via the extra `alphabase[hdf]` dependency). " ] }, { "cell_type": "code", "execution_count": 7, "id": "48904224", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/var/folders/py/838_q5nd6594y27wbrpkhl3h0000gn/T/alphapept0.5.3__pg_matrix_csv.csv already exists (0.33005523681640625 MB)\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Unnamed: 0</th><th>A_LFQ</th><th>B_LFQ</th><th>A</th><th>B</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>sp|P36578|RL4_HUMAN</td><td>4.669329e+08</td><td>4.844083e+08</td><td>4.452735e+08</td><td>5.060678e+08</td></tr>\n",
"    <tr><th>1</th><td>sp|Q9P258|RCC2_HUMAN</td><td>4.074842e+08</td><td>4.138132e+08</td><td>4.177856e+08</td><td>4.035118e+08</td></tr>\n",
"    <tr><th>2</th><td>sp|O60518|RNBP6_HUMAN</td><td>4.960386e+06</td><td>2.022553e+06</td><td>1.295621e+06</td><td>5.687318e+06</td></tr>\n",
"    <tr><th>3</th><td>sp|P55036|PSMD4_HUMAN</td><td>1.157420e+08</td><td>1.123571e+08</td><td>1.130880e+08</td><td>1.150112e+08</td></tr>\n",
"    <tr><th>4</th><td>sp|A1X283|SPD2B_HUMAN</td><td>1.247112e+07</td><td>1.180582e+07</td><td>1.380177e+07</td><td>1.047516e+07</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>3776</th><td>sp|Q14966|ZN638_HUMAN</td><td>NaN</td><td>1.139844e+06</td><td>NaN</td><td>1.139844e+06</td></tr>\n",
"    <tr><th>3777</th><td>sp|P84095|RHOG_HUMAN</td><td>NaN</td><td>9.466796e+05</td><td>NaN</td><td>9.466796e+05</td></tr>\n",
"    <tr><th>3778</th><td>sp|Q99766|ATP5S_HUMAN</td><td>NaN</td><td>3.577785e+05</td><td>NaN</td><td>3.577785e+05</td></tr>\n",
"    <tr><th>3779</th><td>sp|O14925|TIM23_HUMAN,sp|Q5SRD1|TI23B_HUMAN</td><td>NaN</td><td>9.237994e+05</td><td>NaN</td><td>9.237994e+05</td></tr>\n",
"    <tr><th>3780</th><td>sp|P51946|CCNH_HUMAN</td><td>NaN</td><td>9.278844e+05</td><td>NaN</td><td>9.278844e+05</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3781 rows × 5 columns</p>\n",
"</div>" ], "text/plain": [ " Unnamed: 0 A_LFQ B_LFQ \\\n", "0 sp|P36578|RL4_HUMAN 4.669329e+08 4.844083e+08 \n", "1 sp|Q9P258|RCC2_HUMAN 4.074842e+08 4.138132e+08 \n", "2 sp|O60518|RNBP6_HUMAN 4.960386e+06 2.022553e+06 \n", "3 sp|P55036|PSMD4_HUMAN 1.157420e+08 1.123571e+08 \n", "4 sp|A1X283|SPD2B_HUMAN 1.247112e+07 1.180582e+07 \n", "... ... ... ... \n", "3776 sp|Q14966|ZN638_HUMAN NaN 1.139844e+06 \n", "3777 sp|P84095|RHOG_HUMAN NaN 9.466796e+05 \n", "3778 sp|Q99766|ATP5S_HUMAN NaN 3.577785e+05 \n", "3779 sp|O14925|TIM23_HUMAN,sp|Q5SRD1|TI23B_HUMAN NaN 9.237994e+05 \n", "3780 sp|P51946|CCNH_HUMAN NaN 9.278844e+05 \n", "\n", " A B \n", "0 4.452735e+08 5.060678e+08 \n", "1 4.177856e+08 4.035118e+08 \n", "2 1.295621e+06 5.687318e+06 \n", "3 1.130880e+08 1.150112e+08 \n", "4 1.380177e+07 1.047516e+07 \n", "... ... ... \n", "3776 NaN 1.139844e+06 \n", "3777 NaN 9.466796e+05 \n", "3778 NaN 3.577785e+05 \n", "3779 NaN 9.237994e+05 \n", "3780 NaN 9.278844e+05 \n", "\n", "[3781 rows x 5 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Download example AlphaPept data with multiple quantification types\n", "alphapept_example_path = get_pg_matrix_example(search_engine=\"alphapept\")\n", "pd.read_csv(alphapept_example_path)" ] }, { "cell_type": "markdown", "id": "d55ab6db", "metadata": {}, "source": [ "#### Default - raw intensities\n", "Let's first use the default option that imports raw intensities. You can see that the reader automatically extracts only the raw intensity columns and that it parses the UniProt headers in the index into separate index levels (`proteins`, `uniprot_ids`, `ensembl_ids`, `source_db`, `is_decoy`)." ] }, { "cell_type": "code", "execution_count": 8, "id": "a9fd0af3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th></th><th></th><th></th><th>A</th><th>B</th></tr>\n",
"    <tr><th>proteins</th><th>uniprot_ids</th><th>ensembl_ids</th><th>source_db</th><th>is_decoy</th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>RL4_HUMAN</th><th>P36578</th><th>na</th><th>sp</th><th>False</th><td>445273477.0318756</td><td>506067774.6891948</td></tr>\n",
"    <tr><th>RCC2_HUMAN</th><th>Q9P258</th><th>na</th><th>sp</th><th>False</th><td>417785611.6324583</td><td>403511752.8857417</td></tr>\n",
"    <tr><th>RNBP6_HUMAN</th><th>O60518</th><th>na</th><th>sp</th><th>False</th><td>1295621.2466679448</td><td>5687318.493374016</td></tr>\n",
"    <tr><th>PSMD4_HUMAN</th><th>P55036</th><th>na</th><th>sp</th><th>False</th><td>113087994.44403341</td><td>115011156.7335174</td></tr>\n",
"    <tr><th>SPD2B_HUMAN</th><th>A1X283</th><th>na</th><th>sp</th><th>False</th><td>13801771.733223092</td><td>10475164.42857083</td></tr>\n",
"    <tr><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>ZN638_HUMAN</th><th>Q14966</th><th>na</th><th>sp</th><th>False</th><td></td><td>1139843.6453892316</td></tr>\n",
"    <tr><th>RHOG_HUMAN</th><th>P84095</th><th>na</th><th>sp</th><th>False</th><td></td><td>946679.6466570131</td></tr>\n",
"    <tr><th>ATP5S_HUMAN</th><th>Q99766</th><th>na</th><th>sp</th><th>False</th><td></td><td>357778.52002529387</td></tr>\n",
"    <tr><th>TIM23_HUMAN;TI23B_HUMAN</th><th>O14925;Q5SRD1</th><th>na;na</th><th>sp;sp</th><th>False</th><td></td><td>923799.3856913601</td></tr>\n",
"    <tr><th>CCNH_HUMAN</th><th>P51946</th><th>na</th><th>sp</th><th>False</th><td></td><td>927884.4020782198</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3781 rows × 2 columns</p>\n",
"</div>" ], "text/plain": [ " A \\\n", "proteins uniprot_ids ensembl_ids source_db is_decoy \n", "RL4_HUMAN P36578 na sp False 445273477.0318756 \n", "RCC2_HUMAN Q9P258 na sp False 417785611.6324583 \n", "RNBP6_HUMAN O60518 na sp False 1295621.2466679448 \n", "PSMD4_HUMAN P55036 na sp False 113087994.44403341 \n", "SPD2B_HUMAN A1X283 na sp False 13801771.733223092 \n", "... ... \n", "ZN638_HUMAN Q14966 na sp False \n", "RHOG_HUMAN P84095 na sp False \n", "ATP5S_HUMAN Q99766 na sp False \n", "TIM23_HUMAN;TI23B_HUMAN O14925;Q5SRD1 na;na sp;sp False \n", "CCNH_HUMAN P51946 na sp False \n", "\n", " B \n", "proteins uniprot_ids ensembl_ids source_db is_decoy \n", "RL4_HUMAN P36578 na sp False 506067774.6891948 \n", "RCC2_HUMAN Q9P258 na sp False 403511752.8857417 \n", "RNBP6_HUMAN O60518 na sp False 5687318.493374016 \n", "PSMD4_HUMAN P55036 na sp False 115011156.7335174 \n", "SPD2B_HUMAN A1X283 na sp False 10475164.42857083 \n", "... ... \n", "ZN638_HUMAN Q14966 na sp False 1139843.6453892316 \n", "RHOG_HUMAN P84095 na sp False 946679.6466570131 \n", "ATP5S_HUMAN Q99766 na sp False 357778.52002529387 \n", "TIM23_HUMAN;TI23B_HUMAN O14925;Q5SRD1 na;na sp;sp False 923799.3856913601 \n", "CCNH_HUMAN P51946 na sp False 927884.4020782198 \n", "\n", "[3781 rows x 2 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Default: raw intensities\n", "alphapept_reader_default = pg_reader_provider.get_reader('alphapept')\n", "alphapept_reader_default.import_file(alphapept_example_path)" ] }, { "cell_type": "markdown", "id": "3f0497c4", "metadata": {}, "source": [ "#### LFQ runs\n", "To extract the LFQ intensities instead, select the pre-defined `lfq` regular expression via the `measurement_regex` parameter:" ] }, { "cell_type": "code", "execution_count": 9, "id": "35d8d28d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th></th><th></th><th></th><th>A_LFQ</th><th>B_LFQ</th></tr>\n",
"    <tr><th>proteins</th><th>uniprot_ids</th><th>ensembl_ids</th><th>source_db</th><th>is_decoy</th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>RL4_HUMAN</th><th>P36578</th><th>na</th><th>sp</th><th>False</th><td>466932936.27537036</td><td>484408315.44570005</td></tr>\n",
"    <tr><th>RCC2_HUMAN</th><th>Q9P258</th><th>na</th><th>sp</th><th>False</th><td>407484183.9302226</td><td>413813180.5879775</td></tr>\n",
"    <tr><th>RNBP6_HUMAN</th><th>O60518</th><th>na</th><th>sp</th><th>False</th><td>4960386.374516514</td><td>2022553.3655254466</td></tr>\n",
"    <tr><th>PSMD4_HUMAN</th><th>P55036</th><th>na</th><th>sp</th><th>False</th><td>115742020.94987468</td><td>112357130.22767611</td></tr>\n",
"    <tr><th>SPD2B_HUMAN</th><th>A1X283</th><th>na</th><th>sp</th><th>False</th><td>12471120.728621317</td><td>11805815.433172602</td></tr>\n",
"    <tr><th>...</th><th>...</th><th>...</th><th>...</th><th>...</th><td>...</td><td>...</td></tr>\n",
"    <tr><th>ZN638_HUMAN</th><th>Q14966</th><th>na</th><th>sp</th><th>False</th><td></td><td>1139843.6453892316</td></tr>\n",
"    <tr><th>RHOG_HUMAN</th><th>P84095</th><th>na</th><th>sp</th><th>False</th><td></td><td>946679.6466570131</td></tr>\n",
"    <tr><th>ATP5S_HUMAN</th><th>Q99766</th><th>na</th><th>sp</th><th>False</th><td></td><td>357778.52002529387</td></tr>\n",
"    <tr><th>TIM23_HUMAN;TI23B_HUMAN</th><th>O14925;Q5SRD1</th><th>na;na</th><th>sp;sp</th><th>False</th><td></td><td>923799.3856913601</td></tr>\n",
"    <tr><th>CCNH_HUMAN</th><th>P51946</th><th>na</th><th>sp</th><th>False</th><td></td><td>927884.4020782198</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3781 rows × 2 columns</p>\n",
"</div>" ], "text/plain": [ " A_LFQ \\\n", "proteins uniprot_ids ensembl_ids source_db is_decoy \n", "RL4_HUMAN P36578 na sp False 466932936.27537036 \n", "RCC2_HUMAN Q9P258 na sp False 407484183.9302226 \n", "RNBP6_HUMAN O60518 na sp False 4960386.374516514 \n", "PSMD4_HUMAN P55036 na sp False 115742020.94987468 \n", "SPD2B_HUMAN A1X283 na sp False 12471120.728621317 \n", "... ... \n", "ZN638_HUMAN Q14966 na sp False \n", "RHOG_HUMAN P84095 na sp False \n", "ATP5S_HUMAN Q99766 na sp False \n", "TIM23_HUMAN;TI23B_HUMAN O14925;Q5SRD1 na;na sp;sp False \n", "CCNH_HUMAN P51946 na sp False \n", "\n", " B_LFQ \n", "proteins uniprot_ids ensembl_ids source_db is_decoy \n", "RL4_HUMAN P36578 na sp False 484408315.44570005 \n", "RCC2_HUMAN Q9P258 na sp False 413813180.5879775 \n", "RNBP6_HUMAN O60518 na sp False 2022553.3655254466 \n", "PSMD4_HUMAN P55036 na sp False 112357130.22767611 \n", "SPD2B_HUMAN A1X283 na sp False 11805815.433172602 \n", "... ... \n", "ZN638_HUMAN Q14966 na sp False 1139843.6453892316 \n", "RHOG_HUMAN P84095 na sp False 946679.6466570131 \n", "ATP5S_HUMAN Q99766 na sp False 357778.52002529387 \n", "TIM23_HUMAN;TI23B_HUMAN O14925;Q5SRD1 na;na sp;sp False 923799.3856913601 \n", "CCNH_HUMAN P51946 na sp False 927884.4020782198 \n", "\n", "[3781 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# LFQ intensities\n", "alphapept_reader_lfq = pg_reader_provider.get_reader('alphapept', measurement_regex=\"lfq\")\n", "alphapept_reader_lfq.import_file(alphapept_example_path)" ] }, { "cell_type": "markdown", "id": "ddef95cd", "metadata": {}, "source": [ "#### Explore all pre-configured patterns\n", "\n", "You can also pass a custom pattern as any valid regular expression. To inspect all pre-configured regular expressions of a reader, use the `get_preconfigured_regex` method:" ] }, { "cell_type": "code", "execution_count": 10, "id": "3681a6ac", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'raw': 
'^.*(?\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PG.GenesPG.OrganismsPG.ProteinNamesPTM.CollapseKeyPTM.FlankingRegionPTM.ModificationTitlePTM.MultiplicityPTM.ProteinIdPTM.SiteAAPTM.SiteLocation...[27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity[28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity[29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity[30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity[31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity[32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity[33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity[34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity[35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity[36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity
0TRBV19;TRBHomo sapiensTVB19_HUMAN;TRBR1_HUMANA0A075B6N1_S86_M3IAEGYSVSREKKESFPhospho (STY)3A0A075B6N1S86...69968.8359375103632.601562590488.9296875113429.85937596970.273437561069.17187599673.2734375109199.875112307.4765625112374.84375
1TRBV19;TRBHomo sapiensTVB19_HUMAN;TRBR1_HUMANA0A075B6N1_S84_M3GDIAEGYSVSREKKEPhospho (STY)3A0A075B6N1S84...69968.8359375103632.601562590488.9296875113429.85937596970.273437561069.17187599673.2734375109199.875112307.4765625112374.84375
2TRBV19;TRBHomo sapiensTVB19_HUMAN;TRBR1_HUMANA0A075B6N1_Y83_M3KGDIAEGYSVSREKKPhospho (STY)3A0A075B6N1Y83...69968.8359375103632.601562590488.9296875113429.85937596970.273437561069.17187599673.2734375109199.875112307.4765625112374.84375
3TRBV19;TRBHomo sapiensTVB19_HUMAN;TRBR1_HUMANP0DSE2_S86_M3IAEGYSVSREKKESFPhospho (STY)3P0DSE2S86...69968.8359375103632.601562590488.9296875113429.85937596970.273437561069.17187599673.2734375109199.875112307.4765625112374.84375
4TRBV19;TRBHomo sapiensTVB19_HUMAN;TRBR1_HUMANP0DSE2_S84_M3GDIAEGYSVSREKKEPhospho (STY)3P0DSE2S84...69968.8359375103632.601562590488.9296875113429.85937596970.273437561069.17187599673.2734375109199.875112307.4765625112374.84375
..................................................................
54858MORC2Homo sapiensMORC2_HUMANQ9Y6X9_S739_M2ATPSRKRSVAVSDEEPhospho (STY)2Q9Y6X9S739...23552.46679687522144.58007812520846.851562524248.4179687522490.054687522095.99023437525553.84960937522250.54687514592.86914062519265.998046875
54859MORC2Homo sapiensMORC2_HUMANQ9Y6X9-2_S681_M2RKRSVAVSDEEEVEEPhospho (STY)2Q9Y6X9-2S681...23552.46679687522144.58007812520846.851562524248.4179687522490.054687522095.99023437525553.84960937522250.54687514592.86914062519265.998046875
54860MORC2Homo sapiensMORC2_HUMANQ9Y6X9-2_S677_M2ATPSRKRSVAVSDEEPhospho (STY)2Q9Y6X9-2S677...23552.46679687522144.58007812520846.851562524248.4179687522490.054687522095.99023437525553.84960937522250.54687514592.86914062519265.998046875
54861IVNS1ABPHomo sapiensNS1BP_HUMANQ9Y6Y0_M341_M1SKSLSFEMQQDELIEOxidation (M)1Q9Y6Y0M341...Filtered17287.40625Filtered15751.86132812514749.72460937512410.7929687514130.1396484375Filtered13198.47460937513553.0908203125
54862IVNS1ABPHomo sapiensNS1BP_HUMANQ9Y6Y0_S338_M1PKLSKSLSFEMQQDEPhospho (STY)1Q9Y6Y0S338...Filtered17287.40625Filtered15751.86132812514749.72460937512410.7929687514130.13964843757562.6206054687513198.47460937513553.0908203125
\n", "

54863 rows × 46 columns

\n", "" ], "text/plain": [ " PG.Genes PG.Organisms PG.ProteinNames PTM.CollapseKey \\\n", "0 TRBV19;TRB Homo sapiens TVB19_HUMAN;TRBR1_HUMAN A0A075B6N1_S86_M3 \n", "1 TRBV19;TRB Homo sapiens TVB19_HUMAN;TRBR1_HUMAN A0A075B6N1_S84_M3 \n", "2 TRBV19;TRB Homo sapiens TVB19_HUMAN;TRBR1_HUMAN A0A075B6N1_Y83_M3 \n", "3 TRBV19;TRB Homo sapiens TVB19_HUMAN;TRBR1_HUMAN P0DSE2_S86_M3 \n", "4 TRBV19;TRB Homo sapiens TVB19_HUMAN;TRBR1_HUMAN P0DSE2_S84_M3 \n", "... ... ... ... ... \n", "54858 MORC2 Homo sapiens MORC2_HUMAN Q9Y6X9_S739_M2 \n", "54859 MORC2 Homo sapiens MORC2_HUMAN Q9Y6X9-2_S681_M2 \n", "54860 MORC2 Homo sapiens MORC2_HUMAN Q9Y6X9-2_S677_M2 \n", "54861 IVNS1ABP Homo sapiens NS1BP_HUMAN Q9Y6Y0_M341_M1 \n", "54862 IVNS1ABP Homo sapiens NS1BP_HUMAN Q9Y6Y0_S338_M1 \n", "\n", " PTM.FlankingRegion PTM.ModificationTitle PTM.Multiplicity \\\n", "0 IAEGYSVSREKKESF Phospho (STY) 3 \n", "1 GDIAEGYSVSREKKE Phospho (STY) 3 \n", "2 KGDIAEGYSVSREKK Phospho (STY) 3 \n", "3 IAEGYSVSREKKESF Phospho (STY) 3 \n", "4 GDIAEGYSVSREKKE Phospho (STY) 3 \n", "... ... ... ... \n", "54858 ATPSRKRSVAVSDEE Phospho (STY) 2 \n", "54859 RKRSVAVSDEEEVEE Phospho (STY) 2 \n", "54860 ATPSRKRSVAVSDEE Phospho (STY) 2 \n", "54861 SKSLSFEMQQDELIE Oxidation (M) 1 \n", "54862 PKLSKSLSFEMQQDE Phospho (STY) 1 \n", "\n", " PTM.ProteinId PTM.SiteAA PTM.SiteLocation ... \\\n", "0 A0A075B6N1 S 86 ... \n", "1 A0A075B6N1 S 84 ... \n", "2 A0A075B6N1 Y 83 ... \n", "3 P0DSE2 S 86 ... \n", "4 P0DSE2 S 84 ... \n", "... ... ... ... ... \n", "54858 Q9Y6X9 S 739 ... \n", "54859 Q9Y6X9-2 S 681 ... \n", "54860 Q9Y6X9-2 S 677 ... \n", "54861 Q9Y6Y0 M 341 ... \n", "54862 Q9Y6Y0 S 338 ... \n", "\n", " [27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity \\\n", "0 69968.8359375 \n", "1 69968.8359375 \n", "2 69968.8359375 \n", "3 69968.8359375 \n", "4 69968.8359375 \n", "... ... 
\n", "54858 23552.466796875 \n", "54859 23552.466796875 \n", "54860 23552.466796875 \n", "54861 Filtered \n", "54862 Filtered \n", "\n", " [28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity \\\n", "0 103632.6015625 \n", "1 103632.6015625 \n", "2 103632.6015625 \n", "3 103632.6015625 \n", "4 103632.6015625 \n", "... ... \n", "54858 22144.580078125 \n", "54859 22144.580078125 \n", "54860 22144.580078125 \n", "54861 17287.40625 \n", "54862 17287.40625 \n", "\n", " [29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity \\\n", "0 90488.9296875 \n", "1 90488.9296875 \n", "2 90488.9296875 \n", "3 90488.9296875 \n", "4 90488.9296875 \n", "... ... \n", "54858 20846.8515625 \n", "54859 20846.8515625 \n", "54860 20846.8515625 \n", "54861 Filtered \n", "54862 Filtered \n", "\n", " [30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity \\\n", "0 113429.859375 \n", "1 113429.859375 \n", "2 113429.859375 \n", "3 113429.859375 \n", "4 113429.859375 \n", "... ... \n", "54858 24248.41796875 \n", "54859 24248.41796875 \n", "54860 24248.41796875 \n", "54861 15751.861328125 \n", "54862 15751.861328125 \n", "\n", " [31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity \\\n", "0 96970.2734375 \n", "1 96970.2734375 \n", "2 96970.2734375 \n", "3 96970.2734375 \n", "4 96970.2734375 \n", "... ... \n", "54858 22490.0546875 \n", "54859 22490.0546875 \n", "54860 22490.0546875 \n", "54861 14749.724609375 \n", "54862 14749.724609375 \n", "\n", " [32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity \\\n", "0 61069.171875 \n", "1 61069.171875 \n", "2 61069.171875 \n", "3 61069.171875 \n", "4 61069.171875 \n", "... ... \n", "54858 22095.990234375 \n", "54859 22095.990234375 \n", "54860 22095.990234375 \n", "54861 12410.79296875 \n", "54862 12410.79296875 \n", "\n", " [33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity \\\n", "0 99673.2734375 \n", "1 99673.2734375 \n", "2 99673.2734375 \n", "3 99673.2734375 \n", "4 99673.2734375 \n", "... ... 
\n", "54858 25553.849609375 \n", "54859 25553.849609375 \n", "54860 25553.849609375 \n", "54861 14130.1396484375 \n", "54862 14130.1396484375 \n", "\n", " [34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity \\\n", "0 109199.875 \n", "1 109199.875 \n", "2 109199.875 \n", "3 109199.875 \n", "4 109199.875 \n", "... ... \n", "54858 22250.546875 \n", "54859 22250.546875 \n", "54860 22250.546875 \n", "54861 Filtered \n", "54862 7562.62060546875 \n", "\n", " [35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity \\\n", "0 112307.4765625 \n", "1 112307.4765625 \n", "2 112307.4765625 \n", "3 112307.4765625 \n", "4 112307.4765625 \n", "... ... \n", "54858 14592.869140625 \n", "54859 14592.869140625 \n", "54860 14592.869140625 \n", "54861 13198.474609375 \n", "54862 13198.474609375 \n", "\n", " [36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity \n", "0 112374.84375 \n", "1 112374.84375 \n", "2 112374.84375 \n", "3 112374.84375 \n", "4 112374.84375 \n", "... ... \n", "54858 19265.998046875 \n", "54859 19265.998046875 \n", "54860 19265.998046875 \n", "54861 13553.0908203125 \n", "54862 13553.0908203125 \n", "\n", "[54863 rows x 46 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spectronaut_example_path = get_pg_matrix_example(search_engine=\"spectronaut\")\n", "\n", "# Parse with pandas for visualization purposes\n", "pd.read_csv(spectronaut_example_path, sep=\"\\t\")" ] }, { "cell_type": "markdown", "id": "e3c63473", "metadata": {}, "source": [ "With its default settings, the reader returns only the per-sample quantity columns, indexed by protein and gene names:" ] }, { "cell_type": "code", "execution_count": 12, "id": "53449c57", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
[1] 20180815_QE3_nLC3_AH_DIA_Honly_ind_01.raw.PTM.Quantity[2] 20180815_QE3_nLC3_AH_DIA_Honly_ind_02.raw.PTM.Quantity[3] 20180815_QE3_nLC3_AH_DIA_Honly_ind_03.raw.PTM.Quantity[4] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01.raw.PTM.Quantity[5] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02.raw.PTM.Quantity[6] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03.raw.PTM.Quantity[7] 20180816_QE3_nLC3_AH_DIA_H100_Y100_01.raw.PTM.Quantity[8] 20180816_QE3_nLC3_AH_DIA_H100_Y100_02.raw.PTM.Quantity[9] 20180816_QE3_nLC3_AH_DIA_H100_Y100_03.raw.PTM.Quantity[10] 20180816_QE3_nLC3_AH_DIA_H100_Y100_04.raw.PTM.Quantity...[27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity[28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity[29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity[30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity[31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity[32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity[33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity[34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity[35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity[36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity
proteinsgenes
TVB19_HUMAN;TRBR1_HUMANTRBV19;TRBNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
TRBV19;TRBNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
TRBV19;TRBNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
TRBV19;TRBNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
TRBV19;TRBNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
.....................................................................
MORC2_HUMANMORC2NaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
MORC2NaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
MORC2NaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
NS1BP_HUMANIVNS1ABPNaNNaN38411.285156NaNNaNNaN10104.60156212773.76464810412.31152311411.670898...NaN17287.406250NaN15751.86132814749.72460912410.79296914130.139648NaN13198.47460913553.090820
IVNS1ABPNaNNaN38411.285156NaNNaNNaN10104.60156218788.16796910412.31152317367.800781...NaN17287.406250NaN15751.86132814749.72460912410.79296914130.1396487562.62060513198.47460913553.090820
\n", "

54863 rows × 36 columns

\n", "
" ], "text/plain": [ " [1] 20180815_QE3_nLC3_AH_DIA_Honly_ind_01.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 NaN \n", " MORC2 NaN \n", " MORC2 NaN \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [2] 20180815_QE3_nLC3_AH_DIA_Honly_ind_02.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 NaN \n", " MORC2 NaN \n", " MORC2 NaN \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [3] 20180815_QE3_nLC3_AH_DIA_Honly_ind_03.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 6817.745605 \n", " MORC2 6817.745605 \n", " MORC2 6817.745605 \n", "NS1BP_HUMAN IVNS1ABP 38411.285156 \n", " IVNS1ABP 38411.285156 \n", "\n", " [4] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 NaN \n", " MORC2 NaN \n", " MORC2 NaN \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [5] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... 
\n", "MORC2_HUMAN MORC2 NaN \n", " MORC2 NaN \n", " MORC2 NaN \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [6] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 NaN \n", " MORC2 NaN \n", " MORC2 NaN \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [7] 20180816_QE3_nLC3_AH_DIA_H100_Y100_01.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 89374.656250 \n", " TRBV19;TRB 89374.656250 \n", " TRBV19;TRB 89374.656250 \n", " TRBV19;TRB 89374.656250 \n", " TRBV19;TRB 89374.656250 \n", "... ... \n", "MORC2_HUMAN MORC2 18010.679688 \n", " MORC2 18010.679688 \n", " MORC2 18010.679688 \n", "NS1BP_HUMAN IVNS1ABP 10104.601562 \n", " IVNS1ABP 10104.601562 \n", "\n", " [8] 20180816_QE3_nLC3_AH_DIA_H100_Y100_02.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", " TRBV19;TRB NaN \n", "... ... \n", "MORC2_HUMAN MORC2 12501.521484 \n", " MORC2 12501.521484 \n", " MORC2 12501.521484 \n", "NS1BP_HUMAN IVNS1ABP 12773.764648 \n", " IVNS1ABP 18788.167969 \n", "\n", " [9] 20180816_QE3_nLC3_AH_DIA_H100_Y100_03.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 90181.578125 \n", " TRBV19;TRB 90181.578125 \n", " TRBV19;TRB 90181.578125 \n", " TRBV19;TRB 90181.578125 \n", " TRBV19;TRB 90181.578125 \n", "... ... 
\n", "MORC2_HUMAN MORC2 17377.408203 \n", " MORC2 17377.408203 \n", " MORC2 17377.408203 \n", "NS1BP_HUMAN IVNS1ABP 10412.311523 \n", " IVNS1ABP 10412.311523 \n", "\n", " [10] 20180816_QE3_nLC3_AH_DIA_H100_Y100_04.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 96197.070312 \n", " TRBV19;TRB 96197.070312 \n", " TRBV19;TRB 96197.070312 \n", " TRBV19;TRB 96197.070312 \n", " TRBV19;TRB 96197.070312 \n", "... ... \n", "MORC2_HUMAN MORC2 13730.358398 \n", " MORC2 13730.358398 \n", " MORC2 13730.358398 \n", "NS1BP_HUMAN IVNS1ABP 11411.670898 \n", " IVNS1ABP 17367.800781 \n", "\n", " ... \\\n", "proteins genes ... \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB ... \n", " TRBV19;TRB ... \n", " TRBV19;TRB ... \n", " TRBV19;TRB ... \n", " TRBV19;TRB ... \n", "... ... \n", "MORC2_HUMAN MORC2 ... \n", " MORC2 ... \n", " MORC2 ... \n", "NS1BP_HUMAN IVNS1ABP ... \n", " IVNS1ABP ... \n", "\n", " [27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 69968.835938 \n", " TRBV19;TRB 69968.835938 \n", " TRBV19;TRB 69968.835938 \n", " TRBV19;TRB 69968.835938 \n", " TRBV19;TRB 69968.835938 \n", "... ... \n", "MORC2_HUMAN MORC2 23552.466797 \n", " MORC2 23552.466797 \n", " MORC2 23552.466797 \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 103632.601562 \n", " TRBV19;TRB 103632.601562 \n", " TRBV19;TRB 103632.601562 \n", " TRBV19;TRB 103632.601562 \n", " TRBV19;TRB 103632.601562 \n", "... ... 
\n", "MORC2_HUMAN MORC2 22144.580078 \n", " MORC2 22144.580078 \n", " MORC2 22144.580078 \n", "NS1BP_HUMAN IVNS1ABP 17287.406250 \n", " IVNS1ABP 17287.406250 \n", "\n", " [29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 90488.929688 \n", " TRBV19;TRB 90488.929688 \n", " TRBV19;TRB 90488.929688 \n", " TRBV19;TRB 90488.929688 \n", " TRBV19;TRB 90488.929688 \n", "... ... \n", "MORC2_HUMAN MORC2 20846.851562 \n", " MORC2 20846.851562 \n", " MORC2 20846.851562 \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP NaN \n", "\n", " [30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 113429.859375 \n", " TRBV19;TRB 113429.859375 \n", " TRBV19;TRB 113429.859375 \n", " TRBV19;TRB 113429.859375 \n", " TRBV19;TRB 113429.859375 \n", "... ... \n", "MORC2_HUMAN MORC2 24248.417969 \n", " MORC2 24248.417969 \n", " MORC2 24248.417969 \n", "NS1BP_HUMAN IVNS1ABP 15751.861328 \n", " IVNS1ABP 15751.861328 \n", "\n", " [31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 96970.273438 \n", " TRBV19;TRB 96970.273438 \n", " TRBV19;TRB 96970.273438 \n", " TRBV19;TRB 96970.273438 \n", " TRBV19;TRB 96970.273438 \n", "... ... \n", "MORC2_HUMAN MORC2 22490.054688 \n", " MORC2 22490.054688 \n", " MORC2 22490.054688 \n", "NS1BP_HUMAN IVNS1ABP 14749.724609 \n", " IVNS1ABP 14749.724609 \n", "\n", " [32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 61069.171875 \n", " TRBV19;TRB 61069.171875 \n", " TRBV19;TRB 61069.171875 \n", " TRBV19;TRB 61069.171875 \n", " TRBV19;TRB 61069.171875 \n", "... ... 
\n", "MORC2_HUMAN MORC2 22095.990234 \n", " MORC2 22095.990234 \n", " MORC2 22095.990234 \n", "NS1BP_HUMAN IVNS1ABP 12410.792969 \n", " IVNS1ABP 12410.792969 \n", "\n", " [33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 99673.273438 \n", " TRBV19;TRB 99673.273438 \n", " TRBV19;TRB 99673.273438 \n", " TRBV19;TRB 99673.273438 \n", " TRBV19;TRB 99673.273438 \n", "... ... \n", "MORC2_HUMAN MORC2 25553.849609 \n", " MORC2 25553.849609 \n", " MORC2 25553.849609 \n", "NS1BP_HUMAN IVNS1ABP 14130.139648 \n", " IVNS1ABP 14130.139648 \n", "\n", " [34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 109199.875000 \n", " TRBV19;TRB 109199.875000 \n", " TRBV19;TRB 109199.875000 \n", " TRBV19;TRB 109199.875000 \n", " TRBV19;TRB 109199.875000 \n", "... ... \n", "MORC2_HUMAN MORC2 22250.546875 \n", " MORC2 22250.546875 \n", " MORC2 22250.546875 \n", "NS1BP_HUMAN IVNS1ABP NaN \n", " IVNS1ABP 7562.620605 \n", "\n", " [35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity \\\n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 112307.476562 \n", " TRBV19;TRB 112307.476562 \n", " TRBV19;TRB 112307.476562 \n", " TRBV19;TRB 112307.476562 \n", " TRBV19;TRB 112307.476562 \n", "... ... \n", "MORC2_HUMAN MORC2 14592.869141 \n", " MORC2 14592.869141 \n", " MORC2 14592.869141 \n", "NS1BP_HUMAN IVNS1ABP 13198.474609 \n", " IVNS1ABP 13198.474609 \n", "\n", " [36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity \n", "proteins genes \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB 112374.843750 \n", " TRBV19;TRB 112374.843750 \n", " TRBV19;TRB 112374.843750 \n", " TRBV19;TRB 112374.843750 \n", " TRBV19;TRB 112374.843750 \n", "... ... 
\n", "MORC2_HUMAN MORC2 19265.998047 \n", " MORC2 19265.998047 \n", " MORC2 19265.998047 \n", "NS1BP_HUMAN IVNS1ABP 13553.090820 \n", " IVNS1ABP 13553.090820 \n", "\n", "[54863 rows x 36 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the file with the default column mapping\n", "reader = pg_reader_provider.get_reader('spectronaut')\n", "reader.import_file(spectronaut_example_path)" ] }, { "cell_type": "markdown", "id": "c3c69b74", "metadata": {}, "source": [ "Suppose we also want to know which amino acid carries each PTM. We can extract this additional column with the `add_column_mapping` method:" ] }, { "cell_type": "code", "execution_count": 13, "id": "e83439c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
[1] 20180815_QE3_nLC3_AH_DIA_Honly_ind_01.raw.PTM.Quantity[2] 20180815_QE3_nLC3_AH_DIA_Honly_ind_02.raw.PTM.Quantity[3] 20180815_QE3_nLC3_AH_DIA_Honly_ind_03.raw.PTM.Quantity[4] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01.raw.PTM.Quantity[5] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02.raw.PTM.Quantity[6] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03.raw.PTM.Quantity[7] 20180816_QE3_nLC3_AH_DIA_H100_Y100_01.raw.PTM.Quantity[8] 20180816_QE3_nLC3_AH_DIA_H100_Y100_02.raw.PTM.Quantity[9] 20180816_QE3_nLC3_AH_DIA_H100_Y100_03.raw.PTM.Quantity[10] 20180816_QE3_nLC3_AH_DIA_H100_Y100_04.raw.PTM.Quantity...[27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity[28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity[29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity[30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity[31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity[32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity[33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity[34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity[35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity[36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity
proteinsgenesptm_site_amino_acid
TVB19_HUMAN;TRBR1_HUMANTRBV19;TRBSNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
SNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
YNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
SNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
SNaNNaNNaNNaNNaNNaN89374.656250NaN90181.57812596197.070312...69968.835938103632.60156290488.929688113429.85937596970.27343861069.17187599673.273438109199.875000112307.476562112374.843750
........................................................................
MORC2_HUMANMORC2SNaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
SNaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
SNaNNaN6817.745605NaNNaNNaN18010.67968812501.52148417377.40820313730.358398...23552.46679722144.58007820846.85156224248.41796922490.05468822095.99023425553.84960922250.54687514592.86914119265.998047
NS1BP_HUMANIVNS1ABPMNaNNaN38411.285156NaNNaNNaN10104.60156212773.76464810412.31152311411.670898...NaN17287.406250NaN15751.86132814749.72460912410.79296914130.139648NaN13198.47460913553.090820
SNaNNaN38411.285156NaNNaNNaN10104.60156218788.16796910412.31152317367.800781...NaN17287.406250NaN15751.86132814749.72460912410.79296914130.1396487562.62060513198.47460913553.090820
\n", "

54863 rows × 36 columns

\n", "
" ], "text/plain": [ " [1] 20180815_QE3_nLC3_AH_DIA_Honly_ind_01.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S NaN \n", " S NaN \n", " S NaN \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [2] 20180815_QE3_nLC3_AH_DIA_Honly_ind_02.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S NaN \n", " S NaN \n", " S NaN \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [3] 20180815_QE3_nLC3_AH_DIA_Honly_ind_03.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S 6817.745605 \n", " S 6817.745605 \n", " S 6817.745605 \n", "NS1BP_HUMAN IVNS1ABP M 38411.285156 \n", " S 38411.285156 \n", "\n", " [4] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_01.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S NaN \n", " S NaN \n", " S NaN \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [5] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_02.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S NaN \n", " S NaN \n", " S NaN \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [6] 20180815_QE3_nLC3_AH_DIA_Yonly_ind_03.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... 
\n", "MORC2_HUMAN MORC2 S NaN \n", " S NaN \n", " S NaN \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [7] 20180816_QE3_nLC3_AH_DIA_H100_Y100_01.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 89374.656250 \n", " S 89374.656250 \n", " Y 89374.656250 \n", " S 89374.656250 \n", " S 89374.656250 \n", "... ... \n", "MORC2_HUMAN MORC2 S 18010.679688 \n", " S 18010.679688 \n", " S 18010.679688 \n", "NS1BP_HUMAN IVNS1ABP M 10104.601562 \n", " S 10104.601562 \n", "\n", " [8] 20180816_QE3_nLC3_AH_DIA_H100_Y100_02.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S NaN \n", " S NaN \n", " Y NaN \n", " S NaN \n", " S NaN \n", "... ... \n", "MORC2_HUMAN MORC2 S 12501.521484 \n", " S 12501.521484 \n", " S 12501.521484 \n", "NS1BP_HUMAN IVNS1ABP M 12773.764648 \n", " S 18788.167969 \n", "\n", " [9] 20180816_QE3_nLC3_AH_DIA_H100_Y100_03.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 90181.578125 \n", " S 90181.578125 \n", " Y 90181.578125 \n", " S 90181.578125 \n", " S 90181.578125 \n", "... ... \n", "MORC2_HUMAN MORC2 S 17377.408203 \n", " S 17377.408203 \n", " S 17377.408203 \n", "NS1BP_HUMAN IVNS1ABP M 10412.311523 \n", " S 10412.311523 \n", "\n", " [10] 20180816_QE3_nLC3_AH_DIA_H100_Y100_04.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 96197.070312 \n", " S 96197.070312 \n", " Y 96197.070312 \n", " S 96197.070312 \n", " S 96197.070312 \n", "... ... \n", "MORC2_HUMAN MORC2 S 13730.358398 \n", " S 13730.358398 \n", " S 13730.358398 \n", "NS1BP_HUMAN IVNS1ABP M 11411.670898 \n", " S 17367.800781 \n", "\n", " ... \\\n", "proteins genes ptm_site_amino_acid ... \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S ... \n", " S ... \n", " Y ... \n", " S ... \n", " S ... \n", "... ... \n", "MORC2_HUMAN MORC2 S ... \n", " S ... \n", " S ... 
\n", "NS1BP_HUMAN IVNS1ABP M ... \n", " S ... \n", "\n", " [27] 20180816_QE3_nLC3_AH_DIA_H100_Y25_03.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 69968.835938 \n", " S 69968.835938 \n", " Y 69968.835938 \n", " S 69968.835938 \n", " S 69968.835938 \n", "... ... \n", "MORC2_HUMAN MORC2 S 23552.466797 \n", " S 23552.466797 \n", " S 23552.466797 \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [28] 20180816_QE3_nLC3_AH_DIA_H100_Y25_04.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 103632.601562 \n", " S 103632.601562 \n", " Y 103632.601562 \n", " S 103632.601562 \n", " S 103632.601562 \n", "... ... \n", "MORC2_HUMAN MORC2 S 22144.580078 \n", " S 22144.580078 \n", " S 22144.580078 \n", "NS1BP_HUMAN IVNS1ABP M 17287.406250 \n", " S 17287.406250 \n", "\n", " [29] 20180816_QE3_nLC3_AH_DIA_H100_Y25_05.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 90488.929688 \n", " S 90488.929688 \n", " Y 90488.929688 \n", " S 90488.929688 \n", " S 90488.929688 \n", "... ... \n", "MORC2_HUMAN MORC2 S 20846.851562 \n", " S 20846.851562 \n", " S 20846.851562 \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S NaN \n", "\n", " [30] 20180816_QE3_nLC3_AH_DIA_H100_Y25_06.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 113429.859375 \n", " S 113429.859375 \n", " Y 113429.859375 \n", " S 113429.859375 \n", " S 113429.859375 \n", "... ... \n", "MORC2_HUMAN MORC2 S 24248.417969 \n", " S 24248.417969 \n", " S 24248.417969 \n", "NS1BP_HUMAN IVNS1ABP M 15751.861328 \n", " S 15751.861328 \n", "\n", " [31] 20180816_QE3_nLC3_AH_DIA_H100_Y50_01.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 96970.273438 \n", " S 96970.273438 \n", " Y 96970.273438 \n", " S 96970.273438 \n", " S 96970.273438 \n", "... ... 
\n", "MORC2_HUMAN MORC2 S 22490.054688 \n", " S 22490.054688 \n", " S 22490.054688 \n", "NS1BP_HUMAN IVNS1ABP M 14749.724609 \n", " S 14749.724609 \n", "\n", " [32] 20180816_QE3_nLC3_AH_DIA_H100_Y50_02.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 61069.171875 \n", " S 61069.171875 \n", " Y 61069.171875 \n", " S 61069.171875 \n", " S 61069.171875 \n", "... ... \n", "MORC2_HUMAN MORC2 S 22095.990234 \n", " S 22095.990234 \n", " S 22095.990234 \n", "NS1BP_HUMAN IVNS1ABP M 12410.792969 \n", " S 12410.792969 \n", "\n", " [33] 20180816_QE3_nLC3_AH_DIA_H100_Y50_03.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 99673.273438 \n", " S 99673.273438 \n", " Y 99673.273438 \n", " S 99673.273438 \n", " S 99673.273438 \n", "... ... \n", "MORC2_HUMAN MORC2 S 25553.849609 \n", " S 25553.849609 \n", " S 25553.849609 \n", "NS1BP_HUMAN IVNS1ABP M 14130.139648 \n", " S 14130.139648 \n", "\n", " [34] 20180816_QE3_nLC3_AH_DIA_H100_Y50_04.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 109199.875000 \n", " S 109199.875000 \n", " Y 109199.875000 \n", " S 109199.875000 \n", " S 109199.875000 \n", "... ... \n", "MORC2_HUMAN MORC2 S 22250.546875 \n", " S 22250.546875 \n", " S 22250.546875 \n", "NS1BP_HUMAN IVNS1ABP M NaN \n", " S 7562.620605 \n", "\n", " [35] 20180816_QE3_nLC3_AH_DIA_H100_Y50_05.raw.PTM.Quantity \\\n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 112307.476562 \n", " S 112307.476562 \n", " Y 112307.476562 \n", " S 112307.476562 \n", " S 112307.476562 \n", "... ... 
\n", "MORC2_HUMAN MORC2 S 14592.869141 \n", " S 14592.869141 \n", " S 14592.869141 \n", "NS1BP_HUMAN IVNS1ABP M 13198.474609 \n", " S 13198.474609 \n", "\n", " [36] 20180816_QE3_nLC3_AH_DIA_H100_Y50_06.raw.PTM.Quantity \n", "proteins genes ptm_site_amino_acid \n", "TVB19_HUMAN;TRBR1_HUMAN TRBV19;TRB S 112374.843750 \n", " S 112374.843750 \n", " Y 112374.843750 \n", " S 112374.843750 \n", " S 112374.843750 \n", "... ... \n", "MORC2_HUMAN MORC2 S 19265.998047 \n", " S 19265.998047 \n", " S 19265.998047 \n", "NS1BP_HUMAN IVNS1ABP M 13553.090820 \n", " S 13553.090820 \n", "\n", "[54863 rows x 36 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add a custom column mapping for the PTM site amino acid (source column `PTM.SiteAA`)\n", "reader.add_column_mapping({\"ptm_site_amino_acid\": \"PTM.SiteAA\"})\n", "reader.import_file(spectronaut_example_path)" ] }, { "cell_type": "markdown", "id": "ba18279a", "metadata": {}, "source": [ "## scVerse compatibility \n", "\n", "The standardized format also makes it easy to convert protein group tables to widely used `-omics` formats such as `anndata.AnnData`."
] }, { "cell_type": "code", "execution_count": 14, "id": "6cee0251", "metadata": {}, "outputs": [], "source": [ "def create_anndata_from_pg_matrix(file_path: str, search_engine: str, **kwargs) -> ad.AnnData:\n", "    \"\"\"Read a protein group matrix and convert it to an `anndata.AnnData` object.\"\"\"\n", "\n", "    reader = pg_reader_provider.get_reader(search_engine, **kwargs)\n", "    # The reader returns protein groups as rows and samples as columns\n", "    df = reader.import_file(file_path)\n", "    # AnnData expects observations (samples) as rows, so the matrix is transposed\n", "    return ad.AnnData(\n", "        X=df.values.T,\n", "        var=df.index.to_frame(),\n", "        obs=df.columns.to_frame(name=\"sample_id\"),\n", "    )" ] }, { "cell_type": "code", "execution_count": 15, "id": "6619e5f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 6 × 9364\n", " obs: 'sample_id'\n", " var: 'uniprot_ids'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata = create_anndata_from_pg_matrix(\n", " alphadia_example_path, search_engine=\"alphadia\"\n", ")\n", "\n", "adata" ] }, { "cell_type": "markdown", "id": "ce2e70a6", "metadata": {}, "source": [ "## Conclusion\n", "\n", "The `alphabase` protein group reader module provides:\n", "\n", "- **Unified interface** for reading protein group tables from multiple search engines\n", "- **Standardized output format** that facilitates cross-engine comparisons and downstream analyses\n", "- **Flexible quantification options** to extract different measurement types (raw, LFQ, iBAQ)\n", "- **Extensible architecture** that supports custom column mappings and new search engines\n", "\n", "This standardization lets researchers focus on biological insights rather than on differences between data formats."
] } ], "metadata": { "kernelspec": { "display_name": "alphabase-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" } }, "nbformat": 4, "nbformat_minor": 5 }