{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Basic Definitions and Settings\n", "\n", "Measuring m/z values is the fundamental function of MS technologies, so calculating mass values for a peptide and its fragments is the most essential part of MS-based computational tools. AlphaBase calculates all mass values from atoms: the masses of amino acids and modifications are calculated from their atom compositions. The masses of peptides or precursors, as well as their fragments, are then calculated from the amino acid sequences with or without modifications (see figure below).\n", "\n", "Calculating masses from atoms makes it much easier to switch between unlabeled and heavy-labeled peptides, as we did for Stellar MS, using 15N-labeled peptides as the reference for targeted proteomics (https://www.biorxiv.org/content/10.1101/2024.06.02.597029v2.full).\n", "\n", "Another advantage of starting from atoms is that AlphaBase can calculate isotope distributions of peptides from a pre-defined list of atomic isotope distributions (e.g., the NIST atom table at https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl). This isotope information is used in our alphaDIA search engine to boost identification in DIA-MS data (https://www.biorxiv.org/content/10.1101/2024.05.28.596182v1)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Atoms/Elements\n", "\n", "The masses of all amino acids and modifications are calculated from their atom compositions.\n", "\n", "The atom information is defined in https://github.com/MannLabs/alphabase/blob/main/alphabase/constants/const_files/nist_element.yaml, which is parsed from NIST data; see https://github.com/MannLabs/alphabase/blob/main/scripts/nist_chem_to_yaml.ipynb.\n", "\n", "After adding some heavy isotopes, including 13C, 15N, 2H, and 18O, we obtain 109 kinds of atoms:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2025-01-30T16:49:22.699057Z", "start_time": "2025-01-30T16:49:22.690604Z" }, "execution": { "iopub.execute_input": "2026-01-05T22:43:42.987776Z", "iopub.status.busy": "2026-01-05T22:43:42.987610Z", "iopub.status.idle": "2026-01-05T22:43:45.610885Z", "shell.execute_reply": "2026-01-05T22:43:45.610625Z" } }, "outputs": [ { "data": { "text/html": [ "
| | abundance | mass |
|---|---|---|
| 13C | [0.01, 0.99] | [12.0, 13.00335483507] |
| 14N | [0.996337, 0.003663] | [14.00307400443, 15.00010889888] |
| 15N | [0.01, 0.99] | [14.00307400443, 15.00010889888] |
| 18O | [0.005, 0.005, 0.99] | [15.99491461957, 16.9991317565, 17.99915961286] |
| 2H | [0.01, 0.99] | [1.00782503223, 2.01410177812] |
| ... | ... | ... |
| Xe | [0.000952, 0.00089, 0.019102, 0.264006, 0.0407... | [123.905892, 125.9042983, 127.903531, 128.9047... |
| Y | [1.0] | [88.9058403] |
| Yb | [0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0.... | [167.9338896, 169.9347664, 170.9363302, 171.93... |
| Zn | [0.4917, 0.2773, 0.0404, 0.1845, 0.0061] | [63.92914201, 65.92603381, 66.92712775, 67.924... |
| Zr | [0.5145, 0.1122, 0.1715, 0.1738, 0.028] | [89.9046977, 90.9056396, 91.9050347, 93.906310... |

109 rows × 2 columns

| | 0 |
|---|---|
| 13C | 13.003355 |
| 14N | 14.003074 |
| 15N | 15.000109 |
| 18O | 17.999160 |
| 2H | 2.014102 |
| ... | ... |
| Xe | 131.904155 |
| Y | 88.905840 |
| Yb | 173.938866 |
| Zn | 63.929142 |
| Zr | 89.904698 |

109 rows × 1 columns

| | aa | formula | smiles | mass |
|---|---|---|---|---|
| 65 | A | C(3)H(5)N(1)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(C)C(=O)[Ts] | 7.103711e+01 |
| 66 | B | C(1000000) | NaN | 1.200000e+07 |
| 67 | C | C(3)H(5)N(1)O(1)S(1) | N([Fl])([Fl])[C@@]([H])(CS)C(=O)[Ts] | 1.030092e+02 |
| 68 | D | C(4)H(5)N(1)O(3)S(0) | N([Fl])([Fl])[C@@]([H])(CC(=O)O)C(=O)[Ts] | 1.150269e+02 |
| 69 | E | C(5)H(7)N(1)O(3)S(0) | N([Fl])([Fl])[C@@]([H])(CCC(=O)O)C(=O)[Ts] | 1.290426e+02 |
| 70 | F | C(9)H(9)N(1)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(Cc1ccccc1)C(=O)[Ts] | 1.470684e+02 |
| 71 | G | C(2)H(3)N(1)O(1)S(0) | N([Fl])([Fl])CC(=O)[Ts] | 5.702146e+01 |
| 72 | H | C(6)H(7)N(3)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(CC1=CN=C-N1)C(=O)[Ts] | 1.370589e+02 |
| 73 | I | C(6)H(11)N(1)O(1)S(0) | N([Fl])([Fl])[C@@]([H])([C@]([H])(CC)C)C(=O)[Ts] | 1.130841e+02 |
| 74 | J | C(6)H(11)N(1)O(1)S(0) | NaN | 1.130841e+02 |
| 75 | K | C(6)H(12)N(2)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(CCCCN)C(=O)[Ts] | 1.280950e+02 |
| 76 | L | C(6)H(11)N(1)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(CC(C)C)C(=O)[Ts] | 1.130841e+02 |
| 77 | M | C(5)H(9)N(1)O(1)S(1) | N([Fl])([Fl])[C@@]([H])(CCSC)C(=O)[Ts] | 1.310405e+02 |
| 78 | N | C(4)H(6)N(2)O(2)S(0) | N([Fl])([Fl])[C@@]([H])(CC(=O)N)C(=O)[Ts] | 1.140429e+02 |
| 79 | O | C(12)H(19)N(3)O(2) | C[C@@H]1CC=N[C@H]1C(=O)NCCCC[C@@H](C(=O)[Ts])N... | 2.371477e+02 |
| 80 | P | C(5)H(7)N(1)O(1)S(0) | N1([Fl])[C@@]([H])(CCC1)C(=O)[Ts] | 9.705276e+01 |
| 81 | Q | C(5)H(8)N(2)O(2)S(0) | N([Fl])([Fl])[C@@]([H])(CCC(=O)N)C(=O)[Ts] | 1.280586e+02 |
| 82 | R | C(6)H(12)N(4)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(CCCNC(=N)N)C(=O)[Ts] | 1.561011e+02 |
| 83 | S | C(3)H(5)N(1)O(2)S(0) | N([Fl])([Fl])[C@@]([H])(CO)C(=O)[Ts] | 8.703203e+01 |
| 84 | T | C(4)H(7)N(1)O(2)S(0) | N([Fl])([Fl])[C@@]([H])([C@]([H])(O)C)C(=O)[Ts] | 1.010477e+02 |
| 85 | U | C(3)H(5)N(1)O(1)Se(1) | N([Fl])([Fl])[C@@]([H])(C[Se][H])C(=O)[Ts] | 1.509536e+02 |
| 86 | V | C(5)H(9)N(1)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(C(C)C)C(=O)[Ts] | 9.906841e+01 |
| 87 | W | C(11)H(10)N(2)O(1)S(0) | N([Fl])([Fl])[C@@]([H])(CC(=CN2)C1=C2C=CC=C1)C... | 1.860793e+02 |
| 88 | X | C(1000000) | NaN | 1.200000e+07 |
| 89 | Y | C(9)H(9)N(1)O(2)S(0) | N([Fl])([Fl])[C@@]([H])(Cc1ccc(O)cc1)C(=O)[Ts] | 1.630633e+02 |
| 90 | Z | C(1000000) | NaN | 1.200000e+07 |

| | mod_name | unimod_mass | unimod_avge_mass | composition | unimod_modloss | modloss_composition | classification | unimod_id | smiles | modloss_importance | mass | modloss_original | modloss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mod_name | | | | | | | | | | | | | |
| Acetyl@T | Acetyl@T | 42.010565 | 42.0367 | H(2)C(2)O(1) | 0.0 | | Post-translational | 1 | | 0.0 | 42.010565 | 0.0 | 0.0 |
| Acetyl@Protein_N-term | Acetyl@Protein_N-term | 42.010565 | 42.0367 | H(2)C(2)O(1) | 0.0 | | Post-translational | 1 | CC(=O)[Ts] | 0.0 | 42.010565 | 0.0 | 0.0 |
| Acetyl@S | Acetyl@S | 42.010565 | 42.0367 | H(2)C(2)O(1) | 0.0 | | Post-translational | 1 | | 0.0 | 42.010565 | 0.0 | 0.0 |
| Acetyl@C | Acetyl@C | 42.010565 | 42.0367 | H(2)C(2)O(1) | 0.0 | | Post-translational | 1 | | 0.0 | 42.010565 | 0.0 | 0.0 |
| Acetyl@Any_N-term | Acetyl@Any_N-term | 42.010565 | 42.0367 | H(2)C(2)O(1) | 0.0 | | Multiple | 1 | CC(=O)[Ts] | 0.0 | 42.010565 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Lactyl@Any_N-term | Lactyl@Any_N-term | 72.021129 | 72.0627 | H(4)C(3)O(2) | 0.0 | | Post-translational | 0 | C[C@@H](O)C(=O)[Ts] | 0.0 | 72.021129 | 0.0 | 0.0 |
| Lactyl@Protein_N-term | Lactyl@Protein_N-term | 72.021129 | 72.0627 | H(4)C(3)O(2) | 0.0 | | Post-translational | 0 | C[C@@H](O)C(=O)[Ts] | 0.0 | 72.021129 | 0.0 | 0.0 |
| YnLactyl@K | YnLactyl@K | 239.126991 | 239.2941 | H(17)C(11)N(3)O(3) | 0.0 | | Post-translational | 0 | OCCCCCCN1C=C(C[C@@H](O)C(=O)NCCCC[C@H](N([Fl])... | 0.0 | 239.126991 | 0.0 | 0.0 |
| YnLactyl@Any_N-term | YnLactyl@Any_N-term | 239.126991 | 239.2941 | H(17)C(11)N(3)O(3) | 0.0 | | Post-translational | 0 | OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1 | 0.0 | 239.126991 | 0.0 | 0.0 |
| YnLactyl@Protein_N-term | YnLactyl@Protein_N-term | 239.126991 | 239.2941 | H(17)C(11)N(3)O(3) | 0.0 | | Post-translational | 0 | OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1 | 0.0 | 239.126991 | 0.0 | 0.0 |

2852 rows × 13 columns

| | abundance | mass | mono_idx |
|---|---|---|---|
| 13C | [0.01, 0.99] | [12.0, 13.00335483507] | 1 |
| 14N | [0.996337, 0.003663] | [14.00307400443, 15.00010889888] | 0 |
| 15N | [0.01, 0.99] | [14.00307400443, 15.00010889888] | 1 |
| 18O | [0.005, 0.005, 0.99] | [15.99491461957, 16.9991317565, 17.99915961286] | 2 |
| 2H | [0.01, 0.99] | [1.00782503223, 2.01410177812] | 1 |
| ... | ... | ... | ... |
| Xe | [0.000952, 0.00089, 0.019102, 0.264006, 0.0407... | [123.905892, 125.9042983, 127.903531, 128.9047... | 8 |
| Y | [1.0] | [88.9058403] | 0 |
| Yb | [0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0.... | [167.9338896, 169.9347664, 170.9363302, 171.93... | 6 |
| Zn | [0.4917, 0.2773, 0.0404, 0.1845, 0.0061] | [63.92914201, 65.92603381, 66.92712775, 67.924... | 0 |
| Zr | [0.5145, 0.1122, 0.1715, 0.1738, 0.028] | [89.9046977, 90.9056396, 91.9050347, 93.906310... | 0 |

109 rows × 3 columns
\n", "