{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial: Basic Definitions and Settings\n",
    "\n",
    "Measuring m/z values is the elemental function of MS technologies, therefore the calculation of mass values for a peptide and its fragments becomes the most essential part in MS-based computational tools. AlphaBase calculates all mass values from atoms. And the masses of amino acids and modifications are calculated from their atom compositions, repectively. Eventually, the masses of peptides or precursors as well as their fragments can be calculated by the amino acid sequences with or without modifications (See figure below).\n",
    "\n",
    "Calculating masses from atoms makes it much easier to switch between unlabeled and heavy-labeled peptides, as we did in Stellar MS for 15N-labeled peptides as the reference for targeted proteomics (https://www.biorxiv.org/content/10.1101/2024.06.02.597029v2.full).\n",
    "\n",
    "The other advantage of starting from atoms is that AlphaBase can calculate isotope distributions of peptides based on a pre-defined isotope distribution list of atoms (e.g., NIST atom table in https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl). The isotope information has been applied in our alphaDIA search engine to boost the identification of DIA-MS data (https://www.biorxiv.org/content/10.1101/2024.05.28.596182v1)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![atom-to-peptides.png](atom-to-peptides.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Atoms/Elements\n",
    "\n",
    "The masses of all amino acids and modifications are calculated from their atom compositions.\n",
    "\n",
    "The atom information are defined in https://github.com/MannLabs/alphabase/blob/main/alphabase/constants/const_files/nist_element.yaml which is parsed from NIST, see https://github.com/MannLabs/alphabase/blob/main/scripts/nist_chem_to_yaml.ipynb.\n",
    "\n",
    "After adding some heavy isotopes, including 13C, 15N, 2H, and 18O, we obtain 109 kinds of atoms:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:22.699057Z",
     "start_time": "2025-01-30T16:49:22.690604Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:42.987776Z",
     "iopub.status.busy": "2026-01-05T22:43:42.987610Z",
     "iopub.status.idle": "2026-01-05T22:43:45.610885Z",
     "shell.execute_reply": "2026-01-05T22:43:45.610625Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abundance</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13C</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[12.0, 13.00335483507]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14N</th>\n",
       "      <td>[0.996337, 0.003663]</td>\n",
       "      <td>[14.00307400443, 15.00010889888]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15N</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[14.00307400443, 15.00010889888]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18O</th>\n",
       "      <td>[0.005, 0.005, 0.99]</td>\n",
       "      <td>[15.99491461957, 16.9991317565, 17.99915961286]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2H</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[1.00782503223, 2.01410177812]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Xe</th>\n",
       "      <td>[0.000952, 0.00089, 0.019102, 0.264006, 0.0407...</td>\n",
       "      <td>[123.905892, 125.9042983, 127.903531, 128.9047...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>[1.0]</td>\n",
       "      <td>[88.9058403]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yb</th>\n",
       "      <td>[0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0....</td>\n",
       "      <td>[167.9338896, 169.9347664, 170.9363302, 171.93...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zn</th>\n",
       "      <td>[0.4917, 0.2773, 0.0404, 0.1845, 0.0061]</td>\n",
       "      <td>[63.92914201, 65.92603381, 66.92712775, 67.924...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zr</th>\n",
       "      <td>[0.5145, 0.1122, 0.1715, 0.1738, 0.028]</td>\n",
       "      <td>[89.9046977, 90.9056396, 91.9050347, 93.906310...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>109 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             abundance  \\\n",
       "13C                                       [0.01, 0.99]   \n",
       "14N                               [0.996337, 0.003663]   \n",
       "15N                                       [0.01, 0.99]   \n",
       "18O                               [0.005, 0.005, 0.99]   \n",
       "2H                                        [0.01, 0.99]   \n",
       "..                                                 ...   \n",
       "Xe   [0.000952, 0.00089, 0.019102, 0.264006, 0.0407...   \n",
       "Y                                                [1.0]   \n",
       "Yb   [0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0....   \n",
       "Zn            [0.4917, 0.2773, 0.0404, 0.1845, 0.0061]   \n",
       "Zr             [0.5145, 0.1122, 0.1715, 0.1738, 0.028]   \n",
       "\n",
       "                                                  mass  \n",
       "13C                             [12.0, 13.00335483507]  \n",
       "14N                   [14.00307400443, 15.00010889888]  \n",
       "15N                   [14.00307400443, 15.00010889888]  \n",
       "18O    [15.99491461957, 16.9991317565, 17.99915961286]  \n",
       "2H                      [1.00782503223, 2.01410177812]  \n",
       "..                                                 ...  \n",
       "Xe   [123.905892, 125.9042983, 127.903531, 128.9047...  \n",
       "Y                                         [88.9058403]  \n",
       "Yb   [167.9338896, 169.9347664, 170.9363302, 171.93...  \n",
       "Zn   [63.92914201, 65.92603381, 66.92712775, 67.924...  \n",
       "Zr   [89.9046977, 90.9056396, 91.9050347, 93.906310...  \n",
       "\n",
       "[109 rows x 2 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from alphabase.constants.atom import CHEM_INFO_DICT\n",
    "pd.DataFrame().from_dict(CHEM_INFO_DICT, orient='index')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And their mono-isotopic mass are in `CHEM_MONO_MASS` (dict):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:23.563685Z",
     "start_time": "2025-01-30T16:49:23.559129Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.625017Z",
     "iopub.status.busy": "2026-01-05T22:43:45.624885Z",
     "iopub.status.idle": "2026-01-05T22:43:45.628025Z",
     "shell.execute_reply": "2026-01-05T22:43:45.627787Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13C</th>\n",
       "      <td>13.003355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14N</th>\n",
       "      <td>14.003074</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15N</th>\n",
       "      <td>15.000109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18O</th>\n",
       "      <td>17.999160</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2H</th>\n",
       "      <td>2.014102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Xe</th>\n",
       "      <td>131.904155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>88.905840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yb</th>\n",
       "      <td>173.938866</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zn</th>\n",
       "      <td>63.929142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zr</th>\n",
       "      <td>89.904698</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>109 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              0\n",
       "13C   13.003355\n",
       "14N   14.003074\n",
       "15N   15.000109\n",
       "18O   17.999160\n",
       "2H     2.014102\n",
       "..          ...\n",
       "Xe   131.904155\n",
       "Y     88.905840\n",
       "Yb   173.938866\n",
       "Zn    63.929142\n",
       "Zr    89.904698\n",
       "\n",
       "[109 rows x 1 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.atom import CHEM_MONO_MASS\n",
    "pd.DataFrame().from_dict(CHEM_MONO_MASS, orient='index')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These atom masses are used to calculate the masses of amino acids, modifications, and then subsequent masses of peptides and fragments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Commonly used molecular masses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:24.557595Z",
     "start_time": "2025-01-30T16:49:24.555151Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.629265Z",
     "iopub.status.busy": "2026-01-05T22:43:45.629192Z",
     "iopub.status.idle": "2026-01-05T22:43:45.631231Z",
     "shell.execute_reply": "2026-01-05T22:43:45.631023Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1.007276467, 1.0033, 17.02654910112, 18.01056468403)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.atom import (\n",
    "    MASS_PROTON, MASS_ISOTOPE, MASS_NH3, MASS_H2O\n",
    ")\n",
    "MASS_PROTON, MASS_ISOTOPE, MASS_NH3, MASS_H2O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Amino Acids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:25.418105Z",
     "start_time": "2025-01-30T16:49:25.413661Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.632415Z",
     "iopub.status.busy": "2026-01-05T22:43:45.632353Z",
     "iopub.status.idle": "2026-01-05T22:43:45.642932Z",
     "shell.execute_reply": "2026-01-05T22:43:45.642744Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aa</th>\n",
       "      <th>formula</th>\n",
       "      <th>smiles</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>A</td>\n",
       "      <td>C(3)H(5)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(C)C(=O)[Ts]</td>\n",
       "      <td>7.103711e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>66</th>\n",
       "      <td>B</td>\n",
       "      <td>C(1000000)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.200000e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>C</td>\n",
       "      <td>C(3)H(5)N(1)O(1)S(1)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CS)C(=O)[Ts]</td>\n",
       "      <td>1.030092e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>D</td>\n",
       "      <td>C(4)H(5)N(1)O(3)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CC(=O)O)C(=O)[Ts]</td>\n",
       "      <td>1.150269e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69</th>\n",
       "      <td>E</td>\n",
       "      <td>C(5)H(7)N(1)O(3)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CCC(=O)O)C(=O)[Ts]</td>\n",
       "      <td>1.290426e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>F</td>\n",
       "      <td>C(9)H(9)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(Cc1ccccc1)C(=O)[Ts]</td>\n",
       "      <td>1.470684e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>G</td>\n",
       "      <td>C(2)H(3)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])CC(=O)[Ts]</td>\n",
       "      <td>5.702146e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>H</td>\n",
       "      <td>C(6)H(7)N(3)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CC1=CN=C-N1)C(=O)[Ts]</td>\n",
       "      <td>1.370589e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>73</th>\n",
       "      <td>I</td>\n",
       "      <td>C(6)H(11)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])([C@]([H])(CC)C)C(=O)[Ts]</td>\n",
       "      <td>1.130841e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74</th>\n",
       "      <td>J</td>\n",
       "      <td>C(6)H(11)N(1)O(1)S(0)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.130841e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>K</td>\n",
       "      <td>C(6)H(12)N(2)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CCCCN)C(=O)[Ts]</td>\n",
       "      <td>1.280950e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>L</td>\n",
       "      <td>C(6)H(11)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CC(C)C)C(=O)[Ts]</td>\n",
       "      <td>1.130841e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>M</td>\n",
       "      <td>C(5)H(9)N(1)O(1)S(1)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CCSC)C(=O)[Ts]</td>\n",
       "      <td>1.310405e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78</th>\n",
       "      <td>N</td>\n",
       "      <td>C(4)H(6)N(2)O(2)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CC(=O)N)C(=O)[Ts]</td>\n",
       "      <td>1.140429e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>O</td>\n",
       "      <td>C(12)H(19)N(3)O(2)</td>\n",
       "      <td>C[C@@H]1CC=N[C@H]1C(=O)NCCCC[C@@H](C(=O)[Ts])N...</td>\n",
       "      <td>2.371477e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>P</td>\n",
       "      <td>C(5)H(7)N(1)O(1)S(0)</td>\n",
       "      <td>N1([Fl])[C@@]([H])(CCC1)C(=O)[Ts]</td>\n",
       "      <td>9.705276e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>Q</td>\n",
       "      <td>C(5)H(8)N(2)O(2)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CCC(=O)N)C(=O)[Ts]</td>\n",
       "      <td>1.280586e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>R</td>\n",
       "      <td>C(6)H(12)N(4)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CCCNC(=N)N)C(=O)[Ts]</td>\n",
       "      <td>1.561011e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>S</td>\n",
       "      <td>C(3)H(5)N(1)O(2)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CO)C(=O)[Ts]</td>\n",
       "      <td>8.703203e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>T</td>\n",
       "      <td>C(4)H(7)N(1)O(2)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])([C@]([H])(O)C)C(=O)[Ts]</td>\n",
       "      <td>1.010477e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>U</td>\n",
       "      <td>C(3)H(5)N(1)O(1)Se(1)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(C[Se][H])C(=O)[Ts]</td>\n",
       "      <td>1.509536e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>V</td>\n",
       "      <td>C(5)H(9)N(1)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(C(C)C)C(=O)[Ts]</td>\n",
       "      <td>9.906841e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>W</td>\n",
       "      <td>C(11)H(10)N(2)O(1)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(CC(=CN2)C1=C2C=CC=C1)C...</td>\n",
       "      <td>1.860793e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>X</td>\n",
       "      <td>C(1000000)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.200000e+07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>Y</td>\n",
       "      <td>C(9)H(9)N(1)O(2)S(0)</td>\n",
       "      <td>N([Fl])([Fl])[C@@]([H])(Cc1ccc(O)cc1)C(=O)[Ts]</td>\n",
       "      <td>1.630633e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>Z</td>\n",
       "      <td>C(1000000)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.200000e+07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   aa                 formula  \\\n",
       "65  A    C(3)H(5)N(1)O(1)S(0)   \n",
       "66  B              C(1000000)   \n",
       "67  C    C(3)H(5)N(1)O(1)S(1)   \n",
       "68  D    C(4)H(5)N(1)O(3)S(0)   \n",
       "69  E    C(5)H(7)N(1)O(3)S(0)   \n",
       "70  F    C(9)H(9)N(1)O(1)S(0)   \n",
       "71  G    C(2)H(3)N(1)O(1)S(0)   \n",
       "72  H    C(6)H(7)N(3)O(1)S(0)   \n",
       "73  I   C(6)H(11)N(1)O(1)S(0)   \n",
       "74  J   C(6)H(11)N(1)O(1)S(0)   \n",
       "75  K   C(6)H(12)N(2)O(1)S(0)   \n",
       "76  L   C(6)H(11)N(1)O(1)S(0)   \n",
       "77  M    C(5)H(9)N(1)O(1)S(1)   \n",
       "78  N    C(4)H(6)N(2)O(2)S(0)   \n",
       "79  O      C(12)H(19)N(3)O(2)   \n",
       "80  P    C(5)H(7)N(1)O(1)S(0)   \n",
       "81  Q    C(5)H(8)N(2)O(2)S(0)   \n",
       "82  R   C(6)H(12)N(4)O(1)S(0)   \n",
       "83  S    C(3)H(5)N(1)O(2)S(0)   \n",
       "84  T    C(4)H(7)N(1)O(2)S(0)   \n",
       "85  U   C(3)H(5)N(1)O(1)Se(1)   \n",
       "86  V    C(5)H(9)N(1)O(1)S(0)   \n",
       "87  W  C(11)H(10)N(2)O(1)S(0)   \n",
       "88  X              C(1000000)   \n",
       "89  Y    C(9)H(9)N(1)O(2)S(0)   \n",
       "90  Z              C(1000000)   \n",
       "\n",
       "                                               smiles          mass  \n",
       "65                N([Fl])([Fl])[C@@]([H])(C)C(=O)[Ts]  7.103711e+01  \n",
       "66                                                NaN  1.200000e+07  \n",
       "67               N([Fl])([Fl])[C@@]([H])(CS)C(=O)[Ts]  1.030092e+02  \n",
       "68          N([Fl])([Fl])[C@@]([H])(CC(=O)O)C(=O)[Ts]  1.150269e+02  \n",
       "69         N([Fl])([Fl])[C@@]([H])(CCC(=O)O)C(=O)[Ts]  1.290426e+02  \n",
       "70        N([Fl])([Fl])[C@@]([H])(Cc1ccccc1)C(=O)[Ts]  1.470684e+02  \n",
       "71                            N([Fl])([Fl])CC(=O)[Ts]  5.702146e+01  \n",
       "72      N([Fl])([Fl])[C@@]([H])(CC1=CN=C-N1)C(=O)[Ts]  1.370589e+02  \n",
       "73   N([Fl])([Fl])[C@@]([H])([C@]([H])(CC)C)C(=O)[Ts]  1.130841e+02  \n",
       "74                                                NaN  1.130841e+02  \n",
       "75            N([Fl])([Fl])[C@@]([H])(CCCCN)C(=O)[Ts]  1.280950e+02  \n",
       "76           N([Fl])([Fl])[C@@]([H])(CC(C)C)C(=O)[Ts]  1.130841e+02  \n",
       "77             N([Fl])([Fl])[C@@]([H])(CCSC)C(=O)[Ts]  1.310405e+02  \n",
       "78          N([Fl])([Fl])[C@@]([H])(CC(=O)N)C(=O)[Ts]  1.140429e+02  \n",
       "79  C[C@@H]1CC=N[C@H]1C(=O)NCCCC[C@@H](C(=O)[Ts])N...  2.371477e+02  \n",
       "80                  N1([Fl])[C@@]([H])(CCC1)C(=O)[Ts]  9.705276e+01  \n",
       "81         N([Fl])([Fl])[C@@]([H])(CCC(=O)N)C(=O)[Ts]  1.280586e+02  \n",
       "82       N([Fl])([Fl])[C@@]([H])(CCCNC(=N)N)C(=O)[Ts]  1.561011e+02  \n",
       "83               N([Fl])([Fl])[C@@]([H])(CO)C(=O)[Ts]  8.703203e+01  \n",
       "84    N([Fl])([Fl])[C@@]([H])([C@]([H])(O)C)C(=O)[Ts]  1.010477e+02  \n",
       "85         N([Fl])([Fl])[C@@]([H])(C[Se][H])C(=O)[Ts]  1.509536e+02  \n",
       "86            N([Fl])([Fl])[C@@]([H])(C(C)C)C(=O)[Ts]  9.906841e+01  \n",
       "87  N([Fl])([Fl])[C@@]([H])(CC(=CN2)C1=C2C=CC=C1)C...  1.860793e+02  \n",
       "88                                                NaN  1.200000e+07  \n",
       "89     N([Fl])([Fl])[C@@]([H])(Cc1ccc(O)cc1)C(=O)[Ts]  1.630633e+02  \n",
       "90                                                NaN  1.200000e+07  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.aa import AA_DF\n",
    "AA_DF.loc[ord('A'):ord('Z')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In `AA_DF`, amino acids are encoded by ASCII (128 characters), thus 65==ord('A'), ..., 90==ord('Z'). Unicode strings can be quickly converted to ASCII int32 values using `np.array.view()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:26.227920Z",
     "start_time": "2025-01-30T16:49:26.225581Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.644189Z",
     "iopub.status.busy": "2026-01-05T22:43:45.644126Z",
     "iopub.status.idle": "2026-01-05T22:43:45.646215Z",
     "shell.execute_reply": "2026-01-05T22:43:45.646008Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([65, 66, 67, 88, 89, 90], dtype=int32)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "np.array(['ABCXYZ']).view(np.int32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But users does not need to know this, as we provided easy to use functionalities to get residue masses from sequences."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calculate AA masses in batch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:27.796494Z",
     "start_time": "2025-01-30T16:49:27.793162Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.647393Z",
     "iopub.status.busy": "2026-01-05T22:43:45.647331Z",
     "iopub.status.idle": "2026-01-05T22:43:45.649319Z",
     "shell.execute_reply": "2026-01-05T22:43:45.649133Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[131.04048509,  71.03711379, 103.00918496, 115.02694302,\n",
       "        129.04259309, 147.06841391,  57.02146372],\n",
       "       [131.04048509,  71.03711379, 128.09496302, 115.02694302,\n",
       "        129.04259309, 147.06841391,  57.02146372],\n",
       "       [131.04048509,  71.03711379, 128.09496302, 115.02694302,\n",
       "        129.04259309, 147.06841391, 156.10111102]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.aa import calc_AA_masses_for_same_len_seqs\n",
    "calc_AA_masses_for_same_len_seqs(\n",
    "    [\n",
    "        'MACDEFG', 'MAKDEFG', 'MAKDEFR'\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:28.328268Z",
     "start_time": "2025-01-30T16:49:28.325621Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.650418Z",
     "iopub.status.busy": "2026-01-05T22:43:45.650345Z",
     "iopub.status.idle": "2026-01-05T22:43:45.652264Z",
     "shell.execute_reply": "2026-01-05T22:43:45.652092Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.31040485e+02, 1.00000000e+08, 1.00000000e+08, 1.00000000e+08,\n",
       "        1.00000000e+08, 1.00000000e+08, 1.00000000e+08],\n",
       "       [1.31040485e+02, 7.10371138e+01, 1.28094963e+02, 1.00000000e+08,\n",
       "        1.00000000e+08, 1.00000000e+08, 1.00000000e+08],\n",
       "       [1.31040485e+02, 7.10371138e+01, 1.28094963e+02, 1.15026943e+02,\n",
       "        1.29042593e+02, 1.47068414e+02, 1.56101111e+02]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.aa import calc_AA_masses_for_var_len_seqs\n",
    "calc_AA_masses_for_var_len_seqs(\n",
    "    [\n",
    "        'M', 'MAK', 'MAKDEFR'\n",
    "    ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modifications\n",
    "\n",
    "In AlphaBase, we used `mod_name@aa` to represent a modification, the `mod_name` is from UniMod. We also used `mod_name@Protein_N-term`, `mod_name@Any_N-term` and `mod_name@Any_C-term` for terminal modifications, which follow the UniMod terminal name schema.\n",
    "\n",
    "The default modification TSV is stored in https://github.com/MannLabs/alphabase/blob/main/alphabase/constants/const_files/modification.tsv, which is loaded upon startup of AlphaBase.\n",
    "Users can add more modifications into the tsv file (only `mod_name` and `composition` columns are required), e.g. by using the https://github.com/MannLabs/alphabase/blob/main/scripts/unimod_to_tsv.ipynb notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:29.682938Z",
     "start_time": "2025-01-30T16:49:29.674414Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.653275Z",
     "iopub.status.busy": "2026-01-05T22:43:45.653216Z",
     "iopub.status.idle": "2026-01-05T22:43:45.685059Z",
     "shell.execute_reply": "2026-01-05T22:43:45.684819Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mod_name</th>\n",
       "      <th>unimod_mass</th>\n",
       "      <th>unimod_avge_mass</th>\n",
       "      <th>composition</th>\n",
       "      <th>unimod_modloss</th>\n",
       "      <th>modloss_composition</th>\n",
       "      <th>classification</th>\n",
       "      <th>unimod_id</th>\n",
       "      <th>smiles</th>\n",
       "      <th>modloss_importance</th>\n",
       "      <th>mass</th>\n",
       "      <th>modloss_original</th>\n",
       "      <th>modloss</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mod_name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Acetyl@T</th>\n",
       "      <td>Acetyl@T</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>42.0367</td>\n",
       "      <td>H(2)C(2)O(1)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>0.0</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Acetyl@Protein_N-term</th>\n",
       "      <td>Acetyl@Protein_N-term</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>42.0367</td>\n",
       "      <td>H(2)C(2)O(1)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>1</td>\n",
       "      <td>CC(=O)[Ts]</td>\n",
       "      <td>0.0</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Acetyl@S</th>\n",
       "      <td>Acetyl@S</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>42.0367</td>\n",
       "      <td>H(2)C(2)O(1)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>0.0</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Acetyl@C</th>\n",
       "      <td>Acetyl@C</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>42.0367</td>\n",
       "      <td>H(2)C(2)O(1)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>1</td>\n",
       "      <td></td>\n",
       "      <td>0.0</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Acetyl@Any_N-term</th>\n",
       "      <td>Acetyl@Any_N-term</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>42.0367</td>\n",
       "      <td>H(2)C(2)O(1)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Multiple</td>\n",
       "      <td>1</td>\n",
       "      <td>CC(=O)[Ts]</td>\n",
       "      <td>0.0</td>\n",
       "      <td>42.010565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lactyl@Any_N-term</th>\n",
       "      <td>Lactyl@Any_N-term</td>\n",
       "      <td>72.021129</td>\n",
       "      <td>72.0627</td>\n",
       "      <td>H(4)C(3)O(2)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>0</td>\n",
       "      <td>C[C@@H](O)C(=O)[Ts]</td>\n",
       "      <td>0.0</td>\n",
       "      <td>72.021129</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lactyl@Protein_N-term</th>\n",
       "      <td>Lactyl@Protein_N-term</td>\n",
       "      <td>72.021129</td>\n",
       "      <td>72.0627</td>\n",
       "      <td>H(4)C(3)O(2)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>0</td>\n",
       "      <td>C[C@@H](O)C(=O)[Ts]</td>\n",
       "      <td>0.0</td>\n",
       "      <td>72.021129</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>YnLactyl@K</th>\n",
       "      <td>YnLactyl@K</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>239.2941</td>\n",
       "      <td>H(17)C(11)N(3)O(3)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>0</td>\n",
       "      <td>OCCCCCCN1C=C(C[C@@H](O)C(=O)NCCCC[C@H](N([Fl])...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>YnLactyl@Any_N-term</th>\n",
       "      <td>YnLactyl@Any_N-term</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>239.2941</td>\n",
       "      <td>H(17)C(11)N(3)O(3)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>0</td>\n",
       "      <td>OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>YnLactyl@Protein_N-term</th>\n",
       "      <td>YnLactyl@Protein_N-term</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>239.2941</td>\n",
       "      <td>H(17)C(11)N(3)O(3)</td>\n",
       "      <td>0.0</td>\n",
       "      <td></td>\n",
       "      <td>Post-translational</td>\n",
       "      <td>0</td>\n",
       "      <td>OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>239.126991</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2852 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        mod_name  unimod_mass  \\\n",
       "mod_name                                                        \n",
       "Acetyl@T                                Acetyl@T    42.010565   \n",
       "Acetyl@Protein_N-term      Acetyl@Protein_N-term    42.010565   \n",
       "Acetyl@S                                Acetyl@S    42.010565   \n",
       "Acetyl@C                                Acetyl@C    42.010565   \n",
       "Acetyl@Any_N-term              Acetyl@Any_N-term    42.010565   \n",
       "...                                          ...          ...   \n",
       "Lactyl@Any_N-term              Lactyl@Any_N-term    72.021129   \n",
       "Lactyl@Protein_N-term      Lactyl@Protein_N-term    72.021129   \n",
       "YnLactyl@K                            YnLactyl@K   239.126991   \n",
       "YnLactyl@Any_N-term          YnLactyl@Any_N-term   239.126991   \n",
       "YnLactyl@Protein_N-term  YnLactyl@Protein_N-term   239.126991   \n",
       "\n",
       "                         unimod_avge_mass         composition  unimod_modloss  \\\n",
       "mod_name                                                                        \n",
       "Acetyl@T                          42.0367        H(2)C(2)O(1)             0.0   \n",
       "Acetyl@Protein_N-term             42.0367        H(2)C(2)O(1)             0.0   \n",
       "Acetyl@S                          42.0367        H(2)C(2)O(1)             0.0   \n",
       "Acetyl@C                          42.0367        H(2)C(2)O(1)             0.0   \n",
       "Acetyl@Any_N-term                 42.0367        H(2)C(2)O(1)             0.0   \n",
       "...                                   ...                 ...             ...   \n",
       "Lactyl@Any_N-term                 72.0627        H(4)C(3)O(2)             0.0   \n",
       "Lactyl@Protein_N-term             72.0627        H(4)C(3)O(2)             0.0   \n",
       "YnLactyl@K                       239.2941  H(17)C(11)N(3)O(3)             0.0   \n",
       "YnLactyl@Any_N-term              239.2941  H(17)C(11)N(3)O(3)             0.0   \n",
       "YnLactyl@Protein_N-term          239.2941  H(17)C(11)N(3)O(3)             0.0   \n",
       "\n",
       "                        modloss_composition      classification  unimod_id  \\\n",
       "mod_name                                                                     \n",
       "Acetyl@T                                     Post-translational          1   \n",
       "Acetyl@Protein_N-term                        Post-translational          1   \n",
       "Acetyl@S                                     Post-translational          1   \n",
       "Acetyl@C                                     Post-translational          1   \n",
       "Acetyl@Any_N-term                                      Multiple          1   \n",
       "...                                     ...                 ...        ...   \n",
       "Lactyl@Any_N-term                            Post-translational          0   \n",
       "Lactyl@Protein_N-term                        Post-translational          0   \n",
       "YnLactyl@K                                   Post-translational          0   \n",
       "YnLactyl@Any_N-term                          Post-translational          0   \n",
       "YnLactyl@Protein_N-term                      Post-translational          0   \n",
       "\n",
       "                                                                    smiles  \\\n",
       "mod_name                                                                     \n",
       "Acetyl@T                                                                     \n",
       "Acetyl@Protein_N-term                                           CC(=O)[Ts]   \n",
       "Acetyl@S                                                                     \n",
       "Acetyl@C                                                                     \n",
       "Acetyl@Any_N-term                                               CC(=O)[Ts]   \n",
       "...                                                                    ...   \n",
       "Lactyl@Any_N-term                                      C[C@@H](O)C(=O)[Ts]   \n",
       "Lactyl@Protein_N-term                                  C[C@@H](O)C(=O)[Ts]   \n",
       "YnLactyl@K               OCCCCCCN1C=C(C[C@@H](O)C(=O)NCCCC[C@H](N([Fl])...   \n",
       "YnLactyl@Any_N-term                  OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1   \n",
       "YnLactyl@Protein_N-term              OCCCCCCN1C=C(C[C@@H](O)C(=O)[Ts])N=N1   \n",
       "\n",
       "                         modloss_importance        mass  modloss_original  \\\n",
       "mod_name                                                                    \n",
       "Acetyl@T                                0.0   42.010565               0.0   \n",
       "Acetyl@Protein_N-term                   0.0   42.010565               0.0   \n",
       "Acetyl@S                                0.0   42.010565               0.0   \n",
       "Acetyl@C                                0.0   42.010565               0.0   \n",
       "Acetyl@Any_N-term                       0.0   42.010565               0.0   \n",
       "...                                     ...         ...               ...   \n",
       "Lactyl@Any_N-term                       0.0   72.021129               0.0   \n",
       "Lactyl@Protein_N-term                   0.0   72.021129               0.0   \n",
       "YnLactyl@K                              0.0  239.126991               0.0   \n",
       "YnLactyl@Any_N-term                     0.0  239.126991               0.0   \n",
       "YnLactyl@Protein_N-term                 0.0  239.126991               0.0   \n",
       "\n",
       "                         modloss  \n",
       "mod_name                          \n",
       "Acetyl@T                     0.0  \n",
       "Acetyl@Protein_N-term        0.0  \n",
       "Acetyl@S                     0.0  \n",
       "Acetyl@C                     0.0  \n",
       "Acetyl@Any_N-term            0.0  \n",
       "...                          ...  \n",
       "Lactyl@Any_N-term            0.0  \n",
       "Lactyl@Protein_N-term        0.0  \n",
       "YnLactyl@K                   0.0  \n",
       "YnLactyl@Any_N-term          0.0  \n",
       "YnLactyl@Protein_N-term      0.0  \n",
       "\n",
       "[2852 rows x 13 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.modification import MOD_DF\n",
    "MOD_DF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Modification sites\n",
    "\n",
    "In alphabase, we use 0 and -1 to represent modification site of N-term and C-term, respectively. For other modification sites, we use 1 to n."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:30.400073Z",
     "start_time": "2025-01-30T16:49:30.397484Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.686247Z",
     "iopub.status.busy": "2026-01-05T22:43:45.686182Z",
     "iopub.status.idle": "2026-01-05T22:43:45.688281Z",
     "shell.execute_reply": "2026-01-05T22:43:45.688102Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([42.01056468,  0.        , 57.02146372,  0.        ,  0.        ,\n",
       "        0.        ,  0.        ])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.modification import calc_modification_mass\n",
    "\n",
    "# example: add two modifications and print the array of mass modifications\n",
    "sequence = 'MACDEFG'\n",
    "mod_names = ['Acetyl@Any_N-term', 'Carbamidomethyl@C']\n",
    "mod_sites = [0, 3] # 0 for N-term, 3 for the third amino acid\n",
    "calc_modification_mass(\n",
    "    nAA=len(sequence),\n",
    "    mod_names=mod_names,\n",
    "    mod_sites=mod_sites\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:31.049003Z",
     "start_time": "2025-01-30T16:49:31.045754Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.689321Z",
     "iopub.status.busy": "2026-01-05T22:43:45.689260Z",
     "iopub.status.idle": "2026-01-05T22:43:45.691232Z",
     "shell.execute_reply": "2026-01-05T22:43:45.691050Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([58.0054793,  0.       ,  0.       ,  0.       ,  0.       ,\n",
       "        0.       ,  0.       ])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# example: add two modifications and print the array of mass modifications\n",
    "sequence = 'MAKDEFG'\n",
    "mod_names = ['Acetyl@Any_N-term', 'Oxidation@M']\n",
    "mod_sites = [0, 1] # 0 for N-term, 1 for the first amino acid\n",
    "calc_modification_mass(\n",
    "    nAA=len(sequence),\n",
    "    mod_names=mod_names,\n",
    "    mod_sites=mod_sites\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Multiple modifications at a single site is supported, for example, in the following example, `K3` contains both `GG@K` and `Dimethyl@K`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:32.133084Z",
     "start_time": "2025-01-30T16:49:32.130276Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.692247Z",
     "iopub.status.busy": "2026-01-05T22:43:45.692192Z",
     "iopub.status.idle": "2026-01-05T22:43:45.694110Z",
     "shell.execute_reply": "2026-01-05T22:43:45.693915Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([  0.        ,   0.        , 142.07422757,   0.        ,\n",
       "         0.        ,   0.        ,   0.        ])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sequence = 'MAKDEFR'\n",
    "mod_names = ['GG@K', 'Dimethyl@K']\n",
    "mod_sites = [3, 3]\n",
    "calc_modification_mass(\n",
    "    nAA=len(sequence),\n",
    "    mod_names=mod_names,\n",
    "    mod_sites=mod_sites\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Caculate modification masses in batch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:33.217088Z",
     "start_time": "2025-01-30T16:49:33.213576Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.695235Z",
     "iopub.status.busy": "2026-01-05T22:43:45.695173Z",
     "iopub.status.idle": "2026-01-05T22:43:45.697144Z",
     "shell.execute_reply": "2026-01-05T22:43:45.696950Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 42.01056468,   0.        ,  57.02146372,   0.        ,\n",
       "          0.        ,   0.        ,   0.        ],\n",
       "       [ 58.0054793 ,   0.        ,   0.        ,   0.        ,\n",
       "          0.        ,   0.        ,   0.        ],\n",
       "       [  0.        ,   0.        , 142.07422757,   0.        ,\n",
       "          0.        ,   0.        ,   0.        ]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.modification import calc_mod_masses_for_same_len_seqs\n",
    "calc_mod_masses_for_same_len_seqs(\n",
    "    nAA=7,\n",
    "    mod_names_list=[\n",
    "        ['Acetyl@Any_N-term', 'Carbamidomethyl@C'],\n",
    "        ['Acetyl@Any_N-term', 'Oxidation@M'],\n",
    "        ['GG@K', 'Dimethyl@K'],\n",
    "    ],\n",
    "    mod_sites_list=[\n",
    "        [0, 3],\n",
    "        [0, 1],\n",
    "        [3, 3],\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mass calculation functionalities"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calculate AA and modification masses in batch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:34.937876Z",
     "start_time": "2025-01-30T16:49:34.933525Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.698281Z",
     "iopub.status.busy": "2026-01-05T22:43:45.698218Z",
     "iopub.status.idle": "2026-01-05T22:43:45.700560Z",
     "shell.execute_reply": "2026-01-05T22:43:45.700307Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[173.05104977,  71.03711379, 160.03064868, 115.02694302,\n",
       "        129.04259309, 147.06841391,  57.02146372],\n",
       "       [189.04596439,  71.03711379, 128.09496302, 115.02694302,\n",
       "        129.04259309, 147.06841391,  57.02146372],\n",
       "       [131.04048509,  71.03711379, 270.16919059, 115.02694302,\n",
       "        129.04259309, 147.06841391, 156.10111102]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.aa import calc_AA_masses_for_same_len_seqs\n",
    "from alphabase.constants.modification import calc_mod_masses_for_same_len_seqs\n",
    "mod_masses = calc_mod_masses_for_same_len_seqs(\n",
    "    nAA=7,\n",
    "    mod_names_list=[\n",
    "        ['Acetyl@Any_N-term', 'Carbamidomethyl@C'],\n",
    "        ['Acetyl@Any_N-term', 'Oxidation@M'],\n",
    "        ['GG@K', 'Dimethyl@K'],\n",
    "    ],\n",
    "    mod_sites_list=[\n",
    "        [0, 3],\n",
    "        [0, 1],\n",
    "        [3, 3],\n",
    "    ]\n",
    ")\n",
    "aa_masses = calc_AA_masses_for_same_len_seqs(\n",
    "    [\n",
    "        'MACDEFG', 'MAKDEFG', 'MAKDEFR'\n",
    "    ]\n",
    ")\n",
    "mod_masses+aa_masses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### np.cumsum to get b-ion neutral masses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:35.985899Z",
     "start_time": "2025-01-30T16:49:35.982829Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.701672Z",
     "iopub.status.busy": "2026-01-05T22:43:45.701615Z",
     "iopub.status.idle": "2026-01-05T22:43:45.703469Z",
     "shell.execute_reply": "2026-01-05T22:43:45.703288Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 173.05104977,  244.08816356,  404.11881224,  519.14575526,\n",
       "         648.18834835,  795.25676227,  852.27822599],\n",
       "       [ 189.04596439,  260.08307818,  388.17804119,  503.20498422,\n",
       "         632.24757731,  779.31599122,  836.33745494],\n",
       "       [ 131.04048509,  202.07759887,  472.24678946,  587.27373248,\n",
       "         716.31632557,  863.38473949, 1019.48585051]])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "np.cumsum(aa_masses+mod_masses, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Mass functionalities in 'mass_calc'\n",
    "\n",
    "The functionalities for peptide and fragment neutral masses have been implement in `alphabase.peptide.mass_calc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:36.899617Z",
     "start_time": "2025-01-30T16:49:36.895322Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.704536Z",
     "iopub.status.busy": "2026-01-05T22:43:45.704480Z",
     "iopub.status.idle": "2026-01-05T22:43:45.706726Z",
     "shell.execute_reply": "2026-01-05T22:43:45.706543Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 870.28879067,  854.34801962, 1037.49641519])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.peptide.mass_calc import calc_peptide_masses_for_same_len_seqs\n",
    "\n",
    "peptide_masses = calc_peptide_masses_for_same_len_seqs(\n",
    "    ['MACDEFG', 'MAKDEFG', 'MAKDEFR'],\n",
    "    mod_list=[\n",
    "        'Acetyl@Any_N-term;Carbamidomethyl@C',\n",
    "        'Acetyl@Any_N-term;Oxidation@M',\n",
    "        'GG@K;Dimethyl@K',\n",
    "    ],\n",
    ")\n",
    "peptide_masses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:37.414885Z",
     "start_time": "2025-01-30T16:49:37.411633Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.707675Z",
     "iopub.status.busy": "2026-01-05T22:43:45.707622Z",
     "iopub.status.idle": "2026-01-05T22:43:45.709730Z",
     "shell.execute_reply": "2026-01-05T22:43:45.709543Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 870.28879067,  854.34801962, 1037.49641519])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.peptide.mass_calc import calc_b_y_and_peptide_masses_for_same_len_seqs\n",
    "b_masses, y_masses, peptide_masses = calc_b_y_and_peptide_masses_for_same_len_seqs(\n",
    "    ['MACDEFG', 'MAKDEFG', 'MAKDEFR'],\n",
    "    mod_list=[\n",
    "        ['Acetyl@Any_N-term', 'Carbamidomethyl@C'],\n",
    "        ['Acetyl@Any_N-term', 'Oxidation@M'],\n",
    "        ['GG@K', 'Dimethyl@K'],\n",
    "    ],\n",
    "    site_list=[\n",
    "        [0, 3],\n",
    "        [0, 1],\n",
    "        [3, 3],\n",
    "    ],\n",
    ")\n",
    "peptide_masses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:38.022288Z",
     "start_time": "2025-01-30T16:49:38.019932Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.710723Z",
     "iopub.status.busy": "2026-01-05T22:43:45.710668Z",
     "iopub.status.idle": "2026-01-05T22:43:45.712373Z",
     "shell.execute_reply": "2026-01-05T22:43:45.712188Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[173.05104977, 244.08816356, 404.11881224, 519.14575526,\n",
       "        648.18834835, 795.25676227],\n",
       "       [189.04596439, 260.08307818, 388.17804119, 503.20498422,\n",
       "        632.24757731, 779.31599122],\n",
       "       [131.04048509, 202.07759887, 472.24678946, 587.27373248,\n",
       "        716.31632557, 863.38473949]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b_masses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:38.389402Z",
     "start_time": "2025-01-30T16:49:38.387298Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.713386Z",
     "iopub.status.busy": "2026-01-05T22:43:45.713332Z",
     "iopub.status.idle": "2026-01-05T22:43:45.714922Z",
     "shell.execute_reply": "2026-01-05T22:43:45.714737Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[697.2377409 , 626.20062711, 466.16997843, 351.14303541,\n",
       "        222.10044232,  75.0320284 ],\n",
       "       [665.30205523, 594.26494145, 466.16997843, 351.14303541,\n",
       "        222.10044232,  75.0320284 ],\n",
       "       [906.45593011, 835.41881632, 565.24962574, 450.22268271,\n",
       "        321.18008962, 174.11167571]])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_masses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Isotope distribution\n",
    "\n",
    "`alphabase.constants.isotope.IsotopeDistribution` will calculate the isotope distribution and the mono-isotopic idx in the distribution for a given atom composition. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For an atom, the mono-isotopic idx (`mono_idx`) points to the highest abundance isotope, so the value is `round(mass of highest isotope - mass of first isotope)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:39.845574Z",
     "start_time": "2025-01-30T16:49:39.833715Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.715971Z",
     "iopub.status.busy": "2026-01-05T22:43:45.715901Z",
     "iopub.status.idle": "2026-01-05T22:43:45.721522Z",
     "shell.execute_reply": "2026-01-05T22:43:45.721334Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abundance</th>\n",
       "      <th>mass</th>\n",
       "      <th>mono_idx</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13C</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[12.0, 13.00335483507]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14N</th>\n",
       "      <td>[0.996337, 0.003663]</td>\n",
       "      <td>[14.00307400443, 15.00010889888]</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15N</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[14.00307400443, 15.00010889888]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18O</th>\n",
       "      <td>[0.005, 0.005, 0.99]</td>\n",
       "      <td>[15.99491461957, 16.9991317565, 17.99915961286]</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2H</th>\n",
       "      <td>[0.01, 0.99]</td>\n",
       "      <td>[1.00782503223, 2.01410177812]</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Xe</th>\n",
       "      <td>[0.000952, 0.00089, 0.019102, 0.264006, 0.0407...</td>\n",
       "      <td>[123.905892, 125.9042983, 127.903531, 128.9047...</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>[1.0]</td>\n",
       "      <td>[88.9058403]</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Yb</th>\n",
       "      <td>[0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0....</td>\n",
       "      <td>[167.9338896, 169.9347664, 170.9363302, 171.93...</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zn</th>\n",
       "      <td>[0.4917, 0.2773, 0.0404, 0.1845, 0.0061]</td>\n",
       "      <td>[63.92914201, 65.92603381, 66.92712775, 67.924...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zr</th>\n",
       "      <td>[0.5145, 0.1122, 0.1715, 0.1738, 0.028]</td>\n",
       "      <td>[89.9046977, 90.9056396, 91.9050347, 93.906310...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>109 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             abundance  \\\n",
       "13C                                       [0.01, 0.99]   \n",
       "14N                               [0.996337, 0.003663]   \n",
       "15N                                       [0.01, 0.99]   \n",
       "18O                               [0.005, 0.005, 0.99]   \n",
       "2H                                        [0.01, 0.99]   \n",
       "..                                                 ...   \n",
       "Xe   [0.000952, 0.00089, 0.019102, 0.264006, 0.0407...   \n",
       "Y                                                [1.0]   \n",
       "Yb   [0.00123, 0.02982, 0.1409, 0.2168, 0.16103, 0....   \n",
       "Zn            [0.4917, 0.2773, 0.0404, 0.1845, 0.0061]   \n",
       "Zr             [0.5145, 0.1122, 0.1715, 0.1738, 0.028]   \n",
       "\n",
       "                                                  mass  mono_idx  \n",
       "13C                             [12.0, 13.00335483507]         1  \n",
       "14N                   [14.00307400443, 15.00010889888]         0  \n",
       "15N                   [14.00307400443, 15.00010889888]         1  \n",
       "18O    [15.99491461957, 16.9991317565, 17.99915961286]         2  \n",
       "2H                      [1.00782503223, 2.01410177812]         1  \n",
       "..                                                 ...       ...  \n",
       "Xe   [123.905892, 125.9042983, 127.903531, 128.9047...         8  \n",
       "Y                                         [88.9058403]         0  \n",
       "Yb   [167.9338896, 169.9347664, 170.9363302, 171.93...         6  \n",
       "Zn   [63.92914201, 65.92603381, 66.92712775, 67.924...         0  \n",
       "Zr   [89.9046977, 90.9056396, 91.9050347, 93.906310...         0  \n",
       "\n",
       "[109 rows x 3 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from alphabase.constants.atom import CHEM_INFO_DICT\n",
    "atom_df = pd.DataFrame().from_dict(CHEM_INFO_DICT, orient='index')\n",
    "def get_mono(masses_abundances):\n",
    "    masses, abundances = masses_abundances\n",
    "    return round(masses[np.argmax(abundances)]-masses[0])\n",
    "atom_df['mono_idx'] = atom_df[['mass','abundance']].apply(\n",
    "    get_mono, axis=1\n",
    ")\n",
    "atom_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`mono_idx` of an atom composition refers to the sum of the `mono_idx` of all atoms. In AlphaBase, `alphabase.constants.isotope.IsotopeDistribution` calculate both isotope abundance and `mono_idx`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, the `mono_idx` of `Fe` is 2, because its most abundant isotope (55.93 Da) lies two nominal mass units above its lightest isotope (53.94 Da):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:42.838219Z",
     "start_time": "2025-01-30T16:49:42.833554Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.722650Z",
     "iopub.status.busy": "2026-01-05T22:43:45.722593Z",
     "iopub.status.idle": "2026-01-05T22:43:45.724635Z",
     "shell.execute_reply": "2026-01-05T22:43:45.724439Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "abundance                 [0.05845, 0.91754, 0.02119, 0.00282]\n",
       "mass         [53.93960899, 55.93493633, 56.93539284, 57.933...\n",
       "mono_idx                                                     2\n",
       "Name: Fe, dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "atom_df.loc['Fe']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the `mono_idx` of `C` is 0, the `mono_idx` of `C(1)Fe(1)` is also 2:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:43.824372Z",
     "start_time": "2025-01-30T16:49:43.261666Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:45.725796Z",
     "iopub.status.busy": "2026-01-05T22:43:45.725738Z",
     "iopub.status.idle": "2026-01-05T22:43:46.211927Z",
     "shell.execute_reply": "2026-01-05T22:43:46.211682Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([5.78245850e-02, 6.25415000e-04, 9.07722322e-01, 3.07809450e-02,\n",
       "        3.01655900e-03, 3.01740000e-05, 0.00000000e+00, 0.00000000e+00,\n",
       "        0.00000000e+00, 0.00000000e+00]),\n",
       " 2)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.isotope import IsotopeDistribution, parse_formula\n",
    "iso = IsotopeDistribution()\n",
    "iso.calc_formula_distribution(\n",
    "    [('C',1),('Fe',1)]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But the `mono_idx` of `13C(1)Fe(1)` is 3, because `13C` contributes a `mono_idx` of 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:43.893066Z",
     "start_time": "2025-01-30T16:49:43.890803Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:46.213218Z",
     "iopub.status.busy": "2026-01-05T22:43:46.213144Z",
     "iopub.status.idle": "2026-01-05T22:43:46.215077Z",
     "shell.execute_reply": "2026-01-05T22:43:46.214896Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([5.845000e-04, 5.786550e-02, 9.175400e-03, 9.085765e-01,\n",
       "        2.100630e-02, 2.791800e-03, 0.000000e+00, 0.000000e+00,\n",
       "        0.000000e+00, 0.000000e+00]),\n",
       " 3)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "iso.calc_formula_distribution(\n",
    "    [('13C',1),('Fe',1)]\n",
    ")"
   ]
  },
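  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to see why the `mono_idx` values add up is to place each abundance list on an integer grid (index = number of extra neutrons relative to the lightest isotope) and convolve the grids. The block below is only a minimal NumPy sketch of that idea, not AlphaBase's implementation. The helper `combine` is hypothetical, and the abundances for `C` and `13C` are assumptions chosen to be consistent with the outputs shown above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Abundances on an integer grid: index i means i extra neutrons\n",
    "# relative to the lightest isotope of the element.\n",
    "# Fe values come from the table above; there is no stable 55Fe, hence the 0.0.\n",
    "fe = np.array([0.05845, 0.0, 0.91754, 0.02119, 0.00282])\n",
    "c = np.array([0.9893, 0.0107])  # natural carbon: 12C, 13C (assumed values)\n",
    "c13 = np.array([0.01, 0.99])    # 13C label, assumed to be 99% enriched\n",
    "\n",
    "def combine(a, b):\n",
    "    # Hypothetical helper: convolve two distributions and sum their mono_idx,\n",
    "    # where mono_idx is the grid index of the most abundant isotope.\n",
    "    return np.convolve(a, b), int(np.argmax(a)) + int(np.argmax(b))\n",
    "\n",
    "dist_cfe, mono_cfe = combine(c, fe)\n",
    "dist_13cfe, mono_13cfe = combine(c13, fe)\n",
    "print(mono_cfe, dist_cfe.round(6))\n",
    "print(mono_13cfe, dist_13cfe.round(6))\n",
    "```\n",
    "\n",
    "Under these assumptions the sketch gives a `mono_idx` of 2 for `C(1)Fe(1)` and 3 for `13C(1)Fe(1)`, in line with the two results above."
   ]
  },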
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For compositions built only from unlabeled elements whose lightest isotope is also the most abundant (such as C, H, N, O, S, and Na), the `mono_idx` is always 0, no matter how large the composition is. This means that the monoisotopic peak is not necessarily the highest isotope peak, especially as the composition gets larger. The three examples below go from a small composition to larger ones, and the highest peak moves from index 0 to index 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:44.477521Z",
     "start_time": "2025-01-30T16:49:44.466197Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:46.216208Z",
     "iopub.status.busy": "2026-01-05T22:43:46.216146Z",
     "iopub.status.idle": "2026-01-05T22:43:46.227408Z",
     "shell.execute_reply": "2026-01-05T22:43:46.227215Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('mono=0, highest=0',\n",
       " array([5.53058051e-01, 3.06480210e-01, 1.06031073e-01, 2.73885413e-02,\n",
       "        5.79597328e-03, 1.05055134e-03, 1.67897345e-04, 2.41173838e-05,\n",
       "        3.15729577e-06, 3.80635657e-07]))"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from alphabase.constants.isotope import IsotopeDistribution, parse_formula\n",
    "iso = IsotopeDistribution()\n",
    "\n",
    "formula = 'C(50)H(50)O(20)Na(1)'\n",
    "formula = parse_formula(formula)\n",
    "dist, mono = iso.calc_formula_distribution(formula)\n",
    "f\"mono={mono}, highest={dist.argmax()}\", dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:45.165377Z",
     "start_time": "2025-01-30T16:49:45.162386Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:46.228519Z",
     "iopub.status.busy": "2026-01-05T22:43:46.228449Z",
     "iopub.status.idle": "2026-01-05T22:43:46.230468Z",
     "shell.execute_reply": "2026-01-05T22:43:46.230278Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('mono=0, highest=1',\n",
       " array([3.21124792e-01, 3.53459703e-01, 2.05844502e-01, 8.38383715e-02,\n",
       "        2.66913129e-02, 7.04911613e-03, 1.60206285e-03, 3.21190201e-04,\n",
       "        5.78218885e-05, 9.47198919e-06]))"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'C(100)H(100)O(20)Na(1)'\n",
    "formula = parse_formula(formula)\n",
    "dist, mono = iso.calc_formula_distribution(formula)\n",
    "f\"mono={mono}, highest={dist.argmax()}\", dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-01-30T16:49:45.672851Z",
     "start_time": "2025-01-30T16:49:45.670264Z"
    },
    "execution": {
     "iopub.execute_input": "2026-01-05T22:43:46.231464Z",
     "iopub.status.busy": "2026-01-05T22:43:46.231406Z",
     "iopub.status.idle": "2026-01-05T22:43:46.233227Z",
     "shell.execute_reply": "2026-01-05T22:43:46.233050Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('mono=0, highest=2',\n",
       " array([0.10312113, 0.22700935, 0.25713731, 0.19936063, 0.11878142,\n",
       "        0.05791123, 0.02402947, 0.00871637, 0.00281814, 0.00082412]))"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'C(200)H(200)O(40)Na(1)'\n",
    "formula = parse_formula(formula)\n",
    "dist, mono = iso.calc_formula_distribution(formula)\n",
    "f\"mono={mono}, highest={dist.argmax()}\", dist"
   ]
  },
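  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shift of the highest peak in the three examples above can be approximated with a carbon-only model: with `n` carbons, the number of `13C` atoms follows a binomial distribution. The block below is a simplified sketch that ignores H, O, and Na, so its probabilities differ from the full distributions above. The function `carbon_only_peaks` is hypothetical and the `13C` abundance of 0.0107 is an assumed value.\n",
    "\n",
    "```python\n",
    "from math import comb\n",
    "\n",
    "P13C = 0.0107  # natural 13C abundance (assumed value)\n",
    "\n",
    "def carbon_only_peaks(n_carbon, n_peaks=5):\n",
    "    # Binomial probability of finding k 13C atoms among n_carbon carbons\n",
    "    return [\n",
    "        comb(n_carbon, k) * P13C**k * (1 - P13C)**(n_carbon - k)\n",
    "        for k in range(n_peaks)\n",
    "    ]\n",
    "\n",
    "for n in (50, 100, 200):\n",
    "    peaks = carbon_only_peaks(n)\n",
    "    highest = peaks.index(max(peaks))\n",
    "    print(f'C({n}): mono=0, highest={highest}')\n",
    "```\n",
    "\n",
    "In this simplified model the highest peak sits at index 0, 1, and 2 for 50, 100, and 200 carbons, while the monoisotopic peak stays at index 0."
   ]
  },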
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.3 ('base')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  },
  "vscode": {
   "interpreter": {
    "hash": "8a3b27e141e49c996c9b863f8707e97aabd49c4a7e8445b9b783b34e4a21a9b2"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}