{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Peptide and Fragment DataFrames\n",
"\n",
"AlphaBase represents peptides and fragments as pandas DataFrames, a tabular data structure that is easy for humans to read and efficient for machines to read and write.\n",
"See [tutorial_basic_definitions.ipynb](https://github.com/MannLabs/alphabase/blob/main/docs/tutorials/tutorial_basic_definitions.ipynb) for an introduction to basic concepts and\n",
"[tutorial_spectral_libraries.ipynb](https://github.com/MannLabs/alphabase/blob/main/docs/tutorials/tutorial_spectral_libraries.ipynb) for an introduction to spectral libraries."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Peptide DataFrame\n",
"\n",
"The peptide dataframe must contain four columns: \n",
" - `sequence` for amino acid sequence (str);\n",
" - `mods` for modification names (str, separated by `;`);\n",
" - `mod_sites` for modification sites (str, separated by `;`);\n",
" - `charge` for precursor charge states (int).\n",
"\n",
"Other columns, such as `precursor_mz`, can be added to the dataframe as needed; AlphaBase provides functions to calculate columns such as `precursor_mz` and isotope-related columns."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:22.109951Z",
"start_time": "2025-01-30T14:18:22.089583Z"
}
},
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"peptide_df = pd.DataFrame({\n",
" 'sequence': ['ACDEFHIK', 'APDEFMNIK', 'WDSEFMNTIRAAAAKDDDDR'],\n",
" 'mods': ['Carbamidomethyl@C', '', 'Phospho@S;Oxidation@M'],\n",
" 'mod_sites': ['2', '', '3;6'],\n",
" 'charge': [1,2,3],\n",
"})\n",
"peptide_df"
],
"outputs": [
{
"data": {
"text/plain": [
" sequence mods mod_sites charge\n",
"0 ACDEFHIK Carbamidomethyl@C 2 1\n",
"1 APDEFMNIK 2\n",
"2 WDSEFMNTIRAAAAKDDDDR Phospho@S;Oxidation@M 3;6 3"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 13
},
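{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `mods` and `mod_sites` strings are aligned entry by entry. As a minimal sketch (the helper `pair_mods_with_sites` is hypothetical and not part of AlphaBase), the code below splits both strings on `;` and checks that each site points at the residue named after `@`. It assumes that sites count residues from 1, as in the example above, and it does not handle terminal modifications.\n",
"\n",
"```python\n",
"def pair_mods_with_sites(sequence, mods, mod_sites):\n",
"    '''Return (mod_name, site, residue) tuples for one peptide.'''\n",
"    if not mods:\n",
"        return []  # unmodified peptide: both strings are empty\n",
"    pairs = []\n",
"    for mod, site in zip(mods.split(';'), mod_sites.split(';')):\n",
"        site = int(site)\n",
"        residue = sequence[site - 1]  # assumption: sites count residues from 1\n",
"        # the part after '@' names the modified residue, e.g. 'Phospho@S'\n",
"        if mod.split('@')[1] != residue:\n",
"            raise ValueError(f'{mod} does not match residue {residue} at site {site}')\n",
"        pairs.append((mod, site, residue))\n",
"    return pairs\n",
"\n",
"pairs = pair_mods_with_sites('WDSEFMNTIRAAAAKDDDDR', 'Phospho@S;Oxidation@M', '3;6')\n",
"```\n",
"\n",
"For the third peptide this pairs `Phospho@S` with residue 3 and `Oxidation@M` with residue 6; an empty `mods` string gives an empty list."
]
},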
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fragment DataFrame\n",
"\n",
"Fragments are also organized in a dataframe. Each column name encodes a fragment type using the schema `Type[_LossType]_zn`, where:\n",
" - `Type` is one of `a`, `b`, `c`, `x`, `y`, `z`;\n",
" - `_LossType` is optional and can be `_modloss`, `_H2O`, or `_NH3`;\n",
" - `n` is the charge state, for example `1`.\n",
"\n",
"Here are some examples:"
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:22.992763Z",
"start_time": "2025-01-30T14:18:22.968662Z"
}
},
"source": [
"from alphabase.peptide.fragment import create_fragment_mz_dataframe\n",
"frag_mz_df = create_fragment_mz_dataframe(\n",
" peptide_df,\n",
" charged_frag_types=['a_z1','b_z1','c_z1','b_z2','x_z1','y_z1', 'y_H2O_z1','y_modloss_z1','z_z1']\n",
")\n",
"frag_mz_df"
],
"outputs": [
{
"data": {
"text/plain": [
" a_z1 b_z1 c_z1 b_z2 x_z1 \\\n",
"0 44.049477 72.044388 89.070938 0.000000 974.403625 \n",
"1 204.080124 232.075043 249.101593 0.000000 814.372986 \n",
"2 319.107056 347.101990 364.128540 0.000000 699.346069 \n",
"3 448.149658 476.144562 493.171112 0.000000 570.303467 \n",
"4 595.218079 623.213013 640.239563 0.000000 423.235046 \n",
"5 732.276978 760.271912 777.298462 0.000000 286.176147 \n",
"6 845.361023 873.355957 890.382507 0.000000 173.092072 \n",
"7 44.049477 72.044388 89.070938 36.525833 1019.450256 \n",
"8 141.102234 169.097153 186.123703 85.052216 922.397522 \n",
"9 256.129181 284.124084 301.150635 142.565689 807.370544 \n",
"10 385.171783 413.166687 430.193237 207.086990 678.327942 \n",
"11 532.240173 560.235107 577.261658 280.621185 531.259521 \n",
"12 663.280701 691.275574 708.302124 346.141418 400.219055 \n",
"13 777.323608 805.318542 822.345093 403.162903 286.176147 \n",
"14 890.407654 918.402588 935.429138 459.704926 173.092072 \n",
"15 159.091675 187.086594 204.113144 94.046936 2262.896973 \n",
"16 274.118622 302.113525 319.140076 151.560410 2147.869873 \n",
"17 441.116974 469.111877 486.138428 235.059586 1980.871582 \n",
"18 570.159546 598.154480 615.181030 299.580872 1851.828979 \n",
"19 717.227966 745.222900 762.249451 373.115082 1704.760620 \n",
"20 864.263367 892.258301 909.284851 446.632782 1557.725220 \n",
"21 978.306335 1006.301208 1023.327759 503.654266 1443.682251 \n",
"22 1079.354004 1107.348877 1124.375488 554.178101 1342.634521 \n",
"23 1192.438110 1220.432983 1237.459473 610.720093 1229.550537 \n",
"24 1348.539185 1376.534058 1393.560669 688.770691 1073.449463 \n",
"25 1419.576294 1447.571167 1464.597778 724.289246 1002.412292 \n",
"26 1490.613403 1518.608276 1535.634888 759.807800 931.375183 \n",
"27 1561.650513 1589.645386 1606.671997 795.326355 860.338074 \n",
"28 1632.687622 1660.682495 1677.709106 830.844910 789.300964 \n",
"29 1760.782593 1788.777466 1805.804077 894.892395 661.205994 \n",
"30 1875.809570 1903.804443 1920.830933 952.405884 546.179016 \n",
"31 1990.836426 2018.831421 2035.857910 1009.919312 431.152100 \n",
"32 2105.863525 2133.858398 2150.884766 1067.432861 316.125153 \n",
"33 2220.890381 2248.885254 2265.911865 1124.946289 201.098221 \n",
"\n",
" y_z1 y_H2O_z1 y_modloss_z1 z_z1 \n",
"0 948.424377 930.413818 0.000000 932.405640 \n",
"1 788.393738 770.383179 0.000000 772.375000 \n",
"2 673.366760 655.356201 0.000000 657.348083 \n",
"3 544.324219 526.313660 0.000000 528.305481 \n",
"4 397.255768 379.245209 0.000000 381.237061 \n",
"5 260.196869 242.186310 0.000000 244.178146 \n",
"6 147.112808 129.102234 0.000000 131.094086 \n",
"7 993.471008 975.460449 0.000000 977.452271 \n",
"8 896.418213 878.407654 0.000000 880.399536 \n",
"9 781.391296 763.380737 0.000000 765.372559 \n",
"10 652.348694 634.338135 0.000000 636.329956 \n",
"11 505.280273 487.269714 0.000000 489.261566 \n",
"12 374.239807 356.229218 0.000000 358.221069 \n",
"13 260.196869 242.186310 0.000000 244.178146 \n",
"14 147.112808 129.102234 0.000000 131.094086 \n",
"15 2236.917725 2218.906982 2138.940674 2220.898926 \n",
"16 2121.890625 2103.880127 2023.913818 2105.872070 \n",
"17 1954.892334 1936.881714 0.000000 1938.873657 \n",
"18 1825.849731 1807.839111 0.000000 1809.831055 \n",
"19 1678.781372 1660.770752 0.000000 1662.762573 \n",
"20 1531.745972 1513.735352 0.000000 1515.727173 \n",
"21 1417.703003 1399.692383 0.000000 1401.684326 \n",
"22 1316.655273 1298.644775 0.000000 1300.636597 \n",
"23 1203.571289 1185.560669 0.000000 1187.552490 \n",
"24 1047.470093 1029.459595 0.000000 1031.451416 \n",
"25 976.433044 958.422485 0.000000 960.414307 \n",
"26 905.395935 887.385376 0.000000 889.377197 \n",
"27 834.358826 816.348267 0.000000 818.340088 \n",
"28 763.321716 745.311096 0.000000 747.302979 \n",
"29 635.226746 617.216187 0.000000 619.208008 \n",
"30 520.199768 502.189209 0.000000 504.181061 \n",
"31 405.172852 387.162262 0.000000 389.154114 \n",
"32 290.145905 272.135345 0.000000 274.127167 \n",
"33 175.118958 157.108383 0.000000 159.100235 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 14
},
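{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `Type[_LossType]_zn` schema concrete, here is a small sketch that splits a column name into its parts with a regular expression. The names `FRAG_COLUMN` and `parse_frag_column` are illustrative and not part of AlphaBase; the pattern covers only the types and loss types listed above.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# Type[_LossType]_zn, e.g. 'b_z1', 'y_H2O_z1', 'y_modloss_z1'\n",
"FRAG_COLUMN = re.compile(\n",
"    r'^(?P<type>[abcxyz])(?:_(?P<loss>modloss|H2O|NH3))?_z(?P<charge>[0-9]+)$'\n",
")\n",
"\n",
"def parse_frag_column(name):\n",
"    '''Split a fragment column name into (type, loss_type, charge).'''\n",
"    match = FRAG_COLUMN.match(name)\n",
"    if match is None:\n",
"        raise ValueError(f'not a fragment column name: {name!r}')\n",
"    # the loss group is None when the column has no loss type\n",
"    return match['type'], match['loss'], int(match['charge'])\n",
"\n",
"parsed = [parse_frag_column(c) for c in ['b_z2', 'y_H2O_z1', 'y_modloss_z1']]\n",
"```\n",
"\n",
"Here `parsed` holds one `(type, loss_type, charge)` tuple per column name, with `None` as the loss type for `b_z2`."
]
},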
{
"cell_type": "markdown",
"metadata": {},
"source": [
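"Note that, within each peptide, the m/z values of all N-terminal (a/b/c) fragments are in ascending order, e.g. from b[1] to b[n-1], and those of all C-terminal (x/y/z) fragments are in descending order, e.g. from y[n-1] to y[1]. Entries of 0 are placeholders where no m/z value was calculated, for example `b_z2` for the charge-1 peptide and `y_modloss_z1` for most rows."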
]
},
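{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ordering described above means that each row of a peptide's block holds an N-terminal ion and its complementary C-terminal ion. A minimal sketch for the first peptide, using illustrative variable names that are not AlphaBase API:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"nAA = 8  # length of 'ACDEFHIK'\n",
"row = np.arange(nAA - 1)       # row offset within this peptide's block\n",
"n_term_number = row + 1        # a/b/c ions: 1 .. nAA-1 (ascending)\n",
"c_term_number = nAA - 1 - row  # x/y/z ions: nAA-1 .. 1 (descending)\n",
"```\n",
"\n",
"In every row the two ion numbers add up to `nAA`, so row 0 pairs b[1] with y[7] and the last row pairs b[7] with y[1]."
]
},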
{
"cell_type": "markdown",
"metadata": {},
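"source": "The fragment dataframe is linked to the peptide (precursor) dataframe through the `frag_start_idx` and `frag_stop_idx` columns of the peptide dataframe, which `create_fragment_mz_dataframe` added to `peptide_df` above. Together, these two values locate all fragments of a peptide in the fragment dataframe: its rows are `frag_mz_df.iloc[frag_start_idx:frag_stop_idx]`, as the following cells show."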
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:24.889327Z",
"start_time": "2025-01-30T14:18:24.864750Z"
}
},
"source": "peptide_df",
"outputs": [
{
"data": {
"text/plain": [
" sequence mods mod_sites charge nAA \\\n",
"0 ACDEFHIK Carbamidomethyl@C 2 1 8 \n",
"1 APDEFMNIK 2 9 \n",
"2 WDSEFMNTIRAAAAKDDDDR Phospho@S;Oxidation@M 3;6 3 20 \n",
"\n",
" frag_start_idx frag_stop_idx \n",
"0 0 7 \n",
"1 7 15 \n",
"2 15 34 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 15
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:25.406770Z",
"start_time": "2025-01-30T14:18:25.388535Z"
}
},
"source": [
"selected_peptide_index = -1 # last peptide\n",
"start = peptide_df['frag_start_idx'].values[selected_peptide_index]\n",
"stop = peptide_df['frag_stop_idx'].values[selected_peptide_index]\n",
"frag_mz_df.iloc[start:stop]"
],
"outputs": [
{
"data": {
"text/plain": [
" a_z1 b_z1 c_z1 b_z2 x_z1 \\\n",
"15 159.091675 187.086594 204.113144 94.046936 2262.896973 \n",
"16 274.118622 302.113525 319.140076 151.560410 2147.869873 \n",
"17 441.116974 469.111877 486.138428 235.059586 1980.871582 \n",
"18 570.159546 598.154480 615.181030 299.580872 1851.828979 \n",
"19 717.227966 745.222900 762.249451 373.115082 1704.760620 \n",
"20 864.263367 892.258301 909.284851 446.632782 1557.725220 \n",
"21 978.306335 1006.301208 1023.327759 503.654266 1443.682251 \n",
"22 1079.354004 1107.348877 1124.375488 554.178101 1342.634521 \n",
"23 1192.438110 1220.432983 1237.459473 610.720093 1229.550537 \n",
"24 1348.539185 1376.534058 1393.560669 688.770691 1073.449463 \n",
"25 1419.576294 1447.571167 1464.597778 724.289246 1002.412292 \n",
"26 1490.613403 1518.608276 1535.634888 759.807800 931.375183 \n",
"27 1561.650513 1589.645386 1606.671997 795.326355 860.338074 \n",
"28 1632.687622 1660.682495 1677.709106 830.844910 789.300964 \n",
"29 1760.782593 1788.777466 1805.804077 894.892395 661.205994 \n",
"30 1875.809570 1903.804443 1920.830933 952.405884 546.179016 \n",
"31 1990.836426 2018.831421 2035.857910 1009.919312 431.152100 \n",
"32 2105.863525 2133.858398 2150.884766 1067.432861 316.125153 \n",
"33 2220.890381 2248.885254 2265.911865 1124.946289 201.098221 \n",
"\n",
" y_z1 y_H2O_z1 y_modloss_z1 z_z1 \n",
"15 2236.917725 2218.906982 2138.940674 2220.898926 \n",
"16 2121.890625 2103.880127 2023.913818 2105.872070 \n",
"17 1954.892334 1936.881714 0.000000 1938.873657 \n",
"18 1825.849731 1807.839111 0.000000 1809.831055 \n",
"19 1678.781372 1660.770752 0.000000 1662.762573 \n",
"20 1531.745972 1513.735352 0.000000 1515.727173 \n",
"21 1417.703003 1399.692383 0.000000 1401.684326 \n",
"22 1316.655273 1298.644775 0.000000 1300.636597 \n",
"23 1203.571289 1185.560669 0.000000 1187.552490 \n",
"24 1047.470093 1029.459595 0.000000 1031.451416 \n",
"25 976.433044 958.422485 0.000000 960.414307 \n",
"26 905.395935 887.385376 0.000000 889.377197 \n",
"27 834.358826 816.348267 0.000000 818.340088 \n",
"28 763.321716 745.311096 0.000000 747.302979 \n",
"29 635.226746 617.216187 0.000000 619.208008 \n",
"30 520.199768 502.189209 0.000000 504.181061 \n",
"31 405.172852 387.162262 0.000000 389.154114 \n",
"32 290.145905 272.135345 0.000000 274.127167 \n",
"33 175.118958 157.108383 0.000000 159.100235 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 16
},
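{
"cell_type": "markdown",
"metadata": {},
"source": [
"The index values shown above are consistent with each peptide occupying `nAA - 1` consecutive rows of the fragment dataframe. The sketch below reproduces them with a cumulative sum; it illustrates the layout and is not AlphaBase's own implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'nAA': [8, 9, 20]})\n",
"n_rows = df['nAA'].to_numpy() - 1  # nAA-1 fragment rows per peptide\n",
"stop = np.cumsum(n_rows)           # exclusive end of each peptide's block\n",
"start = stop - n_rows              # first row of each peptide's block\n",
"df['frag_start_idx'] = start\n",
"df['frag_stop_idx'] = stop\n",
"```\n",
"\n",
"Because the stop index is exclusive, the blocks tile the fragment dataframe without gaps or overlaps, and a plain `iloc[start:stop]` slice returns exactly one peptide's fragments."
]
},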
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Working with several fragment dataframes (e.g., m/z and intensity dataframes) can be inconvenient, especially when operating on subsets of them.\n",
"Therefore, AlphaBase also provides a flattened fragment dataframe that stores all fragment information in a single table."
]
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:26.925743Z",
"start_time": "2025-01-30T14:18:26.899465Z"
}
},
"source": [
"from alphabase.peptide.fragment import flatten_fragments\n",
"\n",
"dummy_frag_intensity_df = pd.DataFrame(\n",
" np.zeros_like(frag_mz_df.values),\n",
" columns=frag_mz_df.columns\n",
" )\n",
"\n",
"precursor_df, flat_frag_df = flatten_fragments(\n",
" precursor_df=peptide_df, \n",
" fragment_mz_df=frag_mz_df, \n",
" fragment_intensity_df=dummy_frag_intensity_df\n",
")"
],
"outputs": [],
"execution_count": 17
},
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:27.632516Z",
"start_time": "2025-01-30T14:18:27.612635Z"
}
},
"source": [
"precursor_df"
],
"outputs": [
{
"data": {
"text/plain": [
" sequence mods mod_sites charge nAA \\\n",
"0 ACDEFHIK Carbamidomethyl@C 2 1 8 \n",
"1 APDEFMNIK 2 9 \n",
"2 WDSEFMNTIRAAAAKDDDDR Phospho@S;Oxidation@M 3;6 3 20 \n",
"\n",
" frag_start_idx frag_stop_idx flat_frag_start_idx flat_frag_stop_idx \n",
"0 0 7 0 49 \n",
"1 7 15 49 113 \n",
"2 15 34 113 267 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 18
},
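{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `flat_frag_start_idx` and `flat_frag_stop_idx` values above are consistent with the flattened table keeping one row per nonzero m/z entry. The sketch below rebuilds only the nonzero pattern of `frag_mz_df` by hand (an assumption read off the output shown earlier, not an AlphaBase call) and counts the entries per peptide.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"columns = ['a_z1', 'b_z1', 'c_z1', 'b_z2', 'x_z1', 'y_z1',\n",
"           'y_H2O_z1', 'y_modloss_z1', 'z_z1']\n",
"frag_start = np.array([0, 7, 15])\n",
"frag_stop = np.array([7, 15, 34])\n",
"\n",
"# True where the m/z table shown earlier holds a nonzero value\n",
"nonzero = np.ones((34, len(columns)), dtype=bool)\n",
"nonzero[0:7, columns.index('b_z2')] = False            # charge-1 peptide: b_z2 is 0\n",
"nonzero[:, columns.index('y_modloss_z1')] = False\n",
"nonzero[15:17, columns.index('y_modloss_z1')] = True   # the only two modloss values\n",
"\n",
"counts = np.array([nonzero[a:b].sum() for a, b in zip(frag_start, frag_stop)])\n",
"flat_stop = np.cumsum(counts)\n",
"flat_start = flat_stop - counts\n",
"```\n",
"\n",
"The per-peptide counts are 49, 64, and 154, which gives the start and stop values 0/49, 49/113, and 113/267 shown in `precursor_df`."
]
},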
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-30T14:18:28.259584Z",
"start_time": "2025-01-30T14:18:28.237372Z"
}
},
"source": [
"flat_frag_df"
],
"outputs": [
{
"data": {
"text/plain": [
" mz intensity type loss_type charge number position\n",
"0 44.049477 0.0 97 0 1 1 0\n",
"1 72.044388 0.0 98 0 1 1 0\n",
"2 89.070938 0.0 99 0 1 1 0\n",
"3 974.403625 0.0 120 0 1 7 0\n",
"4 948.424377 0.0 121 0 1 7 0\n",
".. ... ... ... ... ... ... ...\n",
"262 1124.946289 0.0 98 0 2 19 18\n",
"263 201.098221 0.0 120 0 1 1 18\n",
"264 175.118958 0.0 121 0 1 1 18\n",
"265 157.108383 0.0 121 18 1 1 18\n",
"266 159.100235 0.0 122 0 1 1 18\n",
"\n",
"[267 rows x 7 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The flattened fragment dataframe contains the columns `mz`, `intensity`, `type`, `loss_type`, `charge`, `number`, and `position`; other columns can be added as needed. All columns hold numeric values so that they can be processed efficiently with NumPy and Numba. For instance, `type` is the ASCII code of the ion type: `a`=97, `b`=98, `c`=99, `x`=120, `y`=121, and `z`=122. Loss types are encoded as numbers as well: a water loss becomes `18` and a phospho loss becomes `98`.\n",
"\n",
"Similar to `frag_start_idx` and `frag_stop_idx`, the `flat_frag_start_idx` and `flat_frag_stop_idx` columns link the precursor dataframe to the flattened fragment dataframe."
]
},
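{
"cell_type": "markdown",
"metadata": {},
"source": [
"For inspection it can help to turn the numeric codes back into labels. The sketch below does this on a small hand-made table with pandas; the `loss_names` mapping is an illustrative assumption that covers only the codes mentioned above, and the column names `type_char` and `loss_name` are made up for this example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"flat = pd.DataFrame({\n",
"    'type': [97, 98, 99, 120, 121, 121, 122],\n",
"    'loss_type': [0, 0, 0, 0, 0, 18, 0],\n",
"})\n",
"\n",
"# ASCII code -> ion letter, e.g. 121 -> 'y'\n",
"flat['type_char'] = flat['type'].map(chr)\n",
"\n",
"# illustrative labels for the loss codes named in the text\n",
"loss_names = {0: '', 18: 'H2O', 98: 'phospho loss'}\n",
"flat['loss_name'] = flat['loss_type'].map(loss_names)\n",
"```\n",
"\n",
"Keeping the codes numeric in the stored table and decoding them only for display preserves the NumPy- and Numba-friendly layout."
]
},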
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": ""
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.3 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "8a3b27e141e49c996c9b863f8707e97aabd49c4a7e8445b9b783b34e4a21a9b2"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}