{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Spectral Libraries\n", "\n", "This notebook introduces developers to the spectral library functionalities of alphabase." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Base Library Class\n", "\n", "`alphabase.spectral_library.base.SpecLibBase` is the base class for spectral libraries. See https://alphabase.readthedocs.io/en/latest/ for details. We recommend that users access spectral library functionalities via `alphabase.protein.fasta.SpecLibFasta`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `SpecLibFasta`\n", "\n", "Almost all DataFrame functionalities for processing proteins and peptides are integrated into `alphabase.protein.fasta.SpecLibFasta`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.\n" ] } ], "source": [ "from alphabase.protein.fasta import SpecLibFasta\n", "\n", "fasta_lib = SpecLibFasta(\n", " charged_frag_types=['b_z1','y_z1'],\n", " protease='trypsin',\n", " fix_mods=['Carbamidomethyl@C'],\n", " var_mods=['Acetyl@Protein_N-term','Oxidation@M'],\n", " decoy=None,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start from fasta/proteins\n", "\n", "`SpecLibFasta` performs the following steps for us:\n", "\n", "- Load fasta files into a protein_dict\n", "- Digest proteins into peptide sequences\n", "- Append decoy peptide sequences if `self.decoy` is not None\n", "- Add fixed and variable modifications\n", "- Add special modifications (optional)\n", "- Add peptide labeling (optional)\n", "- Add charge states to peptides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load fasta files into a protein_dict" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# from alphabase.protein.fasta import 
load_all_proteins\n", "# protein_dict = load_all_proteins(fasta_files)\n", "\n", "# For example, the protein_dict is:\n", "protein_dict = {\n", " 'yy': {\n", " 'protein_id': 'yy',\n", " 'full_name': 'yy_yy',\n", " 'gene_name': 'y_y',\n", " 'sequence': 'FGHIKLMNPQR'\n", " },\n", " 'xx': {\n", " 'protein_id': 'xx',\n", " 'full_name': 'xx_xx',\n", " 'gene_name': 'x_x',\n", " 'sequence': 'MACDESTYKXKFGHIKLMNPQRST'\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Digest proteins into peptide sequences" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
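The digestion step can be illustrated with a minimal pure-Python sketch. This is not the alphabase implementation. The following are all assumptions: the helper name `tryptic_digest`, the cleavage rule (after K or R, but not before P), the 7 to 35 residue length window, the limit of two missed cleavages, and the extra form without the protein N-terminal Met. Together they happen to reproduce the 11 peptides that `get_peptides_from_protein_dict` returns for the toy `protein_dict` in this step.

```python
import re

def tryptic_digest(sequence, max_missed=2, min_len=7, max_len=35, clip_nterm_met=True):
    # Hypothetical helper: returns {peptide: number_of_missed_cleavages}.
    # Cleave after K or R unless the next residue is P (zero-width split, Python 3.7+).
    pieces = [p for p in re.split('(?<=[KR])(?!P)', sequence) if p]
    peptides = {}
    for i in range(len(pieces)):
        # Join up to max_missed + 1 consecutive pieces starting at piece i.
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            peptide = ''.join(pieces[i:j + 1])
            forms = [peptide]
            # Also keep the form without the protein N-terminal Met.
            if clip_nterm_met and i == 0 and peptide.startswith('M'):
                forms.append(peptide[1:])
            for form in forms:
                if min_len <= len(form) <= max_len:
                    peptides[form] = j - i
    return peptides

# A trimmed copy of the toy protein_dict (only the sequences are needed here).
protein_dict = {
    'yy': {'sequence': 'FGHIKLMNPQR'},
    'xx': {'sequence': 'MACDESTYKXKFGHIKLMNPQRST'},
}
# Map each peptide to the indices of the proteins it occurs in.
peptide_to_proteins = {}
for idx, protein in enumerate(protein_dict.values()):
    for peptide in tryptic_digest(protein['sequence']):
        peptide_to_proteins.setdefault(peptide, []).append(idx)
print(peptide_to_proteins['FGHIKLMNPQR'])  # prints [0, 1]
```

The last loop mirrors the `protein_idxes` column, where `FGHIKLMNPQR` is reported for both proteins as `0;1`.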
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm \\\n", "0 XKFGHIK 1 1 False \n", "1 LMNPQRST 1 1 False \n", "2 ACDESTYK 1 0 True \n", "3 MACDESTYK 1 0 True \n", "4 ACDESTYKXK 1 1 True \n", "5 FGHIKLMNPQR 0;1 1 True \n", "6 MACDESTYKXK 1 1 True \n", "7 XKFGHIKLMNPQR 1 2 False \n", "8 FGHIKLMNPQRST 1 2 False \n", "9 ACDESTYKXKFGHIK 1 2 True \n", "10 MACDESTYKXKFGHIK 1 2 True \n", "\n", " is_prot_cterm mods mod_sites nAA \n", "0 False 7 \n", "1 True 8 \n", "2 False 8 \n", "3 False 9 \n", "4 False 10 \n", "5 True 11 \n", "6 False 11 \n", "7 False 13 \n", "8 True 13 \n", "9 False 15 \n", "10 False 16 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.get_peptides_from_protein_dict(protein_dict)\n", "fasta_lib.precursor_df" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " protein_id full_name gene_name sequence\n", "0 yy yy_yy y_y FGHIKLMNPQR\n", "1 xx xx_xx x_x MACDESTYKXKFGHIKLMNPQRST" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.protein_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Append decoy sequences\n", "\n", "This step depends on `self.decoy` (a str), whose value can be:\n", "\n", "- `protein_reverse`: reverse the target protein sequences\n", "- `pseudo_reverse`: pseudo-reverse the target peptide sequences\n", "- `diann`: generate DiaNN-like decoys\n", "- `None`: no decoys\n", "\n", "Let's take `diann` as an example:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
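The decoys in this step's output suggest how a DiaNN-like decoy is built. The second and the second-to-last residues of each target peptide are mutated through a fixed substitution table, and the remaining residues are kept. The table and the helper `diann_like_decoy` below are assumptions modeled on the DiaNN-style scheme, not the alphabase source. Residues missing from the table (such as `X`) are left unchanged here, which may differ from the library.

```python
# Assumed substitution table: each residue in the first string maps to the
# residue at the same position in the second string.
MUTATION_MAP = dict(zip('GAVLIFMPWSCTYHKRQEND', 'LLLVVLLLLTSSSSLLNDQE'))

def diann_like_decoy(sequence):
    # Hypothetical helper: mutate the 2nd and the 2nd-to-last residue.
    residues = list(sequence)
    # A set avoids mutating the same position twice for 3-residue peptides.
    for pos in {1, len(residues) - 2}:
        if 0 <= pos < len(residues):
            residues[pos] = MUTATION_MAP.get(residues[pos], residues[pos])
    return ''.join(residues)

for target in ['FGHIKLMNPQR', 'XKFGHIK', 'FGHIKLMNPQRST']:
    print(target, '->', diann_like_decoy(target))
```

For these three targets the sketch yields `FLHIKLMNPNR`, `XLFGHVK` and `FLHIKLMNPQRTT`, the same decoy sequences that appear in this step's output.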
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm \\\n", "20 MACDESTYKXKFGHIK 1 2 True \n", "10 FGHIKLMNPQR 0;1 1 True \n", "14 FLHIKLMNPQRTT 1 2 False \n", "13 FLHIKLMNPNR 0;1 1 True \n", "1 XLFGHVK 1 1 False \n", "\n", " is_prot_cterm mods mod_sites nAA decoy \n", "20 False 16 0 \n", "10 True 11 0 \n", "14 True 13 1 \n", "13 True 11 1 \n", "1 False 7 1 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.decoy = 'diann'\n", "fasta_lib.append_decoy_sequence()\n", "fasta_lib.precursor_df.sample(5, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add modifications\n", "\n", "`add_modifications()` will add fixed and variable modifications. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
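The sketch below makes the combinatorics of fixed and variable modifications concrete for a single peptide. It uses the `mods`/`mod_sites` string format from this step's output: sites are 1-based residue positions, 0 is the N-terminus, and variable mods are listed before fixed ones. Several details are assumptions for illustration, not the alphabase implementation:

- the helper `enumerate_mod_forms`;
- its cap of two variable modifications per peptide;
- its support for only `Protein_N-term` and single-residue targets.

```python
from itertools import combinations

def enumerate_mod_forms(sequence, is_prot_nterm,
                        fix_mods=('Carbamidomethyl@C',),
                        var_mods=('Acetyl@Protein_N-term', 'Oxidation@M'),
                        max_var_mod_num=2):
    # Hypothetical helper returning a list of (mods, mod_sites) string pairs.
    def sites_for(mod):
        target = mod.split('@')[1]
        if target == 'Protein_N-term':
            return [0] if is_prot_nterm else []
        # Single-residue target such as M or C: 1-based residue positions.
        return [i + 1 for i, aa in enumerate(sequence) if aa == target]

    # Fixed mods are applied to every matching site.
    fixed = sorted((site, mod) for mod in fix_mods for site in sites_for(mod))
    # Variable mods: every subset of candidate sites, up to max_var_mod_num.
    candidates = sorted((site, mod) for mod in var_mods for site in sites_for(mod))
    forms = []
    for n in range(max_var_mod_num + 1):
        for combo in combinations(candidates, n):
            chosen = list(combo) + fixed
            forms.append((';'.join(mod for _, mod in chosen),
                          ';'.join(str(site) for site, _ in chosen)))
    return forms

for mods, sites in enumerate_mod_forms('MACDESTYK', is_prot_nterm=True):
    print(mods, sites)
```

For `MACDESTYK` this gives four forms (with or without `Acetyl@Protein_N-term`, with or without `Oxidation@M`, always with `Carbamidomethyl@C` at site 3). This matches the four `MACDESTYK` rows of the combined library later in the notebook.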
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n", "35 FLHIKLMNPNR 0;1 1 True True \n", "34 FLHIKLMNPNR 0;1 1 True True \n", "41 FGHIKLMNPQRST 1 2 False True \n", "27 MACDESTYKXK 1 1 True False \n", "11 MACDESTYK 1 0 True False \n", "\n", " mods mod_sites nAA decoy \n", "35 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n", "34 11 1 \n", "41 Oxidation@M 7 13 0 \n", "27 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 0 \n", "11 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.add_modifications()\n", "fasta_lib.precursor_df.sample(5, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add special modifications\n", "\n", "Special modifications are PTMs over which we want finer control:\n", "\n", "1. We may need only the modified forms of peptides, without their unmodified forms\n", "2. `GlyGly@K` cannot occur at the peptide C-terminus because trypsin cannot cleave after a `GlyGly`-modified Lys\n", "3. For special modifications such as `Phospho@S` and `HexNAc@S`, we want to limit the number of modified forms per peptide to control memory usage." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
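The three rules for special modifications can be sketched for a single peptide as follows. The helper `special_mod_forms` is hypothetical and rests on three assumptions:

- a special modification is placed only on matching residues;
- the peptide C-terminal residue is skipped when `cannot_modify_pep_c_term` is true;
- every site combination with between `min_num` and `max_num` modifications is generated.

With `min_num=1`, a peptide without any eligible site yields no forms and is effectively dropped, which mirrors `min_special_mod_num = 1` excluding the unmodified forms.

```python
from itertools import combinations

def special_mod_forms(sequence, residue='K', min_num=1, max_num=1,
                      cannot_modify_pep_c_term=True):
    # Hypothetical helper: list the 1-based site tuples for one special mod.
    sites = [
        i + 1 for i, aa in enumerate(sequence)
        if aa == residue
        # Skip the peptide C-terminal residue: trypsin would not have
        # cleaved after a GlyGly-modified K.
        and not (cannot_modify_pep_c_term and i == len(sequence) - 1)
    ]
    forms = []
    for n in range(min_num, max_num + 1):
        forms.extend(combinations(sites, n))
    return forms

print(special_mod_forms('XKFGHIKLMNPQR'))     # prints [(2,), (7,)]
print(special_mod_forms('MACDESTYKXKFGHIK'))  # prints [(9,), (11,)]
print(special_mod_forms('ACDESTYK'))          # prints []
```

The sites 7, 9 and 11 agree with the `GlyGly@K` sites in this step's output, and the C-terminal K of `MACDESTYKXKFGHIK` (site 16) is never used.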
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm \\\n", "45 MACDESTYKXKFGHIK 1 2 True \n", "33 ASDESTYKXKFGHVK 1 2 True \n", "40 MACDESTYKXKFGHIK 1 2 True \n", "26 FGHIKLMNPQRST 1 2 False \n", "11 MACDESTYKXK 1 1 True \n", "2 ACDESTYKXK 1 1 True \n", "32 ASDESTYKXKFGHVK 1 2 True \n", "43 MACDESTYKXKFGHIK 1 2 True \n", "46 MACDESTYKXKFGHIK 1 2 True \n", "30 XKFGHIKLMNPQR 1 2 False \n", "\n", " is_prot_cterm mods \\\n", "45 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n", "33 False Acetyl@Protein_N-term;GlyGly@K \n", "40 False Oxidation@M;Carbamidomethyl@C;GlyGly@K \n", "26 True GlyGly@K \n", "11 False Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... \n", "2 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n", "32 False GlyGly@K \n", "43 False Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... \n", "46 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n", "30 False GlyGly@K \n", "\n", " mod_sites nAA decoy \n", "45 0;3;9 16 0 \n", "33 0;8 15 1 \n", "40 1;3;11 16 0 \n", "26 5 13 0 \n", "11 0;1;3;9 11 0 \n", "2 0;2;8 10 0 \n", "32 10 15 1 \n", "43 0;1;3;9 16 0 \n", "46 0;3;11 16 0 \n", "30 7 13 0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.special_mods = ['GlyGly@K']\n", "fasta_lib.special_mods_cannot_modify_pep_c_term = True\n", "fasta_lib.min_special_mod_num = 1 # exclude the unmodified forms\n", "fasta_lib.max_special_mod_num = 1 # limit the number of special modifications per peptide\n", "fasta_lib.add_special_modifications()\n", "fasta_lib.precursor_df.sample(10, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add peptide labeling\n", "\n", "For example, dimethyl labeling:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
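This step's output suggests a rule for applying a labeling channel:

- every `K` receives the channel's lysine label, even a `K` that already carries `GlyGly@K`;
- the peptide N-terminus receives the N-terminal label unless site 0 is already occupied, for example by `Acetyl@Protein_N-term`;
- the new entries are appended in site order.

The helper `apply_label` below is a hypothetical sketch of this inferred rule, not the alphabase code.

```python
def apply_label(sequence, mods, mod_sites, label_mods):
    # Hypothetical helper: append one channel's label mods to a peptide.
    mod_list = mods.split(';') if mods else []
    site_list = [int(s) for s in mod_sites.split(';')] if mod_sites else []
    nterm_taken = 0 in site_list
    labels = []
    for label in label_mods:
        target = label.split('@')[1]
        if target.endswith('N-term'):
            # Inferred rule: skip the N-terminal label if site 0 is occupied.
            if not nterm_taken:
                labels.append((0, label))
        else:
            labels.extend((i + 1, label) for i, aa in enumerate(sequence) if aa == target)
    labels.sort()  # append labels in site order
    mod_list += [label for _, label in labels]
    site_list += [site for site, _ in labels]
    return ';'.join(mod_list), ';'.join(str(s) for s in site_list)

heavy = ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term']
print(apply_label('XKFGHIKLMNPQR', 'GlyGly@K', '7', heavy))
print(apply_label('FLHIKLMNPNR', 'Acetyl@Protein_N-term;GlyGly@K', '0;5', heavy))
```

The resulting `mod_sites` strings, `7;0;2;7` and `0;5;5`, correspond to rows 85 and 75 of this step's output (channel 4).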
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm \\\n", "85 XKFGHIKLMNPQR 1 2 False \n", "10 MACDESTYKXK 1 1 True \n", "75 FLHIKLMNPNR 0;1 1 True \n", "2 ACDESTYKXK 1 1 True \n", "24 XLFGHIKLMNPNR 1 2 False \n", "101 MACDESTYKXKFGHIK 1 2 True \n", "109 MLCDESTYKXKFGHVK 1 2 True \n", "7 FGHIKLMNPQR 0;1 1 True \n", "16 MLCDESTYKVK 1 1 True \n", "91 ACDESTYKXKFGHIK 1 2 True \n", "\n", " is_prot_cterm mods \\\n", "85 False GlyGly@K;Dimethyl:2H(4)@Any_N-term;Dimethyl:2H... \n", "10 False Carbamidomethyl@C;GlyGly@K;Dimethyl@Any_N-term... \n", "75 True Acetyl@Protein_N-term;GlyGly@K;Dimethyl:2H(4)@K \n", "2 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n", "24 False GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K \n", "101 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n", "109 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n", "7 True Acetyl@Protein_N-term;Oxidation@M;GlyGly@K;Dim... \n", "16 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n", "91 False Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any_... \n", "\n", " mod_sites nAA decoy labeling_channel \n", "85 7;0;2;7 13 0 4 \n", "10 3;9;0;9;11 11 0 0 \n", "75 0;5;5 11 1 4 \n", "2 0;2;8;8;10 10 0 0 \n", "24 7;0;7 13 1 0 \n", "101 0;3;11;9;11;16 16 0 4 \n", "109 0;3;11;9;11;16 16 1 4 \n", "7 0;7;5;5 11 0 0 \n", "16 0;3;9;9;11 11 1 0 \n", "91 2;10;0;8;10;15 15 0 4 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.labeling_channels = {\n", " 0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],\n", " 4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],\n", "}\n", "fasta_lib.add_peptide_labeling()\n", "fasta_lib.precursor_df.sample(10, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add charge states" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm \\\n", "122 MACDESTYKXKFGHIK 1 2 True \n", "66 FLHIKLMNPQRTT 1 2 False \n", "142 MLCDESTYKXKFGHVK 1 2 True \n", "246 XKFGHIKLMNPQR 1 2 False \n", "146 MLCDESTYKXKFGHVK 1 2 True \n", "\n", " is_prot_cterm mods \\\n", "122 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n", "66 True GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K \n", "142 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n", "246 False Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any_N-term... \n", "146 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n", "\n", " mod_sites nAA decoy labeling_channel charge \n", "122 1;3;11;0;9;11;16 16 0 0 4 \n", "66 5;0;5 13 1 0 2 \n", "142 1;3;9;0;9;11;16 16 1 0 3 \n", "246 9;2;0;2;7 13 0 4 2 \n", "146 1;3;11;0;9;11;16 16 1 0 4 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.add_charge()\n", "fasta_lib.precursor_df.sample(5, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `import_and_process_protein_dict()` combines all steps\n", "\n", "To start from fasta files, use `import_and_process_fasta()` instead." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " protein_id full_name gene_name sequence\n", "0 yy yy_yy y_y FGHIKLMNPQR\n", "1 xx xx_xx x_x MACDESTYKXKFGHIKLMNPQRST" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.special_mods = []\n", "fasta_lib.labeling_channels = None\n", "fasta_lib.import_and_process_protein_dict(protein_dict)\n", "fasta_lib.protein_df" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
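The `precursor_mz` column in this cell's output follows the usual relation m/z = (M + z * m_proton) / z, where M is the neutral monoisotopic mass of the (modified) peptide. The sketch below recomputes a few of those values from standard monoisotopic residue masses.

It also suggests why `FGHIKLMNPQRST` appears only with charges 2 and 3, while `FLHIKLMNPQRTT` also appears with charge 4. The lowest m/z in the output is just above 400, so the sketch assumes a precursor m/z window of 400 to 2000 (the upper bound is a guess) and drops charge states outside it. The helper names, the candidate charges 2 to 4 and the window are assumptions, not alphabase's API or defaults.

```python
PROTON = 1.007276467
WATER = 18.010564684
# Standard monoisotopic residue masses (Da).
AA_MASS = {
    'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764,
    'V': 99.068414, 'T': 101.047679, 'C': 103.009185, 'L': 113.084064,
    'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578,
    'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912,
    'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313,
}
OXIDATION = 15.994915  # mass shift of Oxidation@M

def precursor_mz(sequence, charge, mod_mass=0.0):
    # Neutral mass = residue masses + water + modification mass shifts.
    neutral_mass = sum(AA_MASS[aa] for aa in sequence) + WATER + mod_mass
    return (neutral_mass + charge * PROTON) / charge

def allowed_charges(sequence, charges=(2, 3, 4), min_mz=400.0, max_mz=2000.0, mod_mass=0.0):
    # Keep only charge states whose m/z falls inside the assumed window.
    return [z for z in charges if min_mz <= precursor_mz(sequence, z, mod_mass) <= max_mz]

print(precursor_mz('LMNPQRST', 2))
print(precursor_mz('LMNPQRST', 2, OXIDATION))
print(allowed_charges('FGHIKLMNPQRST'), allowed_charges('FLHIKLMNPQRTT'))
```

The first two values should agree with rows 1 and 0 of the output (473.742377 and 481.739834) to about six decimals. For `FGHIKLMNPQRST` at charge 4 the m/z is about 383, below the assumed lower bound.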
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n", "0 LMNPQRST 1 1 False True \n", "1 LMNPQRST 1 1 False True \n", "2 ACDESTYK 1 0 True False \n", "3 ACDESTYK 1 0 True False \n", "4 LLNPQRTT 1 1 False True \n", "5 ASDESTSK 1 0 True False \n", "6 ASDESTSK 1 0 True False \n", "7 MACDESTYK 1 0 True False \n", "8 MACDESTYK 1 0 True False \n", "9 MACDESTYK 1 0 True False \n", "10 MACDESTYK 1 0 True False \n", "11 MLCDESTSK 1 0 True False \n", "12 MLCDESTSK 1 0 True False \n", "13 MLCDESTSK 1 0 True False \n", "14 MLCDESTSK 1 0 True False \n", "15 ASDESTYKVK 1 1 True False \n", "16 ASDESTYKVK 1 1 True False \n", "17 FGHIKLMNPQR 0;1 1 True True \n", "18 FGHIKLMNPQR 0;1 1 True True \n", "19 FGHIKLMNPQR 0;1 1 True True \n", "20 FGHIKLMNPQR 0;1 1 True True \n", "21 FGHIKLMNPQR 0;1 1 True True \n", "22 FGHIKLMNPQR 0;1 1 True True \n", "23 FGHIKLMNPQR 0;1 1 True True \n", "24 FGHIKLMNPQR 0;1 1 True True \n", "25 MLCDESTYKVK 1 1 True False \n", "26 MLCDESTYKVK 1 1 True False \n", "27 MLCDESTYKVK 1 1 True False \n", "28 MLCDESTYKVK 1 1 True False \n", "29 MLCDESTYKVK 1 1 True False \n", "30 MLCDESTYKVK 1 1 True False \n", "31 MLCDESTYKVK 1 1 True False \n", "32 MLCDESTYKVK 1 1 True False \n", "33 FLHIKLMNPNR 0;1 1 True True \n", "34 FLHIKLMNPNR 0;1 1 True True \n", "35 FLHIKLMNPNR 0;1 1 True True \n", "36 FLHIKLMNPNR 0;1 1 True True \n", "37 FLHIKLMNPNR 0;1 1 True True \n", "38 FLHIKLMNPNR 0;1 1 True True \n", "39 FLHIKLMNPNR 0;1 1 True True \n", "40 FLHIKLMNPNR 0;1 1 True True \n", "41 FLHIKLMNPQRTT 1 2 False True \n", "42 FLHIKLMNPQRTT 1 2 False True \n", "43 FLHIKLMNPQRTT 1 2 False True \n", "44 FLHIKLMNPQRTT 1 2 False True \n", "45 FLHIKLMNPQRTT 1 2 False True \n", "46 FLHIKLMNPQRTT 1 2 False True \n", "47 FGHIKLMNPQRST 1 2 False True \n", "48 FGHIKLMNPQRST 1 2 False True \n", "49 FGHIKLMNPQRST 1 2 False True \n", "50 FGHIKLMNPQRST 1 2 False True \n", "\n", " mods mod_sites nAA decoy \\\n", "0 Oxidation@M 2 8 0 \n", "1 8 0 \n", "2 
Carbamidomethyl@C 2 8 0 \n", "3 Acetyl@Protein_N-term;Carbamidomethyl@C 0;2 8 0 \n", "4 8 1 \n", "5 8 1 \n", "6 Acetyl@Protein_N-term 0 8 1 \n", "7 Oxidation@M;Carbamidomethyl@C 1;3 9 0 \n", "8 Carbamidomethyl@C 3 9 0 \n", "9 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 \n", "10 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 0 \n", "11 Oxidation@M;Carbamidomethyl@C 1;3 9 1 \n", "12 Carbamidomethyl@C 3 9 1 \n", "13 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 1 \n", "14 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 1 \n", "15 10 1 \n", "16 Acetyl@Protein_N-term 0 10 1 \n", "17 Oxidation@M 7 11 0 \n", "18 Oxidation@M 7 11 0 \n", "19 11 0 \n", "20 11 0 \n", "21 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n", "22 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n", "23 Acetyl@Protein_N-term 0 11 0 \n", "24 Acetyl@Protein_N-term 0 11 0 \n", "25 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n", "26 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n", "27 Carbamidomethyl@C 3 11 1 \n", "28 Carbamidomethyl@C 3 11 1 \n", "29 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n", "30 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 
0;1;3 11 1 \n", "31 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n", "32 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n", "33 Oxidation@M 7 11 1 \n", "34 Oxidation@M 7 11 1 \n", "35 11 1 \n", "36 11 1 \n", "37 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n", "38 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n", "39 Acetyl@Protein_N-term 0 11 1 \n", "40 Acetyl@Protein_N-term 0 11 1 \n", "41 Oxidation@M 7 13 1 \n", "42 Oxidation@M 7 13 1 \n", "43 Oxidation@M 7 13 1 \n", "44 13 1 \n", "45 13 1 \n", "46 13 1 \n", "47 Oxidation@M 7 13 0 \n", "48 Oxidation@M 7 13 0 \n", "49 13 0 \n", "50 13 0 \n", "\n", " charge precursor_mz \n", "0 2 481.739834 \n", "1 2 473.742377 \n", "2 2 487.200207 \n", "3 2 508.205490 \n", "4 2 471.771991 \n", "5 2 412.685247 \n", "6 2 433.690529 \n", "7 2 560.717907 \n", "8 2 552.720450 \n", "9 2 581.723190 \n", "10 2 573.725732 \n", "11 2 543.725732 \n", "12 2 535.728275 \n", "13 2 564.731015 \n", "14 2 556.733557 \n", "15 2 564.282586 \n", "16 2 585.287868 \n", "17 2 678.863889 \n", "18 3 452.911685 \n", "19 2 670.866431 \n", "20 3 447.580046 \n", "21 2 699.869171 \n", "22 3 466.915206 \n", "23 2 691.871714 \n", "24 3 461.583568 \n", "25 2 695.323071 \n", "26 3 463.884473 \n", "27 2 687.325613 \n", "28 3 458.552834 \n", "29 2 716.328353 \n", "30 3 477.887994 \n", "31 2 708.330896 \n", "32 3 472.556356 \n", "33 2 699.887364 \n", "34 3 466.927335 \n", "35 2 691.889907 \n", "36 3 461.595697 \n", "37 2 720.892646 \n", "38 3 480.930856 \n", "39 2 712.895189 \n", "40 3 475.599218 \n", "41 2 807.942867 \n", "42 3 538.964337 \n", "43 4 404.475072 \n", "44 2 799.945410 \n", "45 3 533.632699 \n", "46 4 400.476343 \n", "47 2 772.903742 \n", "48 3 515.604920 \n", "49 2 764.906285 \n", "50 3 510.273282 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.precursor_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start from peptides instead of proteins\n", "\n", "The modularity 
design of `SpecLibFasta` allows us to start from arbitrary peptide inputs, so fasta files or a protein_dict are not required.\n", "\n", "For example, suppose we have a list of peptide sequences and want to add modifications, labeling channels, and charge states to them using `SpecLibFasta` functionalities:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sequence nAA is_prot_nterm is_prot_cterm \\\n", "0 OPQRST 6 False False \n", "1 HIJKLMN 7 False False \n", "2 HIJKLMN 7 False False \n", "3 OPQRST 6 False False \n", "4 HIJKLMN 7 False False \n", "5 HIJKLMN 7 False False \n", "\n", " mods mod_sites \\\n", "0 Dimethyl@Any_N-term 0 \n", "1 Oxidation@M;Dimethyl@Any_N-term;Dimethyl@K 6;0;4 \n", "2 Dimethyl@Any_N-term;Dimethyl@K 0;4 \n", "3 Dimethyl:2H(4)@Any_N-term 0 \n", "4 Oxidation@M;Dimethyl:2H(4)@Any_N-term;Dimethyl... 6;0;4 \n", "5 Dimethyl:2H(4)@Any_N-term;Dimethyl:2H(4)@K 0;4 \n", "\n", " labeling_channel charge precursor_mz \n", "0 0 2 427.248152 \n", "1 0 2 470.786056 \n", "2 0 2 462.788599 \n", "3 4 2 429.260705 \n", "4 4 2 474.811163 \n", "5 4 2 466.813706 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pep_lib = SpecLibFasta(\n", " charged_frag_types=['b_z1','y_z1'],\n", " fix_mods=['Carbamidomethyl@C'],\n", " var_mods=['Acetyl@Protein_N-term','Oxidation@M'],\n", " labeling_channels={\n", " 0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],\n", " 4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],\n", " },\n", " decoy=None,\n", ")\n", "\n", "pep_lib.precursor_df = pd.DataFrame({\n", " 'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']\n", "})\n", "pep_lib.process_from_naked_peptide_seqs()\n", "pep_lib.precursor_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate masses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculate precursor m/z" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n", "0 LMNPQRST 1 1 False True \n", "1 LMNPQRST 1 1 False True \n", "2 ACDESTYK 1 0 True False \n", "3 ACDESTYK 1 0 True False \n", "4 LLNPQRTT 1 1 False True \n", "5 ASDESTSK 1 0 True False \n", "6 ASDESTSK 1 0 True False \n", "7 MACDESTYK 1 0 True False \n", "8 MACDESTYK 1 0 True False \n", "9 MACDESTYK 1 0 True False \n", "10 MACDESTYK 1 0 True False \n", "11 MLCDESTSK 1 0 True False \n", "12 MLCDESTSK 1 0 True False \n", "13 MLCDESTSK 1 0 True False \n", "14 MLCDESTSK 1 0 True False \n", "15 ASDESTYKVK 1 1 True False \n", "16 ASDESTYKVK 1 1 True False \n", "17 FGHIKLMNPQR 0;1 1 True True \n", "18 FGHIKLMNPQR 0;1 1 True True \n", "19 FGHIKLMNPQR 0;1 1 True True \n", "20 FGHIKLMNPQR 0;1 1 True True \n", "21 FGHIKLMNPQR 0;1 1 True True \n", "22 FGHIKLMNPQR 0;1 1 True True \n", "23 FGHIKLMNPQR 0;1 1 True True \n", "24 FGHIKLMNPQR 0;1 1 True True \n", "25 MLCDESTYKVK 1 1 True False \n", "26 MLCDESTYKVK 1 1 True False \n", "27 MLCDESTYKVK 1 1 True False \n", "28 MLCDESTYKVK 1 1 True False \n", "29 MLCDESTYKVK 1 1 True False \n", "30 MLCDESTYKVK 1 1 True False \n", "31 MLCDESTYKVK 1 1 True False \n", "32 MLCDESTYKVK 1 1 True False \n", "33 FLHIKLMNPNR 0;1 1 True True \n", "34 FLHIKLMNPNR 0;1 1 True True \n", "35 FLHIKLMNPNR 0;1 1 True True \n", "36 FLHIKLMNPNR 0;1 1 True True \n", "37 FLHIKLMNPNR 0;1 1 True True \n", "38 FLHIKLMNPNR 0;1 1 True True \n", "39 FLHIKLMNPNR 0;1 1 True True \n", "40 FLHIKLMNPNR 0;1 1 True True \n", "41 FLHIKLMNPQRTT 1 2 False True \n", "42 FLHIKLMNPQRTT 1 2 False True \n", "43 FLHIKLMNPQRTT 1 2 False True \n", "44 FLHIKLMNPQRTT 1 2 False True \n", "45 FLHIKLMNPQRTT 1 2 False True \n", "46 FLHIKLMNPQRTT 1 2 False True \n", "47 FGHIKLMNPQRST 1 2 False True \n", "48 FGHIKLMNPQRST 1 2 False True \n", "49 FGHIKLMNPQRST 1 2 False True \n", "50 FGHIKLMNPQRST 1 2 False True \n", "\n", " mods mod_sites nAA decoy \\\n", "0 Oxidation@M 2 8 0 \n", "1 8 0 \n", "2 
Carbamidomethyl@C 2 8 0 \n", "3 Acetyl@Protein_N-term;Carbamidomethyl@C 0;2 8 0 \n", "4 8 1 \n", "5 8 1 \n", "6 Acetyl@Protein_N-term 0 8 1 \n", "7 Oxidation@M;Carbamidomethyl@C 1;3 9 0 \n", "8 Carbamidomethyl@C 3 9 0 \n", "9 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 \n", "10 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 0 \n", "11 Oxidation@M;Carbamidomethyl@C 1;3 9 1 \n", "12 Carbamidomethyl@C 3 9 1 \n", "13 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 1 \n", "14 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 1 \n", "15 10 1 \n", "16 Acetyl@Protein_N-term 0 10 1 \n", "17 Oxidation@M 7 11 0 \n", "18 Oxidation@M 7 11 0 \n", "19 11 0 \n", "20 11 0 \n", "21 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n", "22 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n", "23 Acetyl@Protein_N-term 0 11 0 \n", "24 Acetyl@Protein_N-term 0 11 0 \n", "25 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n", "26 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n", "27 Carbamidomethyl@C 3 11 1 \n", "28 Carbamidomethyl@C 3 11 1 \n", "29 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n", "30 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 
0;1;3 11 1 \n", "31 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n", "32 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n", "33 Oxidation@M 7 11 1 \n", "34 Oxidation@M 7 11 1 \n", "35 11 1 \n", "36 11 1 \n", "37 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n", "38 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n", "39 Acetyl@Protein_N-term 0 11 1 \n", "40 Acetyl@Protein_N-term 0 11 1 \n", "41 Oxidation@M 7 13 1 \n", "42 Oxidation@M 7 13 1 \n", "43 Oxidation@M 7 13 1 \n", "44 13 1 \n", "45 13 1 \n", "46 13 1 \n", "47 Oxidation@M 7 13 0 \n", "48 Oxidation@M 7 13 0 \n", "49 13 0 \n", "50 13 0 \n", "\n", " charge precursor_mz \n", "0 2 481.739834 \n", "1 2 473.742377 \n", "2 2 487.200207 \n", "3 2 508.205490 \n", "4 2 471.771991 \n", "5 2 412.685247 \n", "6 2 433.690529 \n", "7 2 560.717907 \n", "8 2 552.720450 \n", "9 2 581.723190 \n", "10 2 573.725732 \n", "11 2 543.725732 \n", "12 2 535.728275 \n", "13 2 564.731015 \n", "14 2 556.733557 \n", "15 2 564.282586 \n", "16 2 585.287868 \n", "17 2 678.863889 \n", "18 3 452.911685 \n", "19 2 670.866431 \n", "20 3 447.580046 \n", "21 2 699.869171 \n", "22 3 466.915206 \n", "23 2 691.871714 \n", "24 3 461.583568 \n", "25 2 695.323071 \n", "26 3 463.884473 \n", "27 2 687.325613 \n", "28 3 458.552834 \n", "29 2 716.328353 \n", "30 3 477.887994 \n", "31 2 708.330896 \n", "32 3 472.556356 \n", "33 2 699.887364 \n", "34 3 466.927335 \n", "35 2 691.889907 \n", "36 3 461.595697 \n", "37 2 720.892646 \n", "38 3 480.930856 \n", "39 2 712.895189 \n", "40 3 475.599218 \n", "41 2 807.942867 \n", "42 3 538.964337 \n", "43 4 404.475072 \n", "44 2 799.945410 \n", "45 3 533.632699 \n", "46 4 400.476343 \n", "47 2 772.903742 \n", "48 3 515.604920 \n", "49 2 764.906285 \n", "50 3 510.273282 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.calc_precursor_mz()\n", "# fasta_lib.calc_precursor_isotope()\n", "fasta_lib.precursor_df" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "After `calc_precursor_mz()`, all sequences containing `x` are removed: the mass of `x` is so large that their precursor m/z falls outside the range between `fasta_lib.min_precursor_mz` and `fasta_lib.max_precursor_mz`." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(400.0, 2000.0, \"mass of 'x' is 100000000.0\")" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from alphabase.constants.aa import AA_ASCII_MASS\n", "(\n", " fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz, \n", " f\"mass of 'x' is {AA_ASCII_MASS[ord('x')]}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculate fragment m/z" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " b_z1 y_z1\n", "0 114.091339 849.388306\n", "1 261.126740 702.352905\n", "2 375.169678 588.309998\n", "3 472.222443 491.257233\n", "4 600.281006 363.198669\n", ".. ... ...\n", "486 941.502563 588.309998\n", "487 1038.555298 491.257233\n", "488 1166.613892 363.198669\n", "489 1322.714966 207.097549\n", "490 1409.747070 120.065521\n", "\n", "[491 rows x 2 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fasta_lib.calc_fragment_mz_df()\n", "fasta_lib.fragment_mz_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the `frag_start_idx` and `frag_stop_idx` columns of `precursor_df` to locate the fragments of a given precursor in `fragment_mz_df`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " b_z1 y_z1\n", "35 72.044388 753.326111\n", "36 159.076416 666.294067\n", "37 274.103363 551.267151\n", "38 403.145966 422.224548\n", "39 490.177979 335.192505\n", "40 591.225647 234.144836\n", "41 678.257690 147.112808" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ith_pep = 5\n", "frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values\n", "fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.3 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "8a3b27e141e49c996c9b863f8707e97aabd49c4a7e8445b9b783b34e4a21a9b2" } } }, "nbformat": 4, "nbformat_minor": 2 }