{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tutorial: Spectral Libraries\n",
"\n",
"This notebook introduces spectral library functionality for developers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Base Library Class\n",
"\n",
"`alphabase.spectral_library.base.SpecLibBase` is the base class for spectral libraries; see https://alphabase.readthedocs.io/en/latest/ for details. We recommend that users access spectral library functionality through `alphabase.protein.fasta.SpecLibFasta`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `SpecLibFasta`\n",
"\n",
"Almost all DataFrame functionality for processing proteins and peptides is integrated into `alphabase.protein.fasta.SpecLibFasta`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
],
"source": [
"from alphabase.protein.fasta import SpecLibFasta\n",
"\n",
"fasta_lib = SpecLibFasta(\n",
" charged_frag_types=['b_z1','y_z1'],\n",
" protease='trypsin',\n",
" fix_mods=['Carbamidomethyl@C'],\n",
" var_mods=['Acetyl@Protein_N-term','Oxidation@M'],\n",
" decoy=None,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start from fasta/proteins\n",
"\n",
"`SpecLibFasta` performs the following steps for us:\n",
"\n",
"- Load FASTA files into a `protein_dict`\n",
"- Digest proteins into peptide sequences\n",
"- Append decoy peptide sequences if `self.decoy` is not `None`\n",
"- Add fixed and variable modifications\n",
"- Add special modifications (optional)\n",
"- Add peptide labeling (optional)\n",
"- Add charge states to peptides"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load fasta files into a protein_dict"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# from alphabase.protein.fasta import load_all_proteins\n",
"# protein_dict = load_all_proteins(fasta_files)\n",
"\n",
"# For example, the protein_dict is:\n",
"protein_dict = {\n",
" 'yy': {\n",
" 'protein_id': 'yy',\n",
" 'full_name': 'yy_yy',\n",
" 'gene_name': 'y_y',\n",
" 'sequence': 'FGHIKLMNPQR'\n",
" },\n",
" 'xx': {\n",
" 'protein_id': 'xx',\n",
" 'full_name': 'xx_xx',\n",
" 'gene_name': 'x_x',\n",
" 'sequence': 'MACDESTYKXKFGHIKLMNPQRST'\n",
" },\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Digest proteins into peptide sequences"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm \\\n",
"0 XKFGHIK 1 1 False \n",
"1 LMNPQRST 1 1 False \n",
"2 ACDESTYK 1 0 True \n",
"3 MACDESTYK 1 0 True \n",
"4 ACDESTYKXK 1 1 True \n",
"5 FGHIKLMNPQR 0;1 1 True \n",
"6 MACDESTYKXK 1 1 True \n",
"7 XKFGHIKLMNPQR 1 2 False \n",
"8 FGHIKLMNPQRST 1 2 False \n",
"9 ACDESTYKXKFGHIK 1 2 True \n",
"10 MACDESTYKXKFGHIK 1 2 True \n",
"\n",
" is_prot_cterm mods mod_sites nAA \n",
"0 False 7 \n",
"1 True 8 \n",
"2 False 8 \n",
"3 False 9 \n",
"4 False 10 \n",
"5 True 11 \n",
"6 False 11 \n",
"7 False 13 \n",
"8 True 13 \n",
"9 False 15 \n",
"10 False 16 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.get_peptides_from_protein_dict(protein_dict)\n",
"fasta_lib.precursor_df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" protein_id full_name gene_name sequence\n",
"0 yy yy_yy y_y FGHIKLMNPQR\n",
"1 xx xx_xx x_x MACDESTYKXKFGHIKLMNPQRST"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.protein_df"
]
},
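{
"cell_type": "markdown",
"metadata": {},
"source": [
"The digestion step can be illustrated with a minimal pure-Python sketch. It is not AlphaBase's implementation: `digest_sketch` is a hypothetical helper that assumes cleavage after every K or R, and its defaults (up to two missed cleavages, minimum length 7) are inferred from the table above. It ignores N-terminal methionine removal, so peptides such as `ACDESTYK` are not produced. For protein `xx` it returns the eight remaining peptides with the same `miss_cleavage` values as above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def digest_sketch(sequence, max_missed=2, min_len=7):\n",
"    # Hypothetical helper for illustration, not part of AlphaBase.\n",
"    # Split after every K or R; a trailing piece without K/R is kept.\n",
"    pieces = re.findall(r'[^KR]*[KR]|[^KR]+$', sequence)\n",
"    peptides = []\n",
"    for start in range(len(pieces)):\n",
"        for n_missed in range(max_missed + 1):\n",
"            end = start + n_missed + 1\n",
"            if end > len(pieces):\n",
"                break\n",
"            peptide = ''.join(pieces[start:end])\n",
"            if len(peptide) >= min_len:\n",
"                peptides.append((peptide, n_missed))\n",
"    return peptides\n",
"\n",
"digest_sketch('MACDESTYKXKFGHIKLMNPQRST')"
]
},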
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Append decoy sequences\n",
"\n",
"This step depends on `self.decoy` (a `str`), which can be one of:\n",
"\n",
"- `protein_reverse`: reverse the target protein sequences\n",
"- `pseudo_reverse`: pseudo-reverse the target peptide sequences\n",
"- `diann`: DIA-NN-like decoys\n",
"- `None`: no decoys\n",
" \n",
"Let's take `diann` as an example:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm \\\n",
"20 MACDESTYKXKFGHIK 1 2 True \n",
"10 FGHIKLMNPQR 0;1 1 True \n",
"14 FLHIKLMNPQRTT 1 2 False \n",
"13 FLHIKLMNPNR 0;1 1 True \n",
"1 XLFGHVK 1 1 False \n",
"\n",
" is_prot_cterm mods mod_sites nAA decoy \n",
"20 False 16 0 \n",
"10 True 11 0 \n",
"14 True 13 1 \n",
"13 True 11 1 \n",
"1 False 7 1 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.decoy = 'diann'\n",
"fasta_lib.append_decoy_sequence()\n",
"fasta_lib.precursor_df.sample(5, random_state=0)"
]
},
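{
"cell_type": "markdown",
"metadata": {},
"source": [
"The decoys sampled above suggest how the `diann` option behaves: the second residue and the second-to-last residue of each peptide are substituted. A minimal sketch of that idea follows. `diann_like_decoy_sketch` is a hypothetical helper, and its substitution table lists only the replacements visible in this notebook's outputs, so it is incomplete. Leaving residues that are missing from the table unchanged is an assumption of this sketch, not documented AlphaBase behavior."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Substitutions observed in this notebook's outputs only (incomplete).\n",
"OBSERVED_SUBSTITUTIONS = {\n",
"    'G': 'L', 'K': 'L', 'A': 'L', 'M': 'L',\n",
"    'I': 'V', 'Q': 'N', 'S': 'T', 'Y': 'S',\n",
"    'C': 'S', 'X': 'V',\n",
"}\n",
"\n",
"def diann_like_decoy_sketch(sequence):\n",
"    # Hypothetical helper for illustration, not part of AlphaBase.\n",
"    residues = list(sequence)\n",
"    # Mutate the second and the second-to-last residue.\n",
"    for position in (1, -2):\n",
"        residues[position] = OBSERVED_SUBSTITUTIONS.get(\n",
"            residues[position], residues[position]\n",
"        )\n",
"    return ''.join(residues)\n",
"\n",
"# Both pairs appear as target/decoy sequences in the outputs above.\n",
"assert diann_like_decoy_sketch('FGHIKLMNPQR') == 'FLHIKLMNPNR'\n",
"assert diann_like_decoy_sketch('XKFGHIK') == 'XLFGHVK'"
]
},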
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add modifications\n",
"\n",
"`add_modifications()` adds the fixed and variable modifications defined by `fix_mods` and `var_mods`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n",
"35 FLHIKLMNPNR 0;1 1 True True \n",
"34 FLHIKLMNPNR 0;1 1 True True \n",
"41 FGHIKLMNPQRST 1 2 False True \n",
"27 MACDESTYKXK 1 1 True False \n",
"11 MACDESTYK 1 0 True False \n",
"\n",
" mods mod_sites nAA decoy \n",
"35 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n",
"34 11 1 \n",
"41 Oxidation@M 7 13 0 \n",
"27 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 0 \n",
"11 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.add_modifications()\n",
"fasta_lib.precursor_df.sample(5, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add special modifications\n",
"\n",
"Special modifications are PTMs over which we want finer control:\n",
"\n",
"1. We may want to keep only modified peptides and exclude the unmodified forms.\n",
"2. `GlyGly@K` cannot occur at the peptide C-terminus because trypsin cannot cleave after a Lys carrying `GlyGly`.\n",
"3. For modifications such as `Phospho@S` and `HexNAc@S`, we want to limit the number of peptidoforms to control memory usage."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm \\\n",
"45 MACDESTYKXKFGHIK 1 2 True \n",
"33 ASDESTYKXKFGHVK 1 2 True \n",
"40 MACDESTYKXKFGHIK 1 2 True \n",
"26 FGHIKLMNPQRST 1 2 False \n",
"11 MACDESTYKXK 1 1 True \n",
"2 ACDESTYKXK 1 1 True \n",
"32 ASDESTYKXKFGHVK 1 2 True \n",
"43 MACDESTYKXKFGHIK 1 2 True \n",
"46 MACDESTYKXKFGHIK 1 2 True \n",
"30 XKFGHIKLMNPQR 1 2 False \n",
"\n",
" is_prot_cterm mods \\\n",
"45 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n",
"33 False Acetyl@Protein_N-term;GlyGly@K \n",
"40 False Oxidation@M;Carbamidomethyl@C;GlyGly@K \n",
"26 True GlyGly@K \n",
"11 False Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... \n",
"2 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n",
"32 False GlyGly@K \n",
"43 False Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... \n",
"46 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K \n",
"30 False GlyGly@K \n",
"\n",
" mod_sites nAA decoy \n",
"45 0;3;9 16 0 \n",
"33 0;8 15 1 \n",
"40 1;3;11 16 0 \n",
"26 5 13 0 \n",
"11 0;1;3;9 11 0 \n",
"2 0;2;8 10 0 \n",
"32 10 15 1 \n",
"43 0;1;3;9 16 0 \n",
"46 0;3;11 16 0 \n",
"30 7 13 0 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.special_mods = ['GlyGly@K']\n",
"fasta_lib.special_mods_cannot_modify_pep_c_term = True\n",
"fasta_lib.min_special_mod_num = 1 # exclude the unmodified forms\n",
"fasta_lib.max_special_mod_num = 1 # allow at most one special modification per peptide\n",
"fasta_lib.add_special_modifications()\n",
"fasta_lib.precursor_df.sample(10, random_state=0)"
]
},
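{
"cell_type": "markdown",
"metadata": {},
"source": [
"The site restriction can be sketched in plain Python. `special_mod_site_sets` is a hypothetical helper, not AlphaBase's implementation. It assumes that `mod_sites` uses 1-based residue positions, with 0 reserved for the N-terminus, which is consistent with the outputs above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import combinations\n",
"\n",
"def special_mod_site_sets(sequence, residue='K', min_num=1, max_num=1,\n",
"                          cannot_modify_pep_c_term=True):\n",
"    # Hypothetical helper for illustration, not part of AlphaBase.\n",
"    # Collect 1-based positions of the candidate residue.\n",
"    sites = [i + 1 for i, aa in enumerate(sequence) if aa == residue]\n",
"    if cannot_modify_pep_c_term:\n",
"        sites = [site for site in sites if site != len(sequence)]\n",
"    # Enumerate every allowed combination of modified sites.\n",
"    site_sets = []\n",
"    for num in range(min_num, max_num + 1):\n",
"        site_sets.extend(combinations(sites, num))\n",
"    return site_sets\n",
"\n",
"# Lys occurs at positions 9, 11 and 16; the C-terminal Lys (16) is excluded.\n",
"assert special_mod_site_sets('MACDESTYKXKFGHIK') == [(9,), (11,)]"
]
},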
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add peptide labeling\n",
"\n",
"For example, dimethyl labeling:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm \\\n",
"85 XKFGHIKLMNPQR 1 2 False \n",
"10 MACDESTYKXK 1 1 True \n",
"75 FLHIKLMNPNR 0;1 1 True \n",
"2 ACDESTYKXK 1 1 True \n",
"24 XLFGHIKLMNPNR 1 2 False \n",
"101 MACDESTYKXKFGHIK 1 2 True \n",
"109 MLCDESTYKXKFGHVK 1 2 True \n",
"7 FGHIKLMNPQR 0;1 1 True \n",
"16 MLCDESTYKVK 1 1 True \n",
"91 ACDESTYKXKFGHIK 1 2 True \n",
"\n",
" is_prot_cterm mods \\\n",
"85 False GlyGly@K;Dimethyl:2H(4)@Any_N-term;Dimethyl:2H... \n",
"10 False Carbamidomethyl@C;GlyGly@K;Dimethyl@Any_N-term... \n",
"75 True Acetyl@Protein_N-term;GlyGly@K;Dimethyl:2H(4)@K \n",
"2 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n",
"24 False GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K \n",
"101 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n",
"109 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n",
"7 True Acetyl@Protein_N-term;Oxidation@M;GlyGly@K;Dim... \n",
"16 False Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... \n",
"91 False Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any_... \n",
"\n",
" mod_sites nAA decoy labeling_channel \n",
"85 7;0;2;7 13 0 4 \n",
"10 3;9;0;9;11 11 0 0 \n",
"75 0;5;5 11 1 4 \n",
"2 0;2;8;8;10 10 0 0 \n",
"24 7;0;7 13 1 0 \n",
"101 0;3;11;9;11;16 16 0 4 \n",
"109 0;3;11;9;11;16 16 1 4 \n",
"7 0;7;5;5 11 0 0 \n",
"16 0;3;9;9;11 11 1 0 \n",
"91 2;10;0;8;10;15 15 0 4 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.labeling_channels = {\n",
" 0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],\n",
" 4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],\n",
"}\n",
"fasta_lib.add_peptide_labeling()\n",
"fasta_lib.precursor_df.sample(10, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add charge states"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm \\\n",
"122 MACDESTYKXKFGHIK 1 2 True \n",
"66 FLHIKLMNPQRTT 1 2 False \n",
"142 MLCDESTYKXKFGHVK 1 2 True \n",
"246 XKFGHIKLMNPQR 1 2 False \n",
"146 MLCDESTYKXKFGHVK 1 2 True \n",
"\n",
" is_prot_cterm mods \\\n",
"122 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n",
"66 True GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K \n",
"142 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n",
"246 False Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any_N-term... \n",
"146 False Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... \n",
"\n",
" mod_sites nAA decoy labeling_channel charge \n",
"122 1;3;11;0;9;11;16 16 0 0 4 \n",
"66 5;0;5 13 1 0 2 \n",
"142 1;3;9;0;9;11;16 16 1 0 3 \n",
"246 9;2;0;2;7 13 0 4 2 \n",
"146 1;3;11;0;9;11;16 16 1 0 4 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.add_charge()\n",
"fasta_lib.precursor_df.sample(5, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `import_and_process_protein_dict()` combines all steps\n",
"\n",
"Use `import_and_process_fasta()` instead when starting from FASTA files."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" protein_id full_name gene_name sequence\n",
"0 yy yy_yy y_y FGHIKLMNPQR\n",
"1 xx xx_xx x_x MACDESTYKXKFGHIKLMNPQRST"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.special_mods = []\n",
"fasta_lib.labeling_channels = None\n",
"fasta_lib.import_and_process_protein_dict(protein_dict)\n",
"fasta_lib.protein_df"
]
},
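{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `precursor_df` of this section also carries a `precursor_mz` column. Different charge states of the same peptide are related through the proton mass, as the minimal sketch below shows. It assumes that `precursor_mz` is the monoisotopic m/z of the protonated precursor; `mz_at_charge` is a hypothetical helper, not an AlphaBase function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PROTON_MASS = 1.007276467  # Da\n",
"\n",
"def mz_at_charge(mz, charge, new_charge):\n",
"    # Hypothetical helper for illustration, not part of AlphaBase.\n",
"    # Recover the neutral mass, then re-protonate at the new charge.\n",
"    neutral_mass = (mz - PROTON_MASS) * charge\n",
"    return neutral_mass / new_charge + PROTON_MASS\n",
"\n",
"# Unmodified FGHIKLMNPQR: 670.866431 at 2+ and 447.580046 at 3+.\n",
"assert abs(mz_at_charge(670.866431, 2, 3) - 447.580046) < 1e-5"
]
},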
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n",
"0 LMNPQRST 1 1 False True \n",
"1 LMNPQRST 1 1 False True \n",
"2 ACDESTYK 1 0 True False \n",
"3 ACDESTYK 1 0 True False \n",
"4 LLNPQRTT 1 1 False True \n",
"5 ASDESTSK 1 0 True False \n",
"6 ASDESTSK 1 0 True False \n",
"7 MACDESTYK 1 0 True False \n",
"8 MACDESTYK 1 0 True False \n",
"9 MACDESTYK 1 0 True False \n",
"10 MACDESTYK 1 0 True False \n",
"11 MLCDESTSK 1 0 True False \n",
"12 MLCDESTSK 1 0 True False \n",
"13 MLCDESTSK 1 0 True False \n",
"14 MLCDESTSK 1 0 True False \n",
"15 ASDESTYKVK 1 1 True False \n",
"16 ASDESTYKVK 1 1 True False \n",
"17 FGHIKLMNPQR 0;1 1 True True \n",
"18 FGHIKLMNPQR 0;1 1 True True \n",
"19 FGHIKLMNPQR 0;1 1 True True \n",
"20 FGHIKLMNPQR 0;1 1 True True \n",
"21 FGHIKLMNPQR 0;1 1 True True \n",
"22 FGHIKLMNPQR 0;1 1 True True \n",
"23 FGHIKLMNPQR 0;1 1 True True \n",
"24 FGHIKLMNPQR 0;1 1 True True \n",
"25 MLCDESTYKVK 1 1 True False \n",
"26 MLCDESTYKVK 1 1 True False \n",
"27 MLCDESTYKVK 1 1 True False \n",
"28 MLCDESTYKVK 1 1 True False \n",
"29 MLCDESTYKVK 1 1 True False \n",
"30 MLCDESTYKVK 1 1 True False \n",
"31 MLCDESTYKVK 1 1 True False \n",
"32 MLCDESTYKVK 1 1 True False \n",
"33 FLHIKLMNPNR 0;1 1 True True \n",
"34 FLHIKLMNPNR 0;1 1 True True \n",
"35 FLHIKLMNPNR 0;1 1 True True \n",
"36 FLHIKLMNPNR 0;1 1 True True \n",
"37 FLHIKLMNPNR 0;1 1 True True \n",
"38 FLHIKLMNPNR 0;1 1 True True \n",
"39 FLHIKLMNPNR 0;1 1 True True \n",
"40 FLHIKLMNPNR 0;1 1 True True \n",
"41 FLHIKLMNPQRTT 1 2 False True \n",
"42 FLHIKLMNPQRTT 1 2 False True \n",
"43 FLHIKLMNPQRTT 1 2 False True \n",
"44 FLHIKLMNPQRTT 1 2 False True \n",
"45 FLHIKLMNPQRTT 1 2 False True \n",
"46 FLHIKLMNPQRTT 1 2 False True \n",
"47 FGHIKLMNPQRST 1 2 False True \n",
"48 FGHIKLMNPQRST 1 2 False True \n",
"49 FGHIKLMNPQRST 1 2 False True \n",
"50 FGHIKLMNPQRST 1 2 False True \n",
"\n",
" mods mod_sites nAA decoy \\\n",
"0 Oxidation@M 2 8 0 \n",
"1 8 0 \n",
"2 Carbamidomethyl@C 2 8 0 \n",
"3 Acetyl@Protein_N-term;Carbamidomethyl@C 0;2 8 0 \n",
"4 8 1 \n",
"5 8 1 \n",
"6 Acetyl@Protein_N-term 0 8 1 \n",
"7 Oxidation@M;Carbamidomethyl@C 1;3 9 0 \n",
"8 Carbamidomethyl@C 3 9 0 \n",
"9 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 \n",
"10 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 0 \n",
"11 Oxidation@M;Carbamidomethyl@C 1;3 9 1 \n",
"12 Carbamidomethyl@C 3 9 1 \n",
"13 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 1 \n",
"14 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 1 \n",
"15 10 1 \n",
"16 Acetyl@Protein_N-term 0 10 1 \n",
"17 Oxidation@M 7 11 0 \n",
"18 Oxidation@M 7 11 0 \n",
"19 11 0 \n",
"20 11 0 \n",
"21 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n",
"22 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n",
"23 Acetyl@Protein_N-term 0 11 0 \n",
"24 Acetyl@Protein_N-term 0 11 0 \n",
"25 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n",
"26 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n",
"27 Carbamidomethyl@C 3 11 1 \n",
"28 Carbamidomethyl@C 3 11 1 \n",
"29 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n",
"30 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n",
"31 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n",
"32 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n",
"33 Oxidation@M 7 11 1 \n",
"34 Oxidation@M 7 11 1 \n",
"35 11 1 \n",
"36 11 1 \n",
"37 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n",
"38 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n",
"39 Acetyl@Protein_N-term 0 11 1 \n",
"40 Acetyl@Protein_N-term 0 11 1 \n",
"41 Oxidation@M 7 13 1 \n",
"42 Oxidation@M 7 13 1 \n",
"43 Oxidation@M 7 13 1 \n",
"44 13 1 \n",
"45 13 1 \n",
"46 13 1 \n",
"47 Oxidation@M 7 13 0 \n",
"48 Oxidation@M 7 13 0 \n",
"49 13 0 \n",
"50 13 0 \n",
"\n",
" charge precursor_mz \n",
"0 2 481.739834 \n",
"1 2 473.742377 \n",
"2 2 487.200207 \n",
"3 2 508.205490 \n",
"4 2 471.771991 \n",
"5 2 412.685247 \n",
"6 2 433.690529 \n",
"7 2 560.717907 \n",
"8 2 552.720450 \n",
"9 2 581.723190 \n",
"10 2 573.725732 \n",
"11 2 543.725732 \n",
"12 2 535.728275 \n",
"13 2 564.731015 \n",
"14 2 556.733557 \n",
"15 2 564.282586 \n",
"16 2 585.287868 \n",
"17 2 678.863889 \n",
"18 3 452.911685 \n",
"19 2 670.866431 \n",
"20 3 447.580046 \n",
"21 2 699.869171 \n",
"22 3 466.915206 \n",
"23 2 691.871714 \n",
"24 3 461.583568 \n",
"25 2 695.323071 \n",
"26 3 463.884473 \n",
"27 2 687.325613 \n",
"28 3 458.552834 \n",
"29 2 716.328353 \n",
"30 3 477.887994 \n",
"31 2 708.330896 \n",
"32 3 472.556356 \n",
"33 2 699.887364 \n",
"34 3 466.927335 \n",
"35 2 691.889907 \n",
"36 3 461.595697 \n",
"37 2 720.892646 \n",
"38 3 480.930856 \n",
"39 2 712.895189 \n",
"40 3 475.599218 \n",
"41 2 807.942867 \n",
"42 3 538.964337 \n",
"43 4 404.475072 \n",
"44 2 799.945410 \n",
"45 3 533.632699 \n",
"46 4 400.476343 \n",
"47 2 772.903742 \n",
"48 3 515.604920 \n",
"49 2 764.906285 \n",
"50 3 510.273282 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.precursor_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start from peptides instead of proteins\n",
"\n",
"The modular design of `SpecLibFasta` allows us to start from arbitrary peptide inputs, so fasta files or a protein_dict are not required.\n",
"\n",
"For example, suppose we have a list of sequences and we want to add modifications using `SpecLibFasta` functionalities:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence nAA is_prot_nterm is_prot_cterm \\\n",
"0 OPQRST 6 False False \n",
"1 HIJKLMN 7 False False \n",
"2 HIJKLMN 7 False False \n",
"3 OPQRST 6 False False \n",
"4 HIJKLMN 7 False False \n",
"5 HIJKLMN 7 False False \n",
"\n",
" mods mod_sites \\\n",
"0 Dimethyl@Any_N-term 0 \n",
"1 Oxidation@M;Dimethyl@Any_N-term;Dimethyl@K 6;0;4 \n",
"2 Dimethyl@Any_N-term;Dimethyl@K 0;4 \n",
"3 Dimethyl:2H(4)@Any_N-term 0 \n",
"4 Oxidation@M;Dimethyl:2H(4)@Any_N-term;Dimethyl... 6;0;4 \n",
"5 Dimethyl:2H(4)@Any_N-term;Dimethyl:2H(4)@K 0;4 \n",
"\n",
" labeling_channel charge precursor_mz \n",
"0 0 2 427.248152 \n",
"1 0 2 470.786056 \n",
"2 0 2 462.788599 \n",
"3 4 2 429.260705 \n",
"4 4 2 474.811163 \n",
"5 4 2 466.813706 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"pep_lib = SpecLibFasta(\n",
" charged_frag_types=['b_z1','y_z1'],\n",
" fix_mods=['Carbamidomethyl@C'],\n",
" var_mods=['Acetyl@Protein_N-term','Oxidation@M'],\n",
" labeling_channels={\n",
" 0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],\n",
" 4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],\n",
" },\n",
" decoy=None,\n",
")\n",
"\n",
"# Start from bare peptide sequences; no fasta file or protein_dict is needed\n",
"pep_lib.precursor_df = pd.DataFrame({\n",
" 'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']\n",
"})\n",
"pep_lib.process_from_naked_peptide_seqs()\n",
"pep_lib.precursor_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate masses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Calculate precursor m/z"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sequence protein_idxes miss_cleavage is_prot_nterm is_prot_cterm \\\n",
"0 LMNPQRST 1 1 False True \n",
"1 LMNPQRST 1 1 False True \n",
"2 ACDESTYK 1 0 True False \n",
"3 ACDESTYK 1 0 True False \n",
"4 LLNPQRTT 1 1 False True \n",
"5 ASDESTSK 1 0 True False \n",
"6 ASDESTSK 1 0 True False \n",
"7 MACDESTYK 1 0 True False \n",
"8 MACDESTYK 1 0 True False \n",
"9 MACDESTYK 1 0 True False \n",
"10 MACDESTYK 1 0 True False \n",
"11 MLCDESTSK 1 0 True False \n",
"12 MLCDESTSK 1 0 True False \n",
"13 MLCDESTSK 1 0 True False \n",
"14 MLCDESTSK 1 0 True False \n",
"15 ASDESTYKVK 1 1 True False \n",
"16 ASDESTYKVK 1 1 True False \n",
"17 FGHIKLMNPQR 0;1 1 True True \n",
"18 FGHIKLMNPQR 0;1 1 True True \n",
"19 FGHIKLMNPQR 0;1 1 True True \n",
"20 FGHIKLMNPQR 0;1 1 True True \n",
"21 FGHIKLMNPQR 0;1 1 True True \n",
"22 FGHIKLMNPQR 0;1 1 True True \n",
"23 FGHIKLMNPQR 0;1 1 True True \n",
"24 FGHIKLMNPQR 0;1 1 True True \n",
"25 MLCDESTYKVK 1 1 True False \n",
"26 MLCDESTYKVK 1 1 True False \n",
"27 MLCDESTYKVK 1 1 True False \n",
"28 MLCDESTYKVK 1 1 True False \n",
"29 MLCDESTYKVK 1 1 True False \n",
"30 MLCDESTYKVK 1 1 True False \n",
"31 MLCDESTYKVK 1 1 True False \n",
"32 MLCDESTYKVK 1 1 True False \n",
"33 FLHIKLMNPNR 0;1 1 True True \n",
"34 FLHIKLMNPNR 0;1 1 True True \n",
"35 FLHIKLMNPNR 0;1 1 True True \n",
"36 FLHIKLMNPNR 0;1 1 True True \n",
"37 FLHIKLMNPNR 0;1 1 True True \n",
"38 FLHIKLMNPNR 0;1 1 True True \n",
"39 FLHIKLMNPNR 0;1 1 True True \n",
"40 FLHIKLMNPNR 0;1 1 True True \n",
"41 FLHIKLMNPQRTT 1 2 False True \n",
"42 FLHIKLMNPQRTT 1 2 False True \n",
"43 FLHIKLMNPQRTT 1 2 False True \n",
"44 FLHIKLMNPQRTT 1 2 False True \n",
"45 FLHIKLMNPQRTT 1 2 False True \n",
"46 FLHIKLMNPQRTT 1 2 False True \n",
"47 FGHIKLMNPQRST 1 2 False True \n",
"48 FGHIKLMNPQRST 1 2 False True \n",
"49 FGHIKLMNPQRST 1 2 False True \n",
"50 FGHIKLMNPQRST 1 2 False True \n",
"\n",
" mods mod_sites nAA decoy \\\n",
"0 Oxidation@M 2 8 0 \n",
"1 8 0 \n",
"2 Carbamidomethyl@C 2 8 0 \n",
"3 Acetyl@Protein_N-term;Carbamidomethyl@C 0;2 8 0 \n",
"4 8 1 \n",
"5 8 1 \n",
"6 Acetyl@Protein_N-term 0 8 1 \n",
"7 Oxidation@M;Carbamidomethyl@C 1;3 9 0 \n",
"8 Carbamidomethyl@C 3 9 0 \n",
"9 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 0 \n",
"10 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 0 \n",
"11 Oxidation@M;Carbamidomethyl@C 1;3 9 1 \n",
"12 Carbamidomethyl@C 3 9 1 \n",
"13 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 9 1 \n",
"14 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 9 1 \n",
"15 10 1 \n",
"16 Acetyl@Protein_N-term 0 10 1 \n",
"17 Oxidation@M 7 11 0 \n",
"18 Oxidation@M 7 11 0 \n",
"19 11 0 \n",
"20 11 0 \n",
"21 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n",
"22 Acetyl@Protein_N-term;Oxidation@M 0;7 11 0 \n",
"23 Acetyl@Protein_N-term 0 11 0 \n",
"24 Acetyl@Protein_N-term 0 11 0 \n",
"25 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n",
"26 Oxidation@M;Carbamidomethyl@C 1;3 11 1 \n",
"27 Carbamidomethyl@C 3 11 1 \n",
"28 Carbamidomethyl@C 3 11 1 \n",
"29 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n",
"30 Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... 0;1;3 11 1 \n",
"31 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n",
"32 Acetyl@Protein_N-term;Carbamidomethyl@C 0;3 11 1 \n",
"33 Oxidation@M 7 11 1 \n",
"34 Oxidation@M 7 11 1 \n",
"35 11 1 \n",
"36 11 1 \n",
"37 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n",
"38 Acetyl@Protein_N-term;Oxidation@M 0;7 11 1 \n",
"39 Acetyl@Protein_N-term 0 11 1 \n",
"40 Acetyl@Protein_N-term 0 11 1 \n",
"41 Oxidation@M 7 13 1 \n",
"42 Oxidation@M 7 13 1 \n",
"43 Oxidation@M 7 13 1 \n",
"44 13 1 \n",
"45 13 1 \n",
"46 13 1 \n",
"47 Oxidation@M 7 13 0 \n",
"48 Oxidation@M 7 13 0 \n",
"49 13 0 \n",
"50 13 0 \n",
"\n",
" charge precursor_mz \n",
"0 2 481.739834 \n",
"1 2 473.742377 \n",
"2 2 487.200207 \n",
"3 2 508.205490 \n",
"4 2 471.771991 \n",
"5 2 412.685247 \n",
"6 2 433.690529 \n",
"7 2 560.717907 \n",
"8 2 552.720450 \n",
"9 2 581.723190 \n",
"10 2 573.725732 \n",
"11 2 543.725732 \n",
"12 2 535.728275 \n",
"13 2 564.731015 \n",
"14 2 556.733557 \n",
"15 2 564.282586 \n",
"16 2 585.287868 \n",
"17 2 678.863889 \n",
"18 3 452.911685 \n",
"19 2 670.866431 \n",
"20 3 447.580046 \n",
"21 2 699.869171 \n",
"22 3 466.915206 \n",
"23 2 691.871714 \n",
"24 3 461.583568 \n",
"25 2 695.323071 \n",
"26 3 463.884473 \n",
"27 2 687.325613 \n",
"28 3 458.552834 \n",
"29 2 716.328353 \n",
"30 3 477.887994 \n",
"31 2 708.330896 \n",
"32 3 472.556356 \n",
"33 2 699.887364 \n",
"34 3 466.927335 \n",
"35 2 691.889907 \n",
"36 3 461.595697 \n",
"37 2 720.892646 \n",
"38 3 480.930856 \n",
"39 2 712.895189 \n",
"40 3 475.599218 \n",
"41 2 807.942867 \n",
"42 3 538.964337 \n",
"43 4 404.475072 \n",
"44 2 799.945410 \n",
"45 3 533.632699 \n",
"46 4 400.476343 \n",
"47 2 772.903742 \n",
"48 3 515.604920 \n",
"49 2 764.906285 \n",
"50 3 510.273282 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.calc_precursor_mz()\n",
"# fasta_lib.calc_precursor_isotope()\n",
"fasta_lib.precursor_df"
]
},
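{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check (not taken from the AlphaBase source), the `precursor_mz` values above are consistent with the standard relation between the neutral monoisotopic mass $M$, the charge $z$, and the proton mass $m_p \\approx 1.007276$ Da:\n",
"\n",
"$$\\frac{m}{z} = \\frac{M + z\\,m_p}{z}$$\n",
"\n",
"For unmodified `FGHIKLMNPQR` (rows 19 and 20), the charge-2 value gives $M = 2 \\times (670.866431 - 1.007276) \\approx 1339.718310$, so the charge-3 value should be\n",
"\n",
"$$\\frac{1339.718310 + 3 \\times 1.007276}{3} \\approx 447.580046,$$\n",
"\n",
"which matches row 20."
]
},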
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After `calc_precursor_mz()`, all sequences containing `x` are removed: the mass assigned to `x` is so large that the resulting precursor m/z falls outside the range between `fasta_lib.min_precursor_mz` and `fasta_lib.max_precursor_mz`."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(400.0, 2000.0, \"mass of 'x' is 100000000.0\")"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from alphabase.constants.aa import AA_ASCII_MASS\n",
"(\n",
" fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz, \n",
" f\"mass of 'x' is {AA_ASCII_MASS[ord('x')]}\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Calculate fragment m/z"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" b_z1 y_z1\n",
"0 114.091339 849.388306\n",
"1 261.126740 702.352905\n",
"2 375.169678 588.309998\n",
"3 472.222443 491.257233\n",
"4 600.281006 363.198669\n",
".. ... ...\n",
"486 941.502563 588.309998\n",
"487 1038.555298 491.257233\n",
"488 1166.613892 363.198669\n",
"489 1322.714966 207.097549\n",
"490 1409.747070 120.065521\n",
"\n",
"[491 rows x 2 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fasta_lib.calc_fragment_mz_df()\n",
"fasta_lib.fragment_mz_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `frag_start_idx` and `frag_stop_idx` columns of `precursor_df` to locate the fragments of a given precursor in `fragment_mz_df`:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" b_z1 y_z1\n",
"35 72.044388 753.326111\n",
"36 159.076416 666.294067\n",
"37 274.103363 551.267151\n",
"38 403.145966 422.224548\n",
"39 490.177979 335.192505\n",
"40 591.225647 234.144836\n",
"41 678.257690 147.112808"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ith_pep = 5\n",
"# Row range of this precursor's fragments in fragment_mz_df\n",
"frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values\n",
"fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]"
]
},
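{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fragment slice can be checked by hand as well. Assuming the usual definitions of singly charged b and y ions (an assumption of this note, not a statement about AlphaBase internals), complementary fragments of a peptide with $n$ residues and neutral mass $M$ satisfy\n",
"\n",
"$$b_i + y_{n-i} = M + 2\\,m_p,$$\n",
"\n",
"where $m_p \\approx 1.007276$ Da is the proton mass. For a charge-2 precursor the right-hand side equals $2 \\times$ `precursor_mz`. Each row of the slice pairs $b_i$ with $y_{n-i}$, so for peptide 5 (`ASDESTSK`, charge 2, `precursor_mz` 412.685247) the first row gives\n",
"\n",
"$$72.044388 + 753.326111 = 825.370499 \\approx 2 \\times 412.685247 = 825.370494.$$\n",
"\n",
"The remaining difference of about $5 \\times 10^{-6}$ presumably reflects the floating-point precision of the stored fragment values."
]
},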
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.3 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "8a3b27e141e49c996c9b863f8707e97aabd49c4a7e8445b9b783b34e4a21a9b2"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}