Tutorial for Dev: Spectral Libraries#
This notebook introduces the spectral library functionalities of alphabase for developers.
[1]:
# One or two methods were renamed and are only available in v1.0.1 (main branch);
# please install alphabase from GitHub:
# %pip install git+https://github.com/mannlabs/alphabase
[2]:
import alphabase
# the version must be 1.0.1
alphabase.__version__
[2]:
'1.0.1'
The Base Library Class#
alphabase.spectral_library.base.SpecLibBase is the base class for spectral libraries. See https://alphabase.readthedocs.io/en/latest/ for details. We recommend that users access spectral library functionalities via alphabase.protein.fasta.SpecLibFasta.
SpecLibFasta#
Almost all DataFrame functionalities to process proteins and peptides have been integrated into alphabase.protein.fasta.SpecLibFasta.
[3]:
from alphabase.protein.fasta import SpecLibFasta
fasta_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    protease='trypsin',
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein N-term','Oxidation@M'],
    decoy=None,
)
Start from fasta/proteins#
SpecLibFasta will do the following for us:
Load fasta files into a protein_dict
Digest proteins into peptide sequences
Append decoy peptide sequences if self.decoy is not None
Add fixed and variable modifications
[Add special modifications]
[Add peptide labeling]
Add charge states to peptides
Load fasta files into a protein_dict#
[4]:
# from alphabase.protein.fasta import load_all_proteins
# protein_dict = load_all_proteins(fasta_files)
# For example, the protein_dict is:
protein_dict = {
    'yy': {
        'protein_id': 'yy',
        'full_name': 'yy_yy',
        'gene_name': 'y_y',
        'sequence': 'FGHIKLMNPQR'
    },
    'xx': {
        'protein_id': 'xx',
        'full_name': 'xx_xx',
        'gene_name': 'x_x',
        'sequence': 'MACDESTYKxKFGHIKLMNPQRST'
    },
}
Digest proteins into peptide sequences#
[5]:
fasta_lib.get_peptides_from_protein_dict(protein_dict)
fasta_lib.precursor_df
[5]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA |
---|---|---|---|---|---|---|---|---|
0 | xKFGHIK | 1 | 1 | False | False | | | 7 |
1 | LMNPQRST | 1 | 1 | False | True | | | 8 |
2 | ACDESTYK | 1 | 0 | True | False | | | 8 |
3 | MACDESTYK | 1 | 0 | True | False | | | 9 |
4 | ACDESTYKxK | 1 | 1 | True | False | | | 10 |
5 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 |
6 | MACDESTYKxK | 1 | 1 | True | False | | | 11 |
7 | xKFGHIKLMNPQR | 1 | 2 | False | False | | | 13 |
8 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 |
9 | ACDESTYKxKFGHIK | 1 | 2 | True | False | | | 15 |
10 | MACDESTYKxKFGHIK | 1 | 2 | True | False | | | 16 |
[6]:
fasta_lib.protein_df
[6]:
index | protein_id | full_name | gene_name | sequence |
---|---|---|---|---|
0 | yy | yy_yy | y_y | FGHIKLMNPQR |
1 | xx | xx_xx | x_x | MACDESTYKxKFGHIKLMNPQRST |
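To make the peptide table above easier to follow, here is a minimal pure-Python sketch of tryptic digestion with missed cleavages. It is not alphabase's implementation: the function name digest_sketch, the plain cut-after-K/R rule, the length limits of 7 to 35 residues and the limit of 2 missed cleavages are assumptions chosen to be consistent with the output above. The sketch also does not generate the forms with the protein N-terminal Met removed, which appear to account for the ACDESTYK... rows.

```python
def digest_sketch(sequence, max_missed_cleavages=2, min_len=7, max_len=35):
    """Return (peptide, miss_cleavage) pairs for a simplified trypsin rule."""
    # Cut positions: protein start, after every K or R, protein end.
    cuts = [0] + [i + 1 for i, aa in enumerate(sequence) if aa in "KR"]
    if cuts[-1] != len(sequence):
        cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed_cleavages + 1):
            stop = start + 1 + missed
            if stop >= len(cuts):
                break
            peptide = sequence[cuts[start]:cuts[stop]]
            # Keep only peptides within the assumed length limits.
            if min_len <= len(peptide) <= max_len:
                peptides.append((peptide, missed))
    return peptides


# Hypothetical usage with the 'xx' protein from the example protein_dict.
for peptide, missed in digest_sketch("MACDESTYKxKFGHIKLMNPQRST"):
    print(peptide, missed)
```

For the 'xx' protein this yields the eight peptides of the table that start with M, x, F or L, with the same miss_cleavage values.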
Append decoy sequences#
This depends on self.decoy (a str), whose value can be:
protein_reverse: Reverse on target protein sequences
pseudo_reverse: Pseudo-reverse on target peptide sequences
diann: DiaNN-like decoy
None: no decoy
Let’s take protein_reverse as an example:
[7]:
fasta_lib.decoy = 'protein_reverse'
fasta_lib.append_decoy_sequence()
fasta_lib.precursor_df.sample(5, random_state=0)
[7]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
---|---|---|---|---|---|---|---|---|---|
20 | ACDESTYKxKFGHIK | 1 | 2 | True | False | | | 15 | 0 |
10 | QPNMLKIHGF | 2 | 1 | False | True | | | 10 | 1 |
14 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 |
13 | MACDESTYKxK | 1 | 1 | True | False | | | 11 | 0 |
1 | IHGFKxK | 3 | 1 | False | False | | | 7 | 1 |
As protein_reverse is a protein-level decoy, the protein_df is changed too:
[8]:
fasta_lib.protein_df
[8]:
index | protein_id | full_name | gene_name | sequence | decoy |
---|---|---|---|---|---|
0 | yy | yy_yy | y_y | FGHIKLMNPQR | 0 |
1 | xx | xx_xx | x_x | MACDESTYKxKFGHIKLMNPQRST | 0 |
2 | REV_yy | REV_yy_yy | REV_y_y | RQPNMLKIHGF | 1 |
3 | REV_xx | REV_xx_xx | REV_x_x | TSRQPNMLKIHGFKxKYTSEDCAM | 1 |
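The decoy rows above can be reproduced with plain string reversal. The sketch below is an illustration only, not the library code: reverse_protein_decoys and pseudo_reverse are hypothetical helper names, the REV_ prefix is taken from the output above, and keeping the C-terminal residue fixed in pseudo_reverse is a common convention that is assumed here rather than confirmed for alphabase.

```python
def reverse_protein_decoys(protein_dict, prefix="REV_"):
    """Build one reversed decoy entry per target protein."""
    decoys = {}
    for protein in protein_dict.values():
        decoy_id = prefix + protein["protein_id"]
        decoys[decoy_id] = {
            "protein_id": decoy_id,
            "full_name": prefix + protein["full_name"],
            "gene_name": prefix + protein["gene_name"],
            "sequence": protein["sequence"][::-1],  # plain string reversal
        }
    return decoys


def pseudo_reverse(peptide):
    """Reverse a peptide but keep its C-terminal residue in place (assumed convention)."""
    return peptide[:-1][::-1] + peptide[-1:]


targets = {
    "yy": {
        "protein_id": "yy",
        "full_name": "yy_yy",
        "gene_name": "y_y",
        "sequence": "FGHIKLMNPQR",
    },
}
print(reverse_protein_decoys(targets)["REV_yy"]["sequence"])  # RQPNMLKIHGF
print(pseudo_reverse("LMNPQR"))  # QPNMLR
```

A protein-level decoy changes where the cleavage sites fall, so the decoy peptides are not simply reversed target peptides; this is why the decoy proteins are digested again.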
Add modifications#
add_modifications() will add fixed and variable modifications.
[9]:
fasta_lib.add_modifications()
fasta_lib.precursor_df.sample(5, random_state=0)
[9]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
---|---|---|---|---|---|---|---|---|---|
35 | MACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 3;0 | 11 | 0 |
34 | MACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1 | 11 | 0 |
42 | xKFGHIKLMNPQR | 1 | 2 | False | False | Oxidation@M | 9 | 13 | 0 |
27 | QPNMLKIHGFK | 3 | 1 | False | False | | | 11 | 1 |
11 | YTSEDCAM | 3 | 0 | False | True | Carbamidomethyl@C | 6 | 8 | 1 |
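The mods and mod_sites columns are parallel, semicolon-separated lists. Judging from the output above, site 0 denotes the N-terminus and positive numbers are 1-based residue positions. The sketch below decodes one row under that reading; decode_mods is a hypothetical helper, and treating -1 as the C-terminus is an assumption that the output above does not show.

```python
def decode_mods(sequence, mods, mod_sites):
    """Pair each modification with its site and the residue it sits on."""
    if not mods:
        return []
    decoded = []
    for name, site in zip(mods.split(";"), mod_sites.split(";")):
        site = int(site)
        if site == 0:
            target = "N-term"
        elif site == -1:  # assumption: -1 marks the peptide C-term
            target = "C-term"
        else:
            target = sequence[site - 1]  # 1-based residue position
        decoded.append((name, site, target))
    return decoded


# Row 35 of the table above.
print(decode_mods(
    "MACDESTYKxK",
    "Carbamidomethyl@C;Acetyl@Protein N-term",
    "3;0",
))
```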
Add special modifications#
Special modifications here refer to PTMs over which we want more control:
We only need the modified forms of the peptides, not the unmodified forms.
GlyGly@K cannot occur on the peptide C-term because trypsin cannot cleave after a Lys carrying GlyGly.
For some special modifications such as Phospho@S and HexNAc@S, we would like to limit the number of peptidoforms to control memory usage.
[10]:
fasta_lib.special_mods = ['GlyGly@K']
fasta_lib.special_mods_cannot_modify_pep_c_term = True
fasta_lib.min_special_mod_num = 1 # exclude the unmodified forms
fasta_lib.max_special_mod_num = 1 # limit the number of special modifications per peptide
fasta_lib.add_special_modifications()
fasta_lib.precursor_df.sample(10, random_state=0)
[10]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
---|---|---|---|---|---|---|---|---|---|
28 | QPNMLKIHGFKxK | 3 | 2 | False | False | Oxidation@M;GlyGly@K | 4;6 | 13 | 1 |
37 | IHGFKxKYTSEDCAM | 3 | 2 | False | True | Carbamidomethyl@C;Oxidation@M;GlyGly@K | 13;15;7 | 15 | 1 |
11 | RQPNMLKIHGF | 2 | 2 | True | True | GlyGly@K | 7 | 11 | 1 |
34 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term;Oxidation@M;GlyGly@K | 0;7;9 | 14 | 1 |
2 | ACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;GlyGly@K | 2;8 | 10 | 0 |
30 | QPNMLKIHGFKxK | 3 | 2 | False | False | GlyGly@K | 6 | 13 | 1 |
40 | ACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;GlyGly@K | 2;8 | 15 | 0 |
32 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Oxidation@M;GlyGly@K | 7;9 | 14 | 1 |
26 | xKFGHIKLMNPQR | 1 | 2 | False | False | GlyGly@K | 2 | 13 | 0 |
4 | xKYTSEDCAM | 3 | 1 | False | True | Carbamidomethyl@C;Oxidation@M;GlyGly@K | 8;10;2 | 10 | 1 |
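The three settings used above (minimum number, maximum number, and no modification on the peptide C-term) amount to choosing subsets of candidate sites. The following sketch enumerates those subsets for a single residue type; special_mod_site_sets and its parameter names are hypothetical and only mirror the attributes set in the cell above.

```python
from itertools import combinations


def special_mod_site_sets(sequence, residue="K", min_num=1, max_num=1,
                          cannot_modify_pep_c_term=True):
    """Enumerate the allowed sets of 1-based sites for one special modification."""
    candidates = [i + 1 for i, aa in enumerate(sequence) if aa == residue]
    if cannot_modify_pep_c_term:
        # Drop the last residue, e.g. a C-terminal Lys after tryptic cleavage.
        candidates = [site for site in candidates if site != len(sequence)]
    site_sets = []
    for n in range(min_num, max_num + 1):
        site_sets.extend(combinations(candidates, n))
    return site_sets


print(special_mod_site_sets("ACDESTYKxK"))     # [(8,)]
print(special_mod_site_sets("QPNMLKIHGFKxK"))  # [(6,), (11,)]
```

With min_num=0 the empty tuple is returned as well, which corresponds to keeping the unmodified form.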
Add peptide labeling#
For example, Dimethyl labeling:
[11]:
fasta_lib.labeling_channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any N-term'],
}
fasta_lib.add_peptide_labeling()
fasta_lib.precursor_df.sample(10, random_state=0)
[11]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | labeling_channel |
---|---|---|---|---|---|---|---|---|---|---|
26 | xKFGHIKLMNPQR | 1 | 2 | False | False | GlyGly@K;Dimethyl@Any N-term;Dimethyl@K;Dimeth... | 2;0;2;7 | 13 | 0 | 0 |
61 | QPNMLKIHGFK | 3 | 1 | False | False | GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H... | 6;0;6;11 | 11 | 1 | 4 |
2 | ACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;GlyGly@K;Dimethyl@Any N-term... | 2;8;0;8;10 | 10 | 0 | 0 |
62 | RQPNMLKIHGF | 2 | 2 | True | True | Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any N-term... | 5;7;0;7 | 11 | 1 | 4 |
85 | TSRQPNMLKIHGFK | 3 | 2 | True | False | GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H... | 9;0;9;14 | 14 | 1 | 4 |
48 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1;9;9;11;16 | 16 | 0 | 0 |
16 | MACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1;9;9;11 | 11 | 0 | 0 |
99 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any ... | 3;11;0;9;11;16 | 16 | 0 | 4 |
56 | xKYTSEDCAM | 3 | 1 | False | True | Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy... | 8;10;2;0;2 | 10 | 1 | 4 |
45 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy... | 3;1;11;0;9;11;16 | 16 | 0 | 0 |
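Each labeling channel produces one copy of every peptide form, with the channel's modifications appended to mods and mod_sites. The sketch below mimics the pattern visible in the table above: the N-terminal label comes first, then one label per matching residue, and the N-terminal label is skipped when site 0 is already occupied. These rules are inferred from the output, not taken from the library source, and label_peptide is a hypothetical helper.

```python
def label_peptide(sequence, mods, mod_sites, channel_mods):
    """Append the labeling modifications of one channel to a peptide form."""
    new_mods = mods.split(";") if mods else []
    new_sites = mod_sites.split(";") if mod_sites else []

    # N-terminal label, unless the N-terminus already carries a modification.
    n_term_free = "0" not in new_sites
    for mod in channel_mods:
        if mod.endswith("@Any N-term") and n_term_free:
            new_mods.append(mod)
            new_sites.append("0")

    # Residue-specific labels, in sequence order (1-based positions).
    for position, aa in enumerate(sequence, start=1):
        for mod in channel_mods:
            if mod.endswith("@" + aa):
                new_mods.append(mod)
                new_sites.append(str(position))
    return ";".join(new_mods), ";".join(new_sites)


labeling_channels = {
    0: ["Dimethyl@K", "Dimethyl@Any N-term"],
    4: ["Dimethyl:2H(4)@K", "Dimethyl:2H(4)@Any N-term"],
}
for channel, channel_mods in labeling_channels.items():
    print(channel, label_peptide("xKFGHIKLMNPQR", "GlyGly@K", "2", channel_mods))
```

For channel 0 this gives the sites 2;0;2;7, as in row 26 of the table above.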
Add charge states#
[12]:
fasta_lib.add_charge()
fasta_lib.precursor_df.sample(5, random_state=0)
[12]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | labeling_channel | charge |
---|---|---|---|---|---|---|---|---|---|---|---|
65 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein N-term;GlyGly@K;Dimethyl@K | 0;5;5 | 11 | 0 | 0 | 4 |
225 | FGHIKLMNPQRST | 1 | 2 | False | True | GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H... | 5;0;5 | 13 | 0 | 4 | 2 |
188 | RQPNMLKIHGF | 2 | 2 | True | True | Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any N-term... | 5;7;0;7 | 11 | 1 | 4 | 4 |
200 | MACDESTYKxK | 1 | 1 | True | False | Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy... | 3;1;9;0;9;11 | 11 | 0 | 4 | 4 |
108 | IHGFKxKYTSEDCAM | 3 | 2 | False | True | Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy... | 13;15;5;0;5;7 | 15 | 1 | 0 | 2 |
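Adding charge states is a cross product: every peptide form is repeated once per charge. A minimal sketch follows, assuming a charge range of 2 to 4, which matches the values seen in the outputs of this notebook but is not read from the library settings; add_charge_sketch is a hypothetical name.

```python
from itertools import product


def add_charge_sketch(peptide_forms, min_charge=2, max_charge=4):
    """Repeat every peptide form once per charge state."""
    charges = range(min_charge, max_charge + 1)
    return [dict(form, charge=z) for form, z in product(peptide_forms, charges)]


forms = [{"sequence": "xKFGHIK"}, {"sequence": "IHGFKxK"}]
for row in add_charge_sketch(forms):
    print(row)
```

The number of rows therefore grows by a factor equal to the number of charge states.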
import_and_process_protein_dict() combines all steps#
Or use import_and_process_fasta() for fasta files.
[13]:
fasta_lib.special_mods = []
fasta_lib.labeling_channels = None
fasta_lib.import_and_process_protein_dict(protein_dict)
fasta_lib.protein_df
[13]:
index | protein_id | full_name | gene_name | sequence | decoy |
---|---|---|---|---|---|
0 | yy | yy_yy | y_y | FGHIKLMNPQR | 0 |
1 | xx | xx_xx | x_x | MACDESTYKxKFGHIKLMNPQRST | 0 |
2 | REV_yy | REV_yy_yy | REV_y_y | RQPNMLKIHGF | 1 |
3 | REV_xx | REV_xx_xx | REV_x_x | TSRQPNMLKIHGFKxKYTSEDCAM | 1 |
[14]:
fasta_lib.precursor_df
[14]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | charge |
---|---|---|---|---|---|---|---|---|---|---|
0 | xKFGHIK | 1 | 1 | False | False | | | 7 | 0 | 2 |
1 | xKFGHIK | 1 | 1 | False | False | | | 7 | 0 | 3 |
2 | xKFGHIK | 1 | 1 | False | False | | | 7 | 0 | 4 |
3 | IHGFKxK | 3 | 1 | False | False | | | 7 | 1 | 2 |
4 | IHGFKxK | 3 | 1 | False | False | | | 7 | 1 | 3 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
169 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1 | 16 | 0 | 3 |
170 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1 | 16 | 0 | 4 |
171 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 3;0 | 16 | 0 | 2 |
172 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 3;0 | 16 | 0 | 3 |
173 | MACDESTYKxKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 3;0 | 16 | 0 | 4 |
174 rows × 10 columns
Start from peptides instead of proteins#
The modular design of SpecLibFasta allows us to start from arbitrary types of peptide inputs, so neither fasta files nor a protein_dict is required.
For example, suppose we have a list of sequences and we want to add modifications using SpecLibFasta functionalities:
[15]:
import pandas as pd
pep_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein N-term','Oxidation@M'],
    labeling_channels={
        0: ['Dimethyl@K', 'Dimethyl@Any N-term'],
        4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any N-term'],
    },
    decoy=None,
)
pep_lib.precursor_df = pd.DataFrame({
    'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']
})
pep_lib.process_from_naked_peptide_seqs()
pep_lib.precursor_df
[15]:
index | sequence | nAA | is_prot_nterm | is_prot_cterm | mods | mod_sites | labeling_channel | charge |
---|---|---|---|---|---|---|---|---|
0 | OPQRST | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 2 |
1 | OPQRST | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 3 |
2 | OPQRST | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 4 |
3 | UVWXYZ | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 2 |
4 | UVWXYZ | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 3 |
5 | UVWXYZ | 6 | False | False | Dimethyl@Any N-term | 0 | 0 | 4 |
6 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl@Any N-term | 3;0 | 0 | 2 |
7 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl@Any N-term | 3;0 | 0 | 3 |
8 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl@Any N-term | 3;0 | 0 | 4 |
9 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl@Any N-term;Dimethyl@K | 6;0;4 | 0 | 2 |
10 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl@Any N-term;Dimethyl@K | 6;0;4 | 0 | 3 |
11 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl@Any N-term;Dimethyl@K | 6;0;4 | 0 | 4 |
12 | HIJKLMN | 7 | False | False | Dimethyl@Any N-term;Dimethyl@K | 0;4 | 0 | 2 |
13 | HIJKLMN | 7 | False | False | Dimethyl@Any N-term;Dimethyl@K | 0;4 | 0 | 3 |
14 | HIJKLMN | 7 | False | False | Dimethyl@Any N-term;Dimethyl@K | 0;4 | 0 | 4 |
15 | OPQRST | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 2 |
16 | OPQRST | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 3 |
17 | OPQRST | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 4 |
18 | UVWXYZ | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 2 |
19 | UVWXYZ | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 3 |
20 | UVWXYZ | 6 | False | False | Dimethyl:2H(4)@Any N-term | 0 | 4 | 4 |
21 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term | 3;0 | 4 | 2 |
22 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term | 3;0 | 4 | 3 |
23 | ABCDEFG | 7 | False | False | Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term | 3;0 | 4 | 4 |
24 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl... | 6;0;4 | 4 | 2 |
25 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl... | 6;0;4 | 4 | 3 |
26 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl... | 6;0;4 | 4 | 4 |
27 | HIJKLMN | 7 | False | False | Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K | 0;4 | 4 | 2 |
28 | HIJKLMN | 7 | False | False | Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K | 0;4 | 4 | 3 |
29 | HIJKLMN | 7 | False | False | Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K | 0;4 | 4 | 4 |
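The 30 rows above can be checked with a short calculation: variable-modification forms times labeling channels times charge states, summed over the four input sequences. The sketch assumes that Oxidation@M is the only variable modification that applies (Acetyl@Protein N-term is not added because is_prot_nterm is False in every row), that each M is independently oxidized or not, and that three charge states are generated, as seen in the output. The formula 2 ** count('M') only holds while no limit on the number of variable modifications is reached, which is the case here.

```python
peptides = ["ABCDEFG", "HIJKLMN", "OPQRST", "UVWXYZ"]
n_channels = 2       # labeling channels 0 and 4
charges = (2, 3, 4)  # charge states seen in the output


def n_variable_forms(sequence):
    """Number of Oxidation@M forms: each M is either oxidized or not."""
    return 2 ** sequence.count("M")


rows_per_peptide = {
    seq: n_variable_forms(seq) * n_channels * len(charges)
    for seq in peptides
}
total_rows = sum(rows_per_peptide.values())
print(rows_per_peptide)  # {'ABCDEFG': 6, 'HIJKLMN': 12, 'OPQRST': 6, 'UVWXYZ': 6}
print(total_rows)        # 30
```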
Calculate masses#
Calculate precursor m/z#
[16]:
fasta_lib.calc_precursor_mz()
# fasta_lib.calc_precursor_isotope()
fasta_lib.precursor_df
[16]:
index | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | charge | precursor_mz |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | RQPNMLK | 2 | 1 | True | False | Oxidation@M | 5 | 7 | 1 | 2 | 451.747462 |
1 | RQPNMLK | 2 | 1 | True | False | | | 7 | 1 | 2 | 443.750005 |
2 | RQPNMLK | 2 | 1 | True | False | Acetyl@Protein N-term;Oxidation@M | 0;5 | 7 | 1 | 2 | 472.752744 |
3 | RQPNMLK | 2 | 1 | True | False | Acetyl@Protein N-term | 0 | 7 | 1 | 2 | 464.755287 |
4 | LMNPQRST | 1 | 1 | False | True | Oxidation@M | 2 | 8 | 0 | 2 | 481.739834 |
5 | LMNPQRST | 1 | 1 | False | True | | | 8 | 0 | 2 | 473.742377 |
6 | ACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 2 | 8 | 0 | 2 | 487.200207 |
7 | ACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 2;0 | 8 | 0 | 2 | 508.205490 |
8 | YTSEDCAM | 3 | 0 | False | True | Carbamidomethyl@C;Oxidation@M | 6;8 | 8 | 1 | 2 | 496.670426 |
9 | YTSEDCAM | 3 | 0 | False | True | Carbamidomethyl@C | 6 | 8 | 1 | 2 | 488.672968 |
10 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C;Oxidation@M | 3;1 | 9 | 0 | 2 | 560.717907 |
11 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 3 | 9 | 0 | 2 | 552.720450 |
12 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat... | 3;0;1 | 9 | 0 | 2 | 581.723190 |
13 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C;Acetyl@Protein N-term | 3;0 | 9 | 0 | 2 | 573.725732 |
14 | TSRQPNMLK | 3 | 1 | True | False | Oxidation@M | 7 | 9 | 1 | 2 | 545.787316 |
15 | TSRQPNMLK | 3 | 1 | True | False | | | 9 | 1 | 2 | 537.789858 |
16 | TSRQPNMLK | 3 | 1 | True | False | Acetyl@Protein N-term;Oxidation@M | 0;7 | 9 | 1 | 2 | 566.792598 |
17 | TSRQPNMLK | 3 | 1 | True | False | Acetyl@Protein N-term | 0 | 9 | 1 | 2 | 558.795141 |
18 | QPNMLKIHGF | 2 | 1 | False | True | Oxidation@M | 4 | 10 | 1 | 2 | 600.813333 |
19 | QPNMLKIHGF | 2 | 1 | False | True | Oxidation@M | 4 | 10 | 1 | 3 | 400.877981 |
20 | QPNMLKIHGF | 2 | 1 | False | True | | | 10 | 1 | 2 | 592.815876 |
21 | QPNMLKIHGFK | 3 | 1 | False | False | Oxidation@M | 4 | 11 | 1 | 2 | 664.860815 |
22 | QPNMLKIHGFK | 3 | 1 | False | False | Oxidation@M | 4 | 11 | 1 | 3 | 443.576302 |
23 | QPNMLKIHGFK | 3 | 1 | False | False | | | 11 | 1 | 2 | 656.863357 |
24 | QPNMLKIHGFK | 3 | 1 | False | False | | | 11 | 1 | 3 | 438.244664 |
25 | RQPNMLKIHGF | 2 | 2 | True | True | Oxidation@M | 5 | 11 | 1 | 2 | 678.863889 |
26 | RQPNMLKIHGF | 2 | 2 | True | True | Oxidation@M | 5 | 11 | 1 | 3 | 452.911685 |
27 | RQPNMLKIHGF | 2 | 2 | True | True | | | 11 | 1 | 2 | 670.866431 |
28 | RQPNMLKIHGF | 2 | 2 | True | True | | | 11 | 1 | 3 | 447.580046 |
29 | RQPNMLKIHGF | 2 | 2 | True | True | Acetyl@Protein N-term;Oxidation@M | 0;5 | 11 | 1 | 2 | 699.869171 |
30 | RQPNMLKIHGF | 2 | 2 | True | True | Acetyl@Protein N-term;Oxidation@M | 0;5 | 11 | 1 | 3 | 466.915206 |
31 | RQPNMLKIHGF | 2 | 2 | True | True | Acetyl@Protein N-term | 0 | 11 | 1 | 2 | 691.871714 |
32 | RQPNMLKIHGF | 2 | 2 | True | True | Acetyl@Protein N-term | 0 | 11 | 1 | 3 | 461.583568 |
33 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 2 | 678.863889 |
34 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 3 | 452.911685 |
35 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 2 | 670.866431 |
36 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 3 | 447.580046 |
37 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein N-term;Oxidation@M | 0;7 | 11 | 0 | 2 | 699.869171 |
38 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein N-term;Oxidation@M | 0;7 | 11 | 0 | 3 | 466.915206 |
39 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein N-term | 0 | 11 | 0 | 2 | 691.871714 |
40 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein N-term | 0 | 11 | 0 | 3 | 461.583568 |
41 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 2 | 772.903742 |
42 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 3 | 515.604920 |
43 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 2 | 764.906285 |
44 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 3 | 510.273282 |
45 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Oxidation@M | 7 | 14 | 1 | 2 | 836.951224 |
46 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Oxidation@M | 7 | 14 | 1 | 3 | 558.303241 |
47 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Oxidation@M | 7 | 14 | 1 | 4 | 418.979250 |
48 | TSRQPNMLKIHGFK | 3 | 2 | True | False | | | 14 | 1 | 2 | 828.953766 |
49 | TSRQPNMLKIHGFK | 3 | 2 | True | False | | | 14 | 1 | 3 | 552.971603 |
50 | TSRQPNMLKIHGFK | 3 | 2 | True | False | | | 14 | 1 | 4 | 414.980521 |
51 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term;Oxidation@M | 0;7 | 14 | 1 | 2 | 857.956506 |
52 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term;Oxidation@M | 0;7 | 14 | 1 | 3 | 572.306763 |
53 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term;Oxidation@M | 0;7 | 14 | 1 | 4 | 429.481891 |
54 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term | 0 | 14 | 1 | 2 | 849.959049 |
55 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term | 0 | 14 | 1 | 3 | 566.975125 |
56 | TSRQPNMLKIHGFK | 3 | 2 | True | False | Acetyl@Protein N-term | 0 | 14 | 1 | 4 | 425.483163 |
After calc_precursor_mz(), all sequences containing x are removed because the mass of x is very large, which puts their m/z outside the range between fasta_lib.min_precursor_mz and fasta_lib.max_precursor_mz.
[17]:
from alphabase.constants.aa import AA_ASCII_MASS
(
    fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz,
    f"mass of 'x' is {AA_ASCII_MASS[ord('x')]}"
)
[17]:
(400.0, 2000.0, "mass of 'x' is 100000000.0")
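The precursor m/z values and the m/z filter can be reproduced by hand: sum the residue masses, add one water and any modification masses, then add one proton per charge and divide by the charge. The sketch below uses standard monoisotopic masses typed in by hand, not values read from alphabase, and the helper names precursor_mz and in_mz_range are hypothetical. Whether the library treats the range limits as inclusive is not shown in the source; inclusive limits are assumed here.

```python
MASS_H2O = 18.010565
MASS_PROTON = 1.007276467
AA_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
    "x": 1e8,  # placeholder mass, as printed in the cell above
}
MOD_MASS = {
    "Oxidation@M": 15.994915,
    "Carbamidomethyl@C": 57.021464,
    "Acetyl@Protein N-term": 42.010565,
}


def precursor_mz(sequence, charge, mods=()):
    """m/z = (neutral peptide mass + charge * proton mass) / charge."""
    neutral = sum(AA_MASS[aa] for aa in sequence) + MASS_H2O
    neutral += sum(MOD_MASS[mod] for mod in mods)
    return (neutral + charge * MASS_PROTON) / charge


def in_mz_range(mz, min_mz=400.0, max_mz=2000.0):
    """Assumed inclusive version of the min/max precursor m/z filter."""
    return min_mz <= mz <= max_mz


print(round(precursor_mz("RQPNMLK", 2), 4))                   # 443.75
print(round(precursor_mz("RQPNMLK", 2, ["Oxidation@M"]), 4))  # 451.7475
print(in_mz_range(precursor_mz("QPNMLKIHGF", 3)))             # False
print(in_mz_range(precursor_mz("xKFGHIK", 2)))                # False
```

The same filter explains why some charge states are missing from the table: unmodified QPNMLKIHGF at charge 3 falls below 400, while the oxidized form at charge 3 (400.877981) is kept.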
Calculate fragment m/z#
[18]:
fasta_lib.calc_fragment_mz_df()
fasta_lib.fragment_mz_df
[18]:
index | b_z1 | y_z1 |
---|---|---|
0 | 157.108387 | 746.386537 |
1 | 285.166965 | 618.327959 |
2 | 382.219729 | 521.275195 |
3 | 496.262656 | 407.232268 |
4 | 643.298056 | 260.196868 |
... | ... | ... |
556 | 1098.572440 | 601.345658 |
557 | 1211.656504 | 488.261594 |
558 | 1348.715416 | 351.202682 |
559 | 1405.736879 | 294.181218 |
560 | 1552.805293 | 147.112804 |
561 rows × 2 columns
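The first rows of the fragment table belong to precursor 0 of the m/z table, RQPNMLK with Oxidation@M at site 5. A singly charged b ion is the sum of the N-terminal residues plus a proton; the matching y ion is the sum of the remaining residues plus water and a proton. The sketch below recomputes those rows with hand-typed monoisotopic masses; fragment_mz_rows and its site_deltas parameter are hypothetical and not part of alphabase.

```python
MASS_H2O = 18.010565
MASS_PROTON = 1.007276467
# Monoisotopic residue masses for the residues used in this example only.
AA_MASS = {
    "R": 156.101111, "Q": 128.058578, "P": 97.052764, "N": 114.042927,
    "M": 131.040485, "L": 113.084064, "K": 128.094963,
}


def fragment_mz_rows(sequence, site_deltas=None):
    """Return one (b_z1, y_z1) pair per cleavage position."""
    masses = [AA_MASS[aa] for aa in sequence]
    for site, delta in (site_deltas or {}).items():
        masses[site - 1] += delta  # 1-based residue position
    rows = []
    for i in range(1, len(masses)):
        b_mz = sum(masses[:i]) + MASS_PROTON
        y_mz = sum(masses[i:]) + MASS_H2O + MASS_PROTON
        rows.append((b_mz, y_mz))
    return rows


# RQPNMLK with Oxidation@M (+15.994915) on residue 5.
for b_mz, y_mz in fragment_mz_rows("RQPNMLK", {5: 15.994915}):
    print(f"{b_mz:.6f} {y_mz:.6f}")
```

A peptide of nAA residues has nAA - 1 cleavage positions, so it contributes nAA - 1 rows to the table.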
Use frag_start_idx and frag_stop_idx in precursor_df to locate the corresponding fragments:
[19]:
ith_pep = 5
frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values
fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]
[19]:
index | b_z1 | y_z1 |
---|---|---|
31 | 114.091340 | 833.393413 |
32 | 245.131826 | 702.352928 |
33 | 359.174753 | 588.310000 |
34 | 456.227517 | 491.257237 |
35 | 584.286094 | 363.198659 |
36 | 740.387205 | 207.097548 |
37 | 827.419234 | 120.065520 |
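The start and stop indices used above are consistent with a flat layout in which each precursor owns nAA - 1 consecutive rows, stored in precursor order. Under that assumption (which matches this output but is not guaranteed by the source), the ranges are a running sum. fragment_ranges is a hypothetical helper.

```python
from itertools import accumulate


def fragment_ranges(nAA_list):
    """Return (frag_start_idx, frag_stop_idx) per precursor, nAA - 1 rows each."""
    n_rows = [n - 1 for n in nAA_list]
    stops = list(accumulate(n_rows))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


nAA = [7, 7, 7, 7, 8, 8]  # precursors 0 to 5 of the m/z table
ranges = fragment_ranges(nAA)
print(ranges[5])  # (31, 38)
```

Precursor 5 therefore maps to rows 31 to 37, as shown above. Summing nAA - 1 over all 57 precursors of the m/z table gives 561, the number of rows in fragment_mz_df.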