Tutorial: Spectral Libraries¶
This notebook introduces the spectral library functionality in alphabase to developers.
The Base Library Class¶
alphabase.spectral_library.base.SpecLibBase is the base class for spectral libraries. See https://alphabase.readthedocs.io/en/latest/ for details. We recommend that users access spectral library functionality via alphabase.protein.fasta.SpecLibFasta.
SpecLibFasta¶
Almost all DataFrame functionality for processing proteins and peptides is integrated into alphabase.protein.fasta.SpecLibFasta.
[1]:
from alphabase.protein.fasta import SpecLibFasta
fasta_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    protease='trypsin',
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein_N-term','Oxidation@M'],
    decoy=None,
)
Start from fasta/proteins¶
SpecLibFasta does the following for us:
Load fasta files into a protein_dict
Digest proteins into peptide sequences
Append decoy peptide sequences if self.decoy is not None
Add fixed and variable modifications
[Add special modifications]
[Add peptide labeling]
Add charge states to peptides
Load fasta files into a protein_dict¶
[2]:
# from alphabase.protein.fasta import load_all_proteins
# protein_dict = load_all_proteins(fasta_files)
# For example, the protein_dict is:
protein_dict = {
    'yy': {
        'protein_id': 'yy',
        'full_name': 'yy_yy',
        'gene_name': 'y_y',
        'sequence': 'FGHIKLMNPQR'
    },
    'xx': {
        'protein_id': 'xx',
        'full_name': 'xx_xx',
        'gene_name': 'x_x',
        'sequence': 'MACDESTYKXKFGHIKLMNPQRST'
    },
}
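For readers who want to see what such a dictionary looks like when built from FASTA text, here is a minimal sketch in plain Python. It is not the alphabase loader: the function names `parse_fasta` and `_make_entry` are hypothetical, and the header layout (`db|accession|entry_name ... GN=gene`) is an assumed UniProt-style convention.

```python
import re

def _make_entry(header, sequence):
    """Build one protein_dict entry from a FASTA header (hypothetical helper)."""
    first_token = header.split()[0]
    fields = first_token.split("|")
    # Assumed UniProt-style header: db|accession|entry_name description GN=gene
    if len(fields) >= 3:
        protein_id, full_name = fields[1], fields[2]
    else:
        protein_id = full_name = first_token
    gene = re.search(r"GN=(\S+)", header)
    return {
        "protein_id": protein_id,
        "full_name": full_name,
        "gene_name": gene.group(1) if gene else "",
        "sequence": sequence,
    }

def parse_fasta(text):
    """Parse FASTA text into a dict keyed by protein_id."""
    protein_dict = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                entry = _make_entry(header, "".join(chunks))
                protein_dict[entry["protein_id"]] = entry
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        entry = _make_entry(header, "".join(chunks))
        protein_dict[entry["protein_id"]] = entry
    return protein_dict

example = ">sp|xx|xx_xx some protein GN=x_x\nMACDESTYKXK\nFGHIKLMNPQRST\n>yy\nFGHIKLMNPQR\n"
parsed = parse_fasta(example)
```

Sequence lines that are wrapped over several rows are joined, so `parsed['xx']['sequence']` is the full protein sequence.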
Digest proteins into peptide sequences¶
[3]:
fasta_lib.get_peptides_from_protein_dict(protein_dict)
fasta_lib.precursor_df
[3]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA |
|---|---|---|---|---|---|---|---|---|
| 0 | XKFGHIK | 1 | 1 | False | False | | | 7 |
| 1 | LMNPQRST | 1 | 1 | False | True | | | 8 |
| 2 | ACDESTYK | 1 | 0 | True | False | | | 8 |
| 3 | MACDESTYK | 1 | 0 | True | False | | | 9 |
| 4 | ACDESTYKXK | 1 | 1 | True | False | | | 10 |
| 5 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 |
| 6 | MACDESTYKXK | 1 | 1 | True | False | | | 11 |
| 7 | XKFGHIKLMNPQR | 1 | 2 | False | False | | | 13 |
| 8 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 |
| 9 | ACDESTYKXKFGHIK | 1 | 2 | True | False | | | 15 |
| 10 | MACDESTYKXKFGHIK | 1 | 2 | True | False | | | 16 |
[4]:
fasta_lib.protein_df
[4]:
| | protein_id | full_name | gene_name | sequence |
|---|---|---|---|---|
| 0 | yy | yy_yy | y_y | FGHIKLMNPQR |
| 1 | xx | xx_xx | x_x | MACDESTYKXKFGHIKLMNPQRST |
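The peptide table above can be reproduced with a short plain-Python sketch. This is not the alphabase implementation: the function name `digest` is hypothetical, and the settings (cleave after every K or R, at most 2 missed cleavages, peptide length 7 to 35, optional loss of the initiator Met at the protein N-terminus) are assumptions inferred from the output, not read from the library.

```python
import re

def digest(sequence, max_missed_cleavages=2, min_len=7, max_len=35):
    """Return {(peptide, n_missed_cleavages)} for a trypsin-like digest."""
    # Cut after every K or R; a trailing piece without K/R is kept as well.
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)
    peptides = set()
    for start in range(len(pieces)):
        for n_missed in range(max_missed_cleavages + 1):
            stop = start + n_missed + 1
            if stop > len(pieces):
                break
            peptide = "".join(pieces[start:stop])
            candidates = [peptide]
            # At the protein N-terminus, also keep the form without the initiator Met.
            if start == 0 and peptide.startswith("M"):
                candidates.append(peptide[1:])
            for candidate in candidates:
                if min_len <= len(candidate) <= max_len:
                    peptides.add((candidate, n_missed))
    return peptides

xx_peptides = digest("MACDESTYKXKFGHIKLMNPQRST")
yy_peptides = digest("FGHIKLMNPQR")
```

Under these assumptions, `xx_peptides` contains the same 11 sequences and miss_cleavage values as precursor_df, and `yy_peptides` contains only FGHIKLMNPQR, which is why that peptide maps to both proteins (protein_idxes 0;1).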
Append decoy sequences¶
This step depends on self.decoy (a str), which can be one of:
protein_reverse: reverse the target protein sequences
pseudo_reverse: pseudo-reverse the target peptide sequences
diann: DiaNN-like decoy
None: no decoy
Let’s take diann as an example:
[5]:
fasta_lib.decoy = 'diann'
fasta_lib.append_decoy_sequence()
fasta_lib.precursor_df.sample(5, random_state=0)
[5]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
|---|---|---|---|---|---|---|---|---|---|
| 20 | MACDESTYKXKFGHIK | 1 | 2 | True | False | | | 16 | 0 |
| 10 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 |
| 14 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 |
| 13 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 |
| 1 | XLFGHVK | 1 | 1 | False | False | | | 7 | 1 |
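The decoy sequences in this output differ from their targets at the second and the second-to-last residue. The sketch below reproduces that pattern in plain Python. It is an illustration, not the alphabase code: `diann_like_decoy` is a hypothetical name, and the substitution table is the one commonly attributed to DIA-NN, used here as an assumption. Residues missing from the table (such as X) are left unchanged in this sketch, which may differ from what the library does.

```python
# Assumed substitution table (commonly attributed to DIA-NN).
ORIGINAL = "GAVLIFMPWSCTYHKRQEND"
MUTATED = "LLLVVLLLLTSSSSLLNDQE"
MUTATION = dict(zip(ORIGINAL, MUTATED))

def diann_like_decoy(sequence):
    """Mutate the 2nd and the 2nd-to-last residue of a peptide sequence."""
    if len(sequence) < 3:
        return sequence
    residues = list(sequence)
    # A set avoids mutating the same position twice in 3-residue peptides.
    for pos in {1, len(sequence) - 2}:
        residues[pos] = MUTATION.get(residues[pos], residues[pos])
    return "".join(residues)

decoys = {seq: diann_like_decoy(seq) for seq in
          ["FGHIKLMNPQR", "FGHIKLMNPQRST", "XKFGHIK", "MACDESTYK"]}
```

For example, FGHIKLMNPQR becomes FLHIKLMNPNR (G to L, Q to N), matching the row with index 13 above.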
Add modifications¶
add_modifications() will add fixed and variable modifications.
[6]:
fasta_lib.add_modifications()
fasta_lib.precursor_df.sample(5, random_state=0)
[6]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
|---|---|---|---|---|---|---|---|---|---|
| 35 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 1 |
| 34 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 |
| 41 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 |
| 27 | MACDESTYKXK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 11 | 0 |
| 11 | MACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 9 | 0 |
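To make the mods and mod_sites columns concrete, the following sketch enumerates modified forms for one peptide. It is a simplified illustration under stated assumptions, not the library's algorithm: `enumerate_mod_forms` and its `max_var_mods` parameter are hypothetical, the modification lists are hard-coded to the ones used in this notebook, site 0 denotes the N-terminus, and residues are numbered from 1, as the output above suggests.

```python
from itertools import combinations

def enumerate_mod_forms(sequence, is_prot_nterm, max_var_mods=2):
    """Return (mods, mod_sites) string pairs for one peptide sequence."""
    # The fixed modification is always applied to every C.
    fixed = [("Carbamidomethyl@C", i + 1)
             for i, aa in enumerate(sequence) if aa == "C"]
    # Variable modifications are optional at each candidate site.
    candidates = []
    if is_prot_nterm:
        candidates.append(("Acetyl@Protein_N-term", 0))
    candidates += [("Oxidation@M", i + 1)
                   for i, aa in enumerate(sequence) if aa == "M"]
    forms = []
    for k in range(max_var_mods + 1):
        for combo in combinations(candidates, k):
            mods = list(combo) + fixed
            forms.append((";".join(name for name, _ in mods),
                          ";".join(str(site) for _, site in mods)))
    return forms

forms = enumerate_mod_forms("MACDESTYK", is_prot_nterm=True)
```

For MACDESTYK this yields four forms, from `('Carbamidomethyl@C', '3')` up to `('Acetyl@Protein_N-term;Oxidation@M;Carbamidomethyl@C', '0;1;3')`.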
Add special modifications¶
Special modifications are PTMs over which we want finer control:
We may want only the modified peptides, i.e. exclude the unmodified forms.
GlyGly@K cannot occur at the peptide C-term, because trypsin cannot cleave after a Lys that carries GlyGly.
For some special modifications, such as Phospho@S and HexNAc@S, we would like to limit the number of peptidoforms to control memory usage.
[7]:
fasta_lib.special_mods = ['GlyGly@K']
fasta_lib.special_mods_cannot_modify_pep_c_term = True
fasta_lib.min_special_mod_num = 1 # exclude the unmodified forms
fasta_lib.max_special_mod_num = 1 # at most one special modification per peptide
fasta_lib.add_special_modifications()
fasta_lib.precursor_df.sample(10, random_state=0)
[7]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy |
|---|---|---|---|---|---|---|---|---|---|
| 45 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K | 0;3;9 | 16 | 0 |
| 33 | ASDESTYKXKFGHVK | 1 | 2 | True | False | Acetyl@Protein_N-term;GlyGly@K | 0;8 | 15 | 1 |
| 40 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Oxidation@M;Carbamidomethyl@C;GlyGly@K | 1;3;11 | 16 | 0 |
| 26 | FGHIKLMNPQRST | 1 | 2 | False | True | GlyGly@K | 5 | 13 | 0 |
| 11 | MACDESTYKXK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3;9 | 11 | 0 |
| 2 | ACDESTYKXK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K | 0;2;8 | 10 | 0 |
| 32 | ASDESTYKXKFGHVK | 1 | 2 | True | False | GlyGly@K | 10 | 15 | 1 |
| 43 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3;9 | 16 | 0 |
| 46 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K | 0;3;11 | 16 | 0 |
| 30 | XKFGHIKLMNPQR | 1 | 2 | False | False | GlyGly@K | 7 | 13 | 0 |
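The effect of the three settings used above (no modification on the peptide C-term, a minimum and a maximum number of special modifications) can be sketched as a site enumeration. The function `special_mod_sites` and its parameters are hypothetical names for illustration; this is not how alphabase implements the step.

```python
from itertools import combinations

def special_mod_sites(sequence, aa="K", min_num=1, max_num=1,
                      allow_pep_c_term=False):
    """List the site combinations (1-based) that may carry a special mod."""
    sites = [i + 1 for i, residue in enumerate(sequence) if residue == aa]
    if not allow_pep_c_term:
        # Drop the last residue, e.g. GlyGly@K on a tryptic C-terminal Lys.
        sites = [s for s in sites if s != len(sequence)]
    result = []
    for k in range(min_num, max_num + 1):
        result.extend(combinations(sites, k))
    return result

one_mod = special_mod_sites("MACDESTYKXKFGHIK")
up_to_two = special_mod_sites("MACDESTYKXKFGHIK", max_num=2)
```

With min_num=1 and max_num=1, MACDESTYKXKFGHIK gets GlyGly@K at site 9 or at site 11 but never at the C-terminal site 16, which agrees with the rows shown above. A peptide such as ACDESTYK, whose only Lys is C-terminal, gets no allowed form at all.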
Add peptide labeling¶
For example, dimethyl labeling:
[8]:
fasta_lib.labeling_channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
}
fasta_lib.add_peptide_labeling()
fasta_lib.precursor_df.sample(10, random_state=0)
[8]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | labeling_channel |
|---|---|---|---|---|---|---|---|---|---|---|
| 85 | XKFGHIKLMNPQR | 1 | 2 | False | False | GlyGly@K;Dimethyl:2H(4)@Any_N-term;Dimethyl:2H... | 7;0;2;7 | 13 | 0 | 4 |
| 10 | MACDESTYKXK | 1 | 1 | True | False | Carbamidomethyl@C;GlyGly@K;Dimethyl@Any_N-term... | 3;9;0;9;11 | 11 | 0 | 0 |
| 75 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;GlyGly@K;Dimethyl:2H(4)@K | 0;5;5 | 11 | 1 | 4 |
| 2 | ACDESTYKXK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... | 0;2;8;8;10 | 10 | 0 | 0 |
| 24 | XLFGHIKLMNPNR | 1 | 2 | False | False | GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K | 7;0;7 | 13 | 1 | 0 |
| 101 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... | 0;3;11;9;11;16 | 16 | 0 | 4 |
| 109 | MLCDESTYKXKFGHVK | 1 | 2 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... | 0;3;11;9;11;16 | 16 | 1 | 4 |
| 7 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M;GlyGly@K;Dim... | 0;7;5;5 | 11 | 0 | 0 |
| 16 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly... | 0;3;9;9;11 | 11 | 1 | 0 |
| 91 | ACDESTYKXKFGHIK | 1 | 2 | True | False | Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any_... | 2;10;0;8;10;15 | 15 | 0 | 4 |
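As the output shows, each labeling channel adds its label to the peptide N-terminus (site 0) and to every Lys. A minimal sketch of that bookkeeping follows; `add_labels` is a hypothetical helper written for this illustration, and sorting the labels by site is an assumption made to match the order seen in the mods column.

```python
def add_labels(sequence, channel_mods):
    """Return (mods, mod_sites) strings for one labeling channel."""
    labelled = []
    for mod in channel_mods:
        target = mod.split("@", 1)[1]
        if target == "Any_N-term":
            labelled.append((0, mod))  # site 0 is the peptide N-terminus
        else:
            # Label every occurrence of the target residue (1-based sites).
            labelled.extend((i + 1, mod)
                            for i, aa in enumerate(sequence) if aa == target)
    labelled.sort(key=lambda item: item[0])
    mods = ";".join(mod for _, mod in labelled)
    sites = ";".join(str(site) for site, _ in labelled)
    return mods, sites

channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
}
light = add_labels("HIJKLMN", channels[0])
heavy = add_labels("HIJKLMN", channels[4])
```

Here `light` is `('Dimethyl@Any_N-term;Dimethyl@K', '0;4')`: one label on the N-terminus and one on the Lys at position 4.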
Add charge states¶
[9]:
fasta_lib.add_charge()
fasta_lib.precursor_df.sample(5, random_state=0)
[9]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | labeling_channel | charge |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 122 | MACDESTYKXKFGHIK | 1 | 2 | True | False | Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... | 1;3;11;0;9;11;16 | 16 | 0 | 0 | 4 |
| 66 | FLHIKLMNPQRTT | 1 | 2 | False | True | GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K | 5;0;5 | 13 | 1 | 0 | 2 |
| 142 | MLCDESTYKXKFGHVK | 1 | 2 | True | False | Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... | 1;3;9;0;9;11;16 | 16 | 1 | 0 | 3 |
| 246 | XKFGHIKLMNPQR | 1 | 2 | False | False | Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any_N-term... | 9;2;0;2;7 | 13 | 0 | 4 | 2 |
| 146 | MLCDESTYKXKFGHVK | 1 | 2 | True | False | Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy... | 1;3;11;0;9;11;16 | 16 | 1 | 0 | 4 |
import_and_process_protein_dict() combines all steps¶
Use import_and_process_fasta() instead to start from fasta files.
[10]:
fasta_lib.special_mods = []
fasta_lib.labeling_channels = None
fasta_lib.import_and_process_protein_dict(protein_dict)
fasta_lib.protein_df
[10]:
| | protein_id | full_name | gene_name | sequence |
|---|---|---|---|---|
| 0 | yy | yy_yy | y_y | FGHIKLMNPQR |
| 1 | xx | xx_xx | x_x | MACDESTYKXKFGHIKLMNPQRST |
[11]:
fasta_lib.precursor_df
[11]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | charge | precursor_mz |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LMNPQRST | 1 | 1 | False | True | Oxidation@M | 2 | 8 | 0 | 2 | 481.739834 |
| 1 | LMNPQRST | 1 | 1 | False | True | | | 8 | 0 | 2 | 473.742377 |
| 2 | ACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 2 | 8 | 0 | 2 | 487.200207 |
| 3 | ACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;2 | 8 | 0 | 2 | 508.205490 |
| 4 | LLNPQRTT | 1 | 1 | False | True | | | 8 | 1 | 2 | 471.771991 |
| 5 | ASDESTSK | 1 | 0 | True | False | | | 8 | 1 | 2 | 412.685247 |
| 6 | ASDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term | 0 | 8 | 1 | 2 | 433.690529 |
| 7 | MACDESTYK | 1 | 0 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 9 | 0 | 2 | 560.717907 |
| 8 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 3 | 9 | 0 | 2 | 552.720450 |
| 9 | MACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 9 | 0 | 2 | 581.723190 |
| 10 | MACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 9 | 0 | 2 | 573.725732 |
| 11 | MLCDESTSK | 1 | 0 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 9 | 1 | 2 | 543.725732 |
| 12 | MLCDESTSK | 1 | 0 | True | False | Carbamidomethyl@C | 3 | 9 | 1 | 2 | 535.728275 |
| 13 | MLCDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 9 | 1 | 2 | 564.731015 |
| 14 | MLCDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 9 | 1 | 2 | 556.733557 |
| 15 | ASDESTYKVK | 1 | 1 | True | False | | | 10 | 1 | 2 | 564.282586 |
| 16 | ASDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term | 0 | 10 | 1 | 2 | 585.287868 |
| 17 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 2 | 678.863889 |
| 18 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 3 | 452.911685 |
| 19 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 2 | 670.866431 |
| 20 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 3 | 447.580046 |
| 21 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 0 | 2 | 699.869171 |
| 22 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 0 | 3 | 466.915206 |
| 23 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 0 | 2 | 691.871714 |
| 24 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 0 | 3 | 461.583568 |
| 25 | MLCDESTYKVK | 1 | 1 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 11 | 1 | 2 | 695.323071 |
| 26 | MLCDESTYKVK | 1 | 1 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 11 | 1 | 3 | 463.884473 |
| 27 | MLCDESTYKVK | 1 | 1 | True | False | Carbamidomethyl@C | 3 | 11 | 1 | 2 | 687.325613 |
| 28 | MLCDESTYKVK | 1 | 1 | True | False | Carbamidomethyl@C | 3 | 11 | 1 | 3 | 458.552834 |
| 29 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 11 | 1 | 2 | 716.328353 |
| 30 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 11 | 1 | 3 | 477.887994 |
| 31 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 11 | 1 | 2 | 708.330896 |
| 32 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 11 | 1 | 3 | 472.556356 |
| 33 | FLHIKLMNPNR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 1 | 2 | 699.887364 |
| 34 | FLHIKLMNPNR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 1 | 3 | 466.927335 |
| 35 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 | 2 | 691.889907 |
| 36 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 | 3 | 461.595697 |
| 37 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 1 | 2 | 720.892646 |
| 38 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 1 | 3 | 480.930856 |
| 39 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 1 | 2 | 712.895189 |
| 40 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 1 | 3 | 475.599218 |
| 41 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 2 | 807.942867 |
| 42 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 3 | 538.964337 |
| 43 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 4 | 404.475072 |
| 44 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 2 | 799.945410 |
| 45 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 3 | 533.632699 |
| 46 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 4 | 400.476343 |
| 47 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 2 | 772.903742 |
| 48 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 3 | 515.604920 |
| 49 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 2 | 764.906285 |
| 50 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 3 | 510.273282 |
Start from peptides instead of proteins¶
The modular design of SpecLibFasta allows us to start from arbitrary types of peptide input, so neither fasta files nor a protein_dict is required.
For example, suppose we have a list of sequences and we want to add modifications using SpecLibFasta functionality:
[12]:
import pandas as pd
pep_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein_N-term','Oxidation@M'],
    labeling_channels={
        0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
        4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
    },
    decoy=None,
)
pep_lib.precursor_df = pd.DataFrame({
    'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']
})
pep_lib.process_from_naked_peptide_seqs()
pep_lib.precursor_df
[12]:
| | sequence | nAA | is_prot_nterm | is_prot_cterm | mods | mod_sites | labeling_channel | charge | precursor_mz |
|---|---|---|---|---|---|---|---|---|---|
| 0 | OPQRST | 6 | False | False | Dimethyl@Any_N-term | 0 | 0 | 2 | 427.248152 |
| 1 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl@Any_N-term;Dimethyl@K | 6;0;4 | 0 | 2 | 470.786056 |
| 2 | HIJKLMN | 7 | False | False | Dimethyl@Any_N-term;Dimethyl@K | 0;4 | 0 | 2 | 462.788599 |
| 3 | OPQRST | 6 | False | False | Dimethyl:2H(4)@Any_N-term | 0 | 4 | 2 | 429.260705 |
| 4 | HIJKLMN | 7 | False | False | Oxidation@M;Dimethyl:2H(4)@Any_N-term;Dimethyl... | 6;0;4 | 4 | 2 | 474.811163 |
| 5 | HIJKLMN | 7 | False | False | Dimethyl:2H(4)@Any_N-term;Dimethyl:2H(4)@K | 0;4 | 4 | 2 | 466.813706 |
Calculate masses¶
Calculate precursor m/z¶
[13]:
fasta_lib.calc_precursor_mz()
# fasta_lib.calc_precursor_isotope()
fasta_lib.precursor_df
[13]:
| | sequence | protein_idxes | miss_cleavage | is_prot_nterm | is_prot_cterm | mods | mod_sites | nAA | decoy | charge | precursor_mz |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LMNPQRST | 1 | 1 | False | True | Oxidation@M | 2 | 8 | 0 | 2 | 481.739834 |
| 1 | LMNPQRST | 1 | 1 | False | True | | | 8 | 0 | 2 | 473.742377 |
| 2 | ACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 2 | 8 | 0 | 2 | 487.200207 |
| 3 | ACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;2 | 8 | 0 | 2 | 508.205490 |
| 4 | LLNPQRTT | 1 | 1 | False | True | | | 8 | 1 | 2 | 471.771991 |
| 5 | ASDESTSK | 1 | 0 | True | False | | | 8 | 1 | 2 | 412.685247 |
| 6 | ASDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term | 0 | 8 | 1 | 2 | 433.690529 |
| 7 | MACDESTYK | 1 | 0 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 9 | 0 | 2 | 560.717907 |
| 8 | MACDESTYK | 1 | 0 | True | False | Carbamidomethyl@C | 3 | 9 | 0 | 2 | 552.720450 |
| 9 | MACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 9 | 0 | 2 | 581.723190 |
| 10 | MACDESTYK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 9 | 0 | 2 | 573.725732 |
| 11 | MLCDESTSK | 1 | 0 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 9 | 1 | 2 | 543.725732 |
| 12 | MLCDESTSK | 1 | 0 | True | False | Carbamidomethyl@C | 3 | 9 | 1 | 2 | 535.728275 |
| 13 | MLCDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 9 | 1 | 2 | 564.731015 |
| 14 | MLCDESTSK | 1 | 0 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 9 | 1 | 2 | 556.733557 |
| 15 | ASDESTYKVK | 1 | 1 | True | False | | | 10 | 1 | 2 | 564.282586 |
| 16 | ASDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term | 0 | 10 | 1 | 2 | 585.287868 |
| 17 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 2 | 678.863889 |
| 18 | FGHIKLMNPQR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 0 | 3 | 452.911685 |
| 19 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 2 | 670.866431 |
| 20 | FGHIKLMNPQR | 0;1 | 1 | True | True | | | 11 | 0 | 3 | 447.580046 |
| 21 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 0 | 2 | 699.869171 |
| 22 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 0 | 3 | 466.915206 |
| 23 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 0 | 2 | 691.871714 |
| 24 | FGHIKLMNPQR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 0 | 3 | 461.583568 |
| 25 | MLCDESTYKVK | 1 | 1 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 11 | 1 | 2 | 695.323071 |
| 26 | MLCDESTYKVK | 1 | 1 | True | False | Oxidation@M;Carbamidomethyl@C | 1;3 | 11 | 1 | 3 | 463.884473 |
| 27 | MLCDESTYKVK | 1 | 1 | True | False | Carbamidomethyl@C | 3 | 11 | 1 | 2 | 687.325613 |
| 28 | MLCDESTYKVK | 1 | 1 | True | False | Carbamidomethyl@C | 3 | 11 | 1 | 3 | 458.552834 |
| 29 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 11 | 1 | 2 | 716.328353 |
| 30 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Oxidation@M;Carbamidomet... | 0;1;3 | 11 | 1 | 3 | 477.887994 |
| 31 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 11 | 1 | 2 | 708.330896 |
| 32 | MLCDESTYKVK | 1 | 1 | True | False | Acetyl@Protein_N-term;Carbamidomethyl@C | 0;3 | 11 | 1 | 3 | 472.556356 |
| 33 | FLHIKLMNPNR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 1 | 2 | 699.887364 |
| 34 | FLHIKLMNPNR | 0;1 | 1 | True | True | Oxidation@M | 7 | 11 | 1 | 3 | 466.927335 |
| 35 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 | 2 | 691.889907 |
| 36 | FLHIKLMNPNR | 0;1 | 1 | True | True | | | 11 | 1 | 3 | 461.595697 |
| 37 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 1 | 2 | 720.892646 |
| 38 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term;Oxidation@M | 0;7 | 11 | 1 | 3 | 480.930856 |
| 39 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 1 | 2 | 712.895189 |
| 40 | FLHIKLMNPNR | 0;1 | 1 | True | True | Acetyl@Protein_N-term | 0 | 11 | 1 | 3 | 475.599218 |
| 41 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 2 | 807.942867 |
| 42 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 3 | 538.964337 |
| 43 | FLHIKLMNPQRTT | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 1 | 4 | 404.475072 |
| 44 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 2 | 799.945410 |
| 45 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 3 | 533.632699 |
| 46 | FLHIKLMNPQRTT | 1 | 2 | False | True | | | 13 | 1 | 4 | 400.476343 |
| 47 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 2 | 772.903742 |
| 48 | FGHIKLMNPQRST | 1 | 2 | False | True | Oxidation@M | 7 | 13 | 0 | 3 | 515.604920 |
| 49 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 2 | 764.906285 |
| 50 | FGHIKLMNPQRST | 1 | 2 | False | True | | | 13 | 0 | 3 | 510.273282 |
After calc_precursor_mz(), all sequences containing X are removed: the mass assigned to X is very large, so their precursor m/z falls outside the range from fasta_lib.min_precursor_mz to fasta_lib.max_precursor_mz.
[14]:
from alphabase.constants.aa import AA_ASCII_MASS
(
    fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz,
    f"mass of 'x' is {AA_ASCII_MASS[ord('x')]}"
)
[14]:
(400.0, 2000.0, "mass of 'x' is 100000000.0")
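The precursor_mz values and the m/z filter can be checked by hand. The sketch below uses standard monoisotopic masses and the usual relation m/z = (M + z * proton) / z. It is an independent illustration, not the alphabase code: the function `precursor_mz` and the dictionaries are defined here only for this example, and giving uppercase X the placeholder mass of 1e8 reported for 'x' above is an assumption consistent with the filtered table.

```python
# Monoisotopic residue masses in Da (standard values).
MONO_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
    "X": 1e8,  # assumed placeholder mass, taken from the cell output above
}
MOD_MASS = {
    "Oxidation@M": 15.994915,
    "Carbamidomethyl@C": 57.021464,
    "Acetyl@Protein_N-term": 42.010565,
}
WATER = 18.010565
PROTON = 1.007276

def precursor_mz(sequence, charge, mods=()):
    """m/z of a peptide with the given charge and modification names."""
    neutral = (sum(MONO_MASS[aa] for aa in sequence) + WATER
               + sum(MOD_MASS[mod] for mod in mods))
    return (neutral + charge * PROTON) / charge

mz_plain = precursor_mz("LMNPQRST", 2)
mz_ox = precursor_mz("LMNPQRST", 2, ["Oxidation@M"])
mz_x = precursor_mz("XKFGHIK", 2)
keeps_x = 400.0 <= mz_x <= 2000.0
```

`mz_plain` and `mz_ox` agree with rows 1 and 0 of the table (473.742377 and 481.739834) to within rounding, while `mz_x` is about 5e7, far above max_precursor_mz, so `keeps_x` is False.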
Calculate fragment m/z¶
[15]:
fasta_lib.calc_fragment_mz_df()
fasta_lib.fragment_mz_df
[15]:
| | b_z1 | y_z1 |
|---|---|---|
| 0 | 114.091339 | 849.388306 |
| 1 | 261.126740 | 702.352905 |
| 2 | 375.169678 | 588.309998 |
| 3 | 472.222443 | 491.257233 |
| 4 | 600.281006 | 363.198669 |
| ... | ... | ... |
| 486 | 941.502563 | 588.309998 |
| 487 | 1038.555298 | 491.257233 |
| 488 | 1166.613892 | 363.198669 |
| 489 | 1322.714966 | 207.097549 |
| 490 | 1409.747070 | 120.065521 |
491 rows × 2 columns
Use frag_start_idx and frag_stop_idx in precursor_df to locate the fragments that belong to a given precursor:
[16]:
ith_pep = 5
frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values
fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]
[16]:
| | b_z1 | y_z1 |
|---|---|---|
| 35 | 72.044388 | 753.326111 |
| 36 | 159.076416 | 666.294067 |
| 37 | 274.103363 | 551.267151 |
| 38 | 403.145966 | 422.224548 |
| 39 | 490.177979 | 335.192505 |
| 40 | 591.225647 | 234.144836 |
| 41 | 678.257690 | 147.112808 |
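The seven rows above belong to ASDESTSK (precursor 5, unmodified). The following sketch recomputes them and shows one way the start and stop indices can arise. It is a plain-Python illustration under assumptions: `fragment_mz` and `fragment_index_ranges` are hypothetical helpers, and the contiguous layout (each peptide of nAA residues occupying nAA - 1 consecutive rows, in precursor order) is inferred from the output rather than taken from the library source.

```python
MONO_MASS = {"A": 71.037114, "S": 87.032028, "D": 115.026943,
             "E": 129.042593, "T": 101.047679, "K": 128.094963}
WATER = 18.010565
PROTON = 1.007276

def fragment_mz(sequence):
    """Return [(b_z1, y_z1)] rows, one per cleavage site of the peptide."""
    masses = [MONO_MASS[aa] for aa in sequence]
    total = sum(masses)
    rows, prefix = [], 0.0
    for i in range(1, len(sequence)):
        prefix += masses[i - 1]
        b_ion = prefix + PROTON                    # N-terminal fragment
        y_ion = (total - prefix) + WATER + PROTON  # complementary C-terminal fragment
        rows.append((b_ion, y_ion))
    return rows

def fragment_index_ranges(n_aa_list):
    """Half-open (start, stop) row ranges, assuming contiguous storage."""
    ranges, start = [], 0
    for n_aa in n_aa_list:
        stop = start + n_aa - 1
        ranges.append((start, stop))
        start = stop
    return ranges

rows = fragment_mz("ASDESTSK")
# nAA of the 51 precursors in the table: 7 x 8, 8 x 9, 2 x 10, 24 x 11, 10 x 13
n_aa_list = [8] * 7 + [9] * 8 + [10] * 2 + [11] * 24 + [13] * 10
ranges = fragment_index_ranges(n_aa_list)
```

Under this layout `ranges[5]` is `(35, 42)`, which matches the row labels 35 to 41 above, and the last stop index equals the 491 rows of fragment_mz_df. The range is half-open, which is why `iloc[frag_start:frag_stop]` returns exactly nAA - 1 rows.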