Tutorial: Spectral Libraries

This notebook introduces developers to the spectral library functionality in alphabase.

The Base Library Class

alphabase.spectral_library.base.SpecLibBase is the base class for spectral libraries; see https://alphabase.readthedocs.io/en/latest/ for details. We recommend that users access spectral library functionality through alphabase.protein.fasta.SpecLibFasta.

SpecLibFasta

Almost all DataFrame functionality for processing proteins and peptides is integrated into alphabase.protein.fasta.SpecLibFasta.

[1]:
from alphabase.protein.fasta import SpecLibFasta

fasta_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    protease='trypsin',
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein_N-term','Oxidation@M'],
    decoy=None,
)

Start from fasta/proteins

SpecLibFasta performs the following steps:

  • Load fasta files into a protein_dict

  • Digest proteins into peptide sequences

  • Append decoy peptide sequences if self.decoy is not None

  • Add fixed and variable modifications

  • [Add special modifications]

  • [Add peptide labeling]

  • Add charge states to peptides

Load fasta files into a protein_dict

[2]:
# from alphabase.protein.fasta import load_all_proteins
# protein_dict = load_all_proteins(fasta_files)

# For example, the protein_dict is:
protein_dict = {
    'yy': {
        'protein_id': 'yy',
        'full_name': 'yy_yy',
        'gene_name': 'y_y',
        'sequence': 'FGHIKLMNPQR'
    },
    'xx': {
        'protein_id': 'xx',
        'full_name': 'xx_xx',
        'gene_name': 'x_x',
        'sequence': 'MACDESTYKXKFGHIKLMNPQRST'
    },
}
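For illustration, a protein_dict with this shape can also be built from FASTA text using only the standard library. parse_fasta below is a hypothetical helper, not part of alphabase, and its header handling is an assumption: the first whitespace-separated token becomes protein_id, the rest of the header becomes full_name, and gene_name is left empty. load_all_proteins may parse headers (for example UniProt-style ones) differently.

```python
def parse_fasta(text):
    """Hypothetical minimal FASTA reader returning a protein_dict-like mapping.

    Assumption: the first whitespace-separated token of a header is the
    protein_id and the rest of the header is the full_name. gene_name is
    left empty because plain FASTA headers carry no standard gene field.
    """
    proteins = {}
    protein_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            protein_id, _, full_name = line[1:].partition(' ')
            proteins[protein_id] = {
                'protein_id': protein_id,
                'full_name': full_name,
                'gene_name': '',
                'sequence': '',
            }
        elif protein_id is not None:
            # Sequence lines may be wrapped, so concatenate them
            proteins[protein_id]['sequence'] += line
    return proteins


fasta_text = """>yy yy_yy
FGHIKLMNPQR
>xx xx_xx
MACDESTYKXKF
GHIKLMNPQRST
"""
protein_dict = parse_fasta(fasta_text)
print(protein_dict['xx']['sequence'])  # MACDESTYKXKFGHIKLMNPQRST
```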

Digest proteins into peptide sequences

[3]:
fasta_lib.get_peptides_from_protein_dict(protein_dict)
fasta_lib.precursor_df
[3]:
    sequence          protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods  mod_sites  nAA
0   XKFGHIK           1              1              False          False                           7
1   LMNPQRST          1              1              False          True                            8
2   ACDESTYK          1              0              True           False                           8
3   MACDESTYK         1              0              True           False                           9
4   ACDESTYKXK        1              1              True           False                           10
5   FGHIKLMNPQR       0;1            1              True           True                            11
6   MACDESTYKXK       1              1              True           False                           11
7   XKFGHIKLMNPQR     1              2              False          False                           13
8   FGHIKLMNPQRST     1              2              False          True                            13
9   ACDESTYKXKFGHIK   1              2              True           False                           15
10  MACDESTYKXKFGHIK  1              2              True           False                           16
[4]:
fasta_lib.protein_df
[4]:
   protein_id  full_name  gene_name  sequence
0  yy          yy_yy      y_y        FGHIKLMNPQR
1  xx          xx_xx      x_x        MACDESTYKXKFGHIKLMNPQRST
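The digestion in cell [3] can be approximated in a few lines of plain Python. The sketch below is a simplified model, not alphabase's implementation; its assumptions are cleavage C-terminal to every K or R (no proline rule), at most two missed cleavages, peptide lengths from 7 to 35 residues, and optional removal of the protein N-terminal Met. Under these assumptions it returns the same 11 sequences and missed-cleavage counts that SpecLibFasta reports for protein xx above.

```python
import re


def digest(protein_seq, max_missed=2, min_len=7, max_len=35, clip_nterm_met=True):
    """Simplified tryptic digestion (hypothetical helper, not the alphabase API).

    Returns {peptide_sequence: number_of_missed_cleavages}.
    """
    # Zero-width split after every K or R keeps the residue on the left piece
    # (splitting on empty matches requires Python 3.7+).
    pieces = [p for p in re.split(r'(?<=[KR])', protein_seq) if p]
    peptides = {}
    for i in range(len(pieces)):
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            peptide = ''.join(pieces[i:j + 1])
            candidates = [peptide]
            # Peptides starting at the protein N-terminus may lose the initiator Met
            if clip_nterm_met and i == 0 and peptide.startswith('M'):
                candidates.append(peptide[1:])
            for cand in candidates:
                if min_len <= len(cand) <= max_len:
                    peptides[cand] = j - i  # missed cleavages = joined pieces - 1
    return peptides


peptides = digest('MACDESTYKXKFGHIKLMNPQRST')
for seq, missed in sorted(peptides.items(), key=lambda kv: len(kv[0])):
    print(seq, missed)
```

Short pieces such as FGHIK or LMNPQR are dropped by the length filter, which is why they only appear inside longer, missed-cleavage peptides.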

Append decoy sequences

This step depends on the value of self.decoy, which can be:

  • protein_reverse: reverses the target protein sequences

  • pseudo_reverse: pseudo-reverses the target peptide sequences

  • diann: generates DIA-NN-like decoys

  • None: no decoys are generated.

Let’s take diann as an example:

[5]:
fasta_lib.decoy = 'diann'
fasta_lib.append_decoy_sequence()
fasta_lib.precursor_df.sample(5, random_state=0)
[5]:
    sequence          protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods  mod_sites  nAA  decoy
20  MACDESTYKXKFGHIK  1              2              True           False                           16   0
10  FGHIKLMNPQR       0;1            1              True           True                            11   0
14  FLHIKLMNPQRTT     1              2              False          True                            13   1
13  FLHIKLMNPNR       0;1            1              True           True                            11   1
1   XLFGHVK           1              1              False          False                           7    1
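In the sample above, each diann decoy differs from its target only at the second and the second-to-last residue; for example, FGHIKLMNPQR becomes FLHIKLMNPNR. A minimal sketch of that mutation scheme follows. diann_like_decoy is a hypothetical helper, and its substitution table is an assumption: it contains only substitutions that can be read off this tutorial's outputs, not the complete table used by alphabase or DIA-NN, and residues missing from it are left unchanged.

```python
# Substitutions observed in this tutorial's decoy output (partial table, for
# illustration only; the full mapping used by alphabase/DIA-NN is not shown).
OBSERVED_SUBSTITUTIONS = {
    'A': 'L', 'C': 'S', 'G': 'L', 'I': 'V', 'K': 'L',
    'M': 'L', 'Q': 'N', 'S': 'T', 'X': 'V', 'Y': 'S',
}


def diann_like_decoy(target_seq, table=OBSERVED_SUBSTITUTIONS):
    """Mutate the 2nd and the second-to-last residue of a target peptide."""
    residues = list(target_seq)
    for pos in (1, len(residues) - 2):
        # Residues absent from the (partial) table are kept as-is
        residues[pos] = table.get(residues[pos], residues[pos])
    return ''.join(residues)


for target in ['FGHIKLMNPQR', 'FGHIKLMNPQRST', 'XKFGHIK']:
    print(target, '->', diann_like_decoy(target))
```

Because only two positions change, a decoy keeps the length and the C-terminal residue of its target, so it still looks like a tryptic peptide.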

Add modifications

add_modifications() will add fixed and variable modifications.

[6]:
fasta_lib.add_modifications()
fasta_lib.precursor_df.sample(5, random_state=0)
[6]:
    sequence       protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites  nAA  decoy
35  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   1
34  FLHIKLMNPNR    0;1            1              True           True                                                                         11   1
41  FGHIKLMNPQRST  1              2              False          True           Oxidation@M                                        7          13   0
27  MACDESTYKXK    1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      11   0
11  MACDESTYK      1              0              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      9    0
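The mods and mod_sites columns store semicolon-separated modification names and their positions, with site 0 denoting the peptide N-terminus and residues numbered from 1 (for example, Acetyl@Protein_N-term;Oxidation@M;Carbamidomethyl@C at 0;1;3 on MACDESTYK). The sketch below enumerates modified forms in that format. It is a simplified model rather than alphabase's algorithm: fixed modifications go on every matching residue, each variable-modification site is independently on or off, the number of variable modifications per peptide is capped by max_var_mods (an assumed default of 2), and Protein_N-term modifications apply only to peptides at the protein N-terminus.

```python
from itertools import combinations


def mod_sites_for(mod, seq, is_prot_nterm):
    """Sites a modification can occupy: 0 = N-terminus, residues numbered from 1."""
    _, target = mod.split('@')
    if target == 'Protein_N-term':
        return [0] if is_prot_nterm else []
    if target == 'Any_N-term':
        return [0]
    return [i + 1 for i, aa in enumerate(seq) if aa == target]


def enumerate_modforms(seq, is_prot_nterm,
                       fix_mods=('Carbamidomethyl@C',),
                       var_mods=('Acetyl@Protein_N-term', 'Oxidation@M'),
                       max_var_mods=2):
    """Return (mods, mod_sites) string pairs for every modified form."""
    fixed = [(m, s) for m in fix_mods
             for s in mod_sites_for(m, seq, is_prot_nterm)]
    optional = [(m, s) for m in var_mods
                for s in mod_sites_for(m, seq, is_prot_nterm)]
    forms = []
    for k in range(max_var_mods + 1):
        # Note: this sketch does not stop two variable mods sharing one site
        for chosen in combinations(optional, k):
            mods = list(chosen) + fixed  # variable mods first, then fixed ones
            forms.append((';'.join(m for m, _ in mods),
                          ';'.join(str(s) for _, s in mods)))
    return forms


for mods, sites in enumerate_modforms('MACDESTYK', is_prot_nterm=True):
    print(repr(mods), repr(sites))
```

For MACDESTYK this produces four forms (with and without Acetyl@Protein_N-term, each with and without Oxidation@M, always with Carbamidomethyl@C), whereas LMNPQRST, which is not at the protein N-terminus, only gets an unmodified and an Oxidation@M form.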

Add special modifications

Special modifications are PTMs that need finer control than ordinary variable modifications:

  1. We may only want the modified forms of a peptide, excluding its unmodified form.

  2. GlyGly@K cannot occur at the peptide C-terminus, because trypsin cannot cleave after a GlyGly-modified Lys.

  3. For special modifications such as Phospho@S and HexNAc@S, we may want to limit the number of modified forms (peptidoforms) per peptide to control memory usage.

[7]:
fasta_lib.special_mods = ['GlyGly@K']
fasta_lib.special_mods_cannot_modify_pep_c_term = True
fasta_lib.min_special_mod_num = 1  # exclude the unmodified forms
fasta_lib.max_special_mod_num = 1  # at most one GlyGly@K per peptide
fasta_lib.add_special_modifications()
fasta_lib.precursor_df.sample(10, random_state=0)
[7]:
    sequence          protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites  nAA  decoy
45  MACDESTYKXKFGHIK  1              2              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K   0;3;9      16   0
33  ASDESTYKXKFGHVK   1              2              True           False          Acetyl@Protein_N-term;GlyGly@K                     0;8        15   1
40  MACDESTYKXKFGHIK  1              2              True           False          Oxidation@M;Carbamidomethyl@C;GlyGly@K             1;3;11     16   0
26  FGHIKLMNPQRST     1              2              False          True           GlyGly@K                                           5          13   0
11  MACDESTYKXK       1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3;9    11   0
2   ACDESTYKXK        1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K   0;2;8      10   0
32  ASDESTYKXKFGHVK   1              2              True           False          GlyGly@K                                           10         15   1
43  MACDESTYKXKFGHIK  1              2              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3;9    16   0
46  MACDESTYKXKFGHIK  1              2              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly@K   0;3;11     16   0
30  XKFGHIKLMNPQR     1              2              False          False          GlyGly@K                                           7          13   0
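The placement rules for special modifications can be sketched as follows. special_mod_sites is a hypothetical helper, not alphabase code; it mirrors the settings used above (min_special_mod_num = max_special_mod_num = 1, and no special modification on the peptide's C-terminal residue) and returns every combination of 1-based sites for one special modification.

```python
from itertools import combinations


def special_mod_sites(seq, special_mod='GlyGly@K', min_num=1, max_num=1,
                      cannot_modify_pep_c_term=True):
    """Return every combination of 1-based sites for one special modification."""
    _, target = special_mod.split('@')
    # Optionally exclude the last residue (trypsin does not cleave after GlyGly-K)
    searchable = seq[:-1] if cannot_modify_pep_c_term else seq
    sites = [i + 1 for i, aa in enumerate(searchable) if aa == target]
    site_sets = []
    for k in range(min_num, max_num + 1):
        site_sets.extend(list(c) for c in combinations(sites, k))
    return site_sets


print(special_mod_sites('MACDESTYKXKFGHIK'))  # [[9], [11]]
print(special_mod_sites('FGHIKLMNPQRST'))     # [[5]]
```

With min_num=0 the empty combination, i.e. the unmodified form, is also returned; that is the form min_special_mod_num = 1 excludes. The sites 9 and 11 match the GlyGly@K positions reported for MACDESTYKXKFGHIK above, and K16 is skipped because it is the C-terminal residue.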

Add peptide labeling

For example, dimethyl labeling with two channels, 0 (Dimethyl) and 4 (Dimethyl:2H(4)):

[8]:
fasta_lib.labeling_channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
}
fasta_lib.add_peptide_labeling()
fasta_lib.precursor_df.sample(10, random_state=0)
[8]:
     sequence          protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites       nAA  decoy  labeling_channel
85   XKFGHIKLMNPQR     1              2              False          False          GlyGly@K;Dimethyl:2H(4)@Any_N-term;Dimethyl:2H...  7;0;2;7         13   0      4
10   MACDESTYKXK       1              1              True           False          Carbamidomethyl@C;GlyGly@K;Dimethyl@Any_N-term...  3;9;0;9;11      11   0      0
75   FLHIKLMNPNR       0;1            1              True           True           Acetyl@Protein_N-term;GlyGly@K;Dimethyl:2H(4)@K    0;5;5           11   1      4
2    ACDESTYKXK        1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly...  0;2;8;8;10      10   0      0
24   XLFGHIKLMNPNR     1              2              False          False          GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K            7;0;7           13   1      0
101  MACDESTYKXKFGHIK  1              2              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly...  0;3;11;9;11;16  16   0      4
109  MLCDESTYKXKFGHVK  1              2              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly...  0;3;11;9;11;16  16   1      4
7    FGHIKLMNPQR       0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M;GlyGly@K;Dim...  0;7;5;5         11   0      0
16   MLCDESTYKVK       1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C;GlyGly...  0;3;9;9;11      11   1      0
91   ACDESTYKXKFGHIK   1              2              True           False          Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any_...  2;10;0;8;10;15  15   0      4
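As the labeling_channel column suggests, each channel yields its own copy of a precursor with that channel's label modifications appended. apply_labeling is a hypothetical sketch of that step for a single peptide, not alphabase's implementation. It does not check whether a site already carries another modification, and it lists N-terminal labels before residue labels only so that its strings resemble the sample above (row 24, for instance, reads GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K at 7;0;7).

```python
def apply_labeling(seq, mods, mod_sites, labeling_channels):
    """Return one (channel, mods, mod_sites) tuple per labeling channel."""
    rows = []
    for channel, label_mods in labeling_channels.items():
        new_mods = mods.split(';') if mods else []
        new_sites = mod_sites.split(';') if mod_sites else []
        # N-terminal labels first (stable sort), only to mirror the tables above
        for label in sorted(label_mods, key=lambda m: not m.endswith('N-term')):
            _, target = label.split('@')
            if target.endswith('N-term'):
                sites = [0]  # N-terminal labels are placed at site 0
            else:
                sites = [i + 1 for i, aa in enumerate(seq) if aa == target]
            for site in sites:
                new_mods.append(label)
                new_sites.append(str(site))
        rows.append((channel, ';'.join(new_mods), ';'.join(new_sites)))
    return rows


channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
}
for row in apply_labeling('XLFGHIKLMNPNR', 'GlyGly@K', '7', channels):
    print(row)
```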

Add charge states

[9]:
fasta_lib.add_charge()
fasta_lib.precursor_df.sample(5, random_state=0)
[9]:
     sequence          protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites         nAA  decoy  labeling_channel  charge
122  MACDESTYKXKFGHIK  1              2              True           False          Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy...  1;3;11;0;9;11;16  16   0      0                 4
66   FLHIKLMNPQRTT     1              2              False          True           GlyGly@K;Dimethyl@Any_N-term;Dimethyl@K            5;0;5             13   1      0                 2
142  MLCDESTYKXKFGHVK  1              2              True           False          Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy...  1;3;9;0;9;11;16   16   1      0                 3
246  XKFGHIKLMNPQR     1              2              False          False          Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any_N-term...  9;2;0;2;7         13   0      4                 2
146  MLCDESTYKXKFGHVK  1              2              True           False          Oxidation@M;Carbamidomethyl@C;GlyGly@K;Dimethy...  1;3;11;0;9;11;16  16   1      0                 4

import_and_process_protein_dict() combines all steps

For fasta files, use import_and_process_fasta() instead.

[10]:
fasta_lib.special_mods = []
fasta_lib.labeling_channels = None
fasta_lib.import_and_process_protein_dict(protein_dict)
fasta_lib.protein_df
[10]:
   protein_id  full_name  gene_name  sequence
0  yy          yy_yy      y_y        FGHIKLMNPQR
1  xx          xx_xx      x_x        MACDESTYKXKFGHIKLMNPQRST
[11]:
fasta_lib.precursor_df
[11]:
    sequence       protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites  nAA  decoy  charge  precursor_mz
0   LMNPQRST       1              1              False          True           Oxidation@M                                        2          8    0      2       481.739834
1   LMNPQRST       1              1              False          True                                                                         8    0      2       473.742377
2   ACDESTYK       1              0              True           False          Carbamidomethyl@C                                  2          8    0      2       487.200207
3   ACDESTYK       1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;2        8    0      2       508.205490
4   LLNPQRTT       1              1              False          True                                                                         8    1      2       471.771991
5   ASDESTSK       1              0              True           False                                                                        8    1      2       412.685247
6   ASDESTSK       1              0              True           False          Acetyl@Protein_N-term                              0          8    1      2       433.690529
7   MACDESTYK      1              0              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        9    0      2       560.717907
8   MACDESTYK      1              0              True           False          Carbamidomethyl@C                                  3          9    0      2       552.720450
9   MACDESTYK      1              0              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      9    0      2       581.723190
10  MACDESTYK      1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        9    0      2       573.725732
11  MLCDESTSK      1              0              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        9    1      2       543.725732
12  MLCDESTSK      1              0              True           False          Carbamidomethyl@C                                  3          9    1      2       535.728275
13  MLCDESTSK      1              0              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      9    1      2       564.731015
14  MLCDESTSK      1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        9    1      2       556.733557
15  ASDESTYKVK     1              1              True           False                                                                        10   1      2       564.282586
16  ASDESTYKVK     1              1              True           False          Acetyl@Protein_N-term                              0          10   1      2       585.287868
17  FGHIKLMNPQR    0;1            1              True           True           Oxidation@M                                        7          11   0      2       678.863889
18  FGHIKLMNPQR    0;1            1              True           True           Oxidation@M                                        7          11   0      3       452.911685
19  FGHIKLMNPQR    0;1            1              True           True                                                                         11   0      2       670.866431
20  FGHIKLMNPQR    0;1            1              True           True                                                                         11   0      3       447.580046
21  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   0      2       699.869171
22  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   0      3       466.915206
23  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   0      2       691.871714
24  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   0      3       461.583568
25  MLCDESTYKVK    1              1              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        11   1      2       695.323071
26  MLCDESTYKVK    1              1              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        11   1      3       463.884473
27  MLCDESTYKVK    1              1              True           False          Carbamidomethyl@C                                  3          11   1      2       687.325613
28  MLCDESTYKVK    1              1              True           False          Carbamidomethyl@C                                  3          11   1      3       458.552834
29  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      11   1      2       716.328353
30  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      11   1      3       477.887994
31  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        11   1      2       708.330896
32  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        11   1      3       472.556356
33  FLHIKLMNPNR    0;1            1              True           True           Oxidation@M                                        7          11   1      2       699.887364
34  FLHIKLMNPNR    0;1            1              True           True           Oxidation@M                                        7          11   1      3       466.927335
35  FLHIKLMNPNR    0;1            1              True           True                                                                         11   1      2       691.889907
36  FLHIKLMNPNR    0;1            1              True           True                                                                         11   1      3       461.595697
37  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   1      2       720.892646
38  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   1      3       480.930856
39  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   1      2       712.895189
40  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   1      3       475.599218
41  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      2       807.942867
42  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      3       538.964337
43  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      4       404.475072
44  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      2       799.945410
45  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      3       533.632699
46  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      4       400.476343
47  FGHIKLMNPQRST  1              2              False          True           Oxidation@M                                        7          13   0      2       772.903742
48  FGHIKLMNPQRST  1              2              False          True           Oxidation@M                                        7          13   0      3       515.604920
49  FGHIKLMNPQRST  1              2              False          True                                                                         13   0      2       764.906285
50  FGHIKLMNPQRST  1              2              False          True                                                                         13   0      3       510.273282

Start from peptides instead of proteins

The modular design of SpecLibFasta allows us to start from arbitrary peptide inputs; neither fasta files nor a protein_dict is required.

For example, given a list of sequences, we can add modifications and labels with SpecLibFasta functionality:

[12]:
import pandas as pd
pep_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein_N-term','Oxidation@M'],
    labeling_channels={
        0: ['Dimethyl@K', 'Dimethyl@Any_N-term'],
        4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any_N-term'],
    },
    decoy=None,
)

pep_lib.precursor_df = pd.DataFrame({
    'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']
})
pep_lib.process_from_naked_peptide_seqs()
pep_lib.precursor_df
[12]:
   sequence  nAA  is_prot_nterm  is_prot_cterm  mods                                               mod_sites  labeling_channel  charge  precursor_mz
0  OPQRST    6    False          False          Dimethyl@Any_N-term                                0          0                 2       427.248152
1  HIJKLMN   7    False          False          Oxidation@M;Dimethyl@Any_N-term;Dimethyl@K         6;0;4      0                 2       470.786056
2  HIJKLMN   7    False          False          Dimethyl@Any_N-term;Dimethyl@K                     0;4        0                 2       462.788599
3  OPQRST    6    False          False          Dimethyl:2H(4)@Any_N-term                          0          4                 2       429.260705
4  HIJKLMN   7    False          False          Oxidation@M;Dimethyl:2H(4)@Any_N-term;Dimethyl...  6;0;4      4                 2       474.811163
5  HIJKLMN   7    False          False          Dimethyl:2H(4)@Any_N-term;Dimethyl:2H(4)@K         0;4        4                 2       466.813706

Calculate masses

Calculate precursor m/z

[13]:
fasta_lib.calc_precursor_mz()
# fasta_lib.calc_precursor_isotope()
fasta_lib.precursor_df
[13]:
    sequence       protein_idxes  miss_cleavage  is_prot_nterm  is_prot_cterm  mods                                               mod_sites  nAA  decoy  charge  precursor_mz
0   LMNPQRST       1              1              False          True           Oxidation@M                                        2          8    0      2       481.739834
1   LMNPQRST       1              1              False          True                                                                         8    0      2       473.742377
2   ACDESTYK       1              0              True           False          Carbamidomethyl@C                                  2          8    0      2       487.200207
3   ACDESTYK       1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;2        8    0      2       508.205490
4   LLNPQRTT       1              1              False          True                                                                         8    1      2       471.771991
5   ASDESTSK       1              0              True           False                                                                        8    1      2       412.685247
6   ASDESTSK       1              0              True           False          Acetyl@Protein_N-term                              0          8    1      2       433.690529
7   MACDESTYK      1              0              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        9    0      2       560.717907
8   MACDESTYK      1              0              True           False          Carbamidomethyl@C                                  3          9    0      2       552.720450
9   MACDESTYK      1              0              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      9    0      2       581.723190
10  MACDESTYK      1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        9    0      2       573.725732
11  MLCDESTSK      1              0              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        9    1      2       543.725732
12  MLCDESTSK      1              0              True           False          Carbamidomethyl@C                                  3          9    1      2       535.728275
13  MLCDESTSK      1              0              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      9    1      2       564.731015
14  MLCDESTSK      1              0              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        9    1      2       556.733557
15  ASDESTYKVK     1              1              True           False                                                                        10   1      2       564.282586
16  ASDESTYKVK     1              1              True           False          Acetyl@Protein_N-term                              0          10   1      2       585.287868
17  FGHIKLMNPQR    0;1            1              True           True           Oxidation@M                                        7          11   0      2       678.863889
18  FGHIKLMNPQR    0;1            1              True           True           Oxidation@M                                        7          11   0      3       452.911685
19  FGHIKLMNPQR    0;1            1              True           True                                                                         11   0      2       670.866431
20  FGHIKLMNPQR    0;1            1              True           True                                                                         11   0      3       447.580046
21  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   0      2       699.869171
22  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   0      3       466.915206
23  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   0      2       691.871714
24  FGHIKLMNPQR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   0      3       461.583568
25  MLCDESTYKVK    1              1              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        11   1      2       695.323071
26  MLCDESTYKVK    1              1              True           False          Oxidation@M;Carbamidomethyl@C                      1;3        11   1      3       463.884473
27  MLCDESTYKVK    1              1              True           False          Carbamidomethyl@C                                  3          11   1      2       687.325613
28  MLCDESTYKVK    1              1              True           False          Carbamidomethyl@C                                  3          11   1      3       458.552834
29  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      11   1      2       716.328353
30  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Oxidation@M;Carbamidomet...  0;1;3      11   1      3       477.887994
31  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        11   1      2       708.330896
32  MLCDESTYKVK    1              1              True           False          Acetyl@Protein_N-term;Carbamidomethyl@C            0;3        11   1      3       472.556356
33  FLHIKLMNPNR    0;1            1              True           True           Oxidation@M                                        7          11   1      2       699.887364
34  FLHIKLMNPNR    0;1            1              True           True           Oxidation@M                                        7          11   1      3       466.927335
35  FLHIKLMNPNR    0;1            1              True           True                                                                         11   1      2       691.889907
36  FLHIKLMNPNR    0;1            1              True           True                                                                         11   1      3       461.595697
37  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   1      2       720.892646
38  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term;Oxidation@M                  0;7        11   1      3       480.930856
39  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   1      2       712.895189
40  FLHIKLMNPNR    0;1            1              True           True           Acetyl@Protein_N-term                              0          11   1      3       475.599218
41  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      2       807.942867
42  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      3       538.964337
43  FLHIKLMNPQRTT  1              2              False          True           Oxidation@M                                        7          13   1      4       404.475072
44  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      2       799.945410
45  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      3       533.632699
46  FLHIKLMNPQRTT  1              2              False          True                                                                         13   1      4       400.476343
47  FGHIKLMNPQRST  1              2              False          True           Oxidation@M                                        7          13   0      2       772.903742
48  FGHIKLMNPQRST  1              2              False          True           Oxidation@M                                        7          13   0      3       515.604920
49  FGHIKLMNPQRST  1              2              False          True                                                                         13   0      2       764.906285
50  FGHIKLMNPQRST  1              2              False          True                                                                         13   0      3       510.273282
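The precursor_mz values can be checked by hand: the neutral monoisotopic mass M is the sum of the residue masses plus one water (for the peptide termini) plus the modification mass shifts, and m/z = (M + z * m_proton) / z. The sketch below hard-codes standard monoisotopic masses for the 20 canonical residues and the three modifications used here; it does not read alphabase's constants, so the last printed digit can occasionally differ.

```python
# Standard monoisotopic residue masses in Da (20 canonical amino acids)
RESIDUE_MASS = {
    'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764,
    'V': 99.068414, 'T': 101.047679, 'C': 103.009185, 'L': 113.084064,
    'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578,
    'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912,
    'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313,
}
# Monoisotopic mass shifts of the modifications used in this tutorial
MOD_MASS = {
    'Carbamidomethyl@C': 57.021464,
    'Oxidation@M': 15.994915,
    'Acetyl@Protein_N-term': 42.010565,
}
WATER_MASS = 18.010565
PROTON_MASS = 1.00727646688


def precursor_mz(seq, mods, charge):
    """m/z of a peptide given a semicolon-separated mods string."""
    neutral = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    neutral += sum(MOD_MASS[m] for m in mods.split(';') if m)
    return (neutral + charge * PROTON_MASS) / charge


print(round(precursor_mz('LMNPQRST', '', 2), 6))             # 473.742377 (row 1)
print(round(precursor_mz('LMNPQRST', 'Oxidation@M', 2), 6))  # 481.739834 (row 0)
```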

After calc_precursor_mz(), all sequences containing X are removed: X is assigned a very large mass, so the precursor m/z of these sequences falls outside the range between fasta_lib.min_precursor_mz and fasta_lib.max_precursor_mz.

[14]:
from alphabase.constants.aa import AA_ASCII_MASS
(
    fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz,
    f"mass of 'x' is {AA_ASCII_MASS[ord('x')]}"
)
[14]:
(400.0, 2000.0, "mass of 'x' is 100000000.0")
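The charge states and the m/z window interact: a charge state is kept only if its m/z lies between min_precursor_mz and max_precursor_mz. For instance, LMNPQRST appears only at charge 2 in the precursor table above, while FLHIKLMNPQRTT appears at charges 2, 3 and 4. The sketch below reproduces that filtering from a neutral mass. charge_states_in_window is hypothetical; the charge range 2 to 4 is an assumption inferred from the charges seen in this tutorial, and the 400 to 2000 window is the one printed above.

```python
PROTON_MASS = 1.00727646688  # Da


def charge_states_in_window(neutral_mass, min_charge=2, max_charge=4,
                            min_mz=400.0, max_mz=2000.0):
    """Map each charge whose m/z lies inside [min_mz, max_mz] to that m/z."""
    kept = {}
    for z in range(min_charge, max_charge + 1):
        mz = (neutral_mass + z * PROTON_MASS) / z
        if min_mz <= mz <= max_mz:
            kept[z] = mz
    return kept


# Neutral masses back-calculated from the charge-2 m/z values in the table above
for seq, mz2 in [('LMNPQRST', 473.742377), ('FLHIKLMNPQRTT', 799.945410)]:
    neutral = 2 * mz2 - 2 * PROTON_MASS
    print(seq, sorted(charge_states_in_window(neutral)))

# A residue with a placeholder mass of 1e8 Da pushes every charge out of the window
print(charge_states_in_window(1e8))  # {}
```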

Calculate fragment m/z

[15]:
fasta_lib.calc_fragment_mz_df()
fasta_lib.fragment_mz_df
[15]:
     b_z1         y_z1
0    114.091339   849.388306
1    261.126740   702.352905
2    375.169678   588.309998
3    472.222443   491.257233
4    600.281006   363.198669
...  ...          ...
486  941.502563   588.309998
487  1038.555298  491.257233
488  1166.613892  363.198669
489  1322.714966  207.097549
490  1409.747070  120.065521

491 rows × 2 columns
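Each row of fragment_mz_df corresponds to one backbone cleavage site of a precursor. For singly charged ions, b_z1 is the sum of the N-terminal residue masses plus a proton, and y_z1 is the sum of the C-terminal residue masses plus water and a proton, so an unmodified peptide with nAA residues contributes nAA - 1 rows. The sketch below computes these values with standard monoisotopic masses; it is an independent illustration rather than alphabase's code, so agreement is expected only to about four decimal places. ASDESTSK, the unmodified decoy at row 5 of precursor_df, is used as the example.

```python
# Standard monoisotopic residue masses in Da (20 canonical amino acids)
RESIDUE_MASS = {
    'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764,
    'V': 99.068414, 'T': 101.047679, 'C': 103.009185, 'L': 113.084064,
    'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578,
    'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912,
    'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313,
}
WATER_MASS = 18.010565
PROTON_MASS = 1.00727646688


def b_y_ions_z1(seq):
    """Return [(b_i, y_(n-i)) for i in 1..n-1] for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in seq]
    rows = []
    for i in range(1, len(seq)):  # cleavage between residue i and i + 1
        b = sum(masses[:i]) + PROTON_MASS
        y = sum(masses[i:]) + WATER_MASS + PROTON_MASS
        rows.append((b, y))
    return rows


for b, y in b_y_ions_z1('ASDESTSK'):
    print(f'{b:11.6f} {y:11.6f}')
```

The first row pairs b1 with y7 and the last pairs b7 with y1, which is the same ordering as the fragment rows listed for this precursor.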

Use the frag_start_idx and frag_stop_idx columns of precursor_df to locate the fragments of each precursor in fragment_mz_df.

[16]:
ith_pep = 5
frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values
fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]
[16]:
    b_z1        y_z1
35  72.044388   753.326111
36  159.076416  666.294067
37  274.103363  551.267151
38  403.145966  422.224548
39  490.177979  335.192505
40  591.225647  234.144836
41  678.257690  147.112808
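The two index columns describe a flat layout in which every precursor owns a contiguous block of nAA - 1 rows in fragment_mz_df. Assuming that layout, frag_stop_idx is the running total of nAA - 1 and frag_start_idx is frag_stop_idx minus the precursor's own row count. The sketch below recomputes both from the nAA values of the 51 precursors listed earlier; it reproduces the 35:42 slice used for ith_pep = 5 and the 491 rows of fragment_mz_df.

```python
from itertools import accumulate

# nAA of the 51 precursors in fasta_lib.precursor_df, in row order
n_aa = [8] * 7 + [9] * 8 + [10] * 2 + [11] * 24 + [13] * 10

frag_counts = [n - 1 for n in n_aa]  # one row per backbone cleavage site
frag_stop_idx = list(accumulate(frag_counts))
frag_start_idx = [stop - count for stop, count in zip(frag_stop_idx, frag_counts)]

ith_pep = 5
print(frag_start_idx[ith_pep], frag_stop_idx[ith_pep])  # 35 42
print(frag_stop_idx[-1])  # 491, the number of rows in fragment_mz_df
```

Storing offsets instead of a per-fragment precursor ID keeps each precursor's fragments as a single contiguous slice, which is why iloc[frag_start:frag_stop] is enough to retrieve them.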