Tutorial for Dev: Spectral Libraries#

This notebook introduces functionalities for spectral libraries to developers.

[1]:

# One or two methods are renamed and only available in v1.0.1 (main branch),
# please install alphabase from github:

# %pip install git+https://github.com/mannlabs/alphabase

[2]:

import alphabase
# the version must be 1.0.1
alphabase.__version__

[2]:

'1.0.1'

The Base Library Class#

alphabase.spectral_library.base.SpecLibBase is the base class for spectral libraries. See https://alphabase.readthedocs.io/en/latest/ for details. We recommend accessing spectral library functionality through alphabase.protein.fasta.SpecLibFasta.

`SpecLibFasta`#

Almost all DataFrame functionality for processing proteins and peptides has been integrated into alphabase.protein.fasta.SpecLibFasta.

[3]:

from alphabase.protein.fasta import SpecLibFasta

fasta_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    protease='trypsin',
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein N-term','Oxidation@M'],
    decoy=None,
)

Start from fasta/proteins#

SpecLibFasta will do the following for us:

Load fasta files into a protein_dict
Digest proteins into peptide sequences
Append decoy peptide sequences if self.decoy is not None
Add fixed and variable modifications
[Add special modifications]
[Add peptide labeling]
Add charge states to peptides

Load fasta files into a protein_dict#

[4]:

# from alphabase.protein.fasta import load_all_proteins
# protein_dict = load_all_proteins(fasta_files)

# For example, the protein_dict is:
protein_dict = {
    'yy': {
        'protein_id': 'yy',
        'full_name': 'yy_yy',
        'gene_name': 'y_y',
        'sequence': 'FGHIKLMNPQR'
    },
    'xx': {
        'protein_id': 'xx',
        'full_name': 'xx_xx',
        'gene_name': 'x_x',
        'sequence': 'MACDESTYKxKFGHIKLMNPQRST'
    },
}

Digest proteins into peptide sequences#

[5]:

fasta_lib.get_peptides_from_protein_dict(protein_dict)
fasta_lib.precursor_df

[5]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	nAA
0	xKFGHIK	1	1	False	False	7
1	LMNPQRST	1	1	False	True	8
2	ACDESTYK	1	0	True	False	8
3	MACDESTYK	1	0	True	False	9
4	ACDESTYKxK	1	1	True	False	10
5	FGHIKLMNPQR	0;1	1	True	True	11
6	MACDESTYKxK	1	1	True	False	11
7	xKFGHIKLMNPQR	1	2	False	False	13
8	FGHIKLMNPQRST	1	2	False	True	13
9	ACDESTYKxKFGHIK	1	2	True	False	15
10	MACDESTYKxKFGHIK	1	2	True	False	16

[6]:

fasta_lib.protein_df

[6]:

	protein_id	full_name	gene_name	sequence
0	yy	yy_yy	y_y	FGHIKLMNPQR
1	xx	xx_xx	x_x	MACDESTYKxKFGHIKLMNPQRST
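
To make the digestion rules concrete, the peptide table above can be reproduced with a few lines of plain Python. This is only a sketch, not alphabase's implementation: the function name `digest` and its parameters are hypothetical, and it assumes that trypsin cleaves after every K or R, that up to two missed cleavages are allowed, that peptides of 7 to 35 residues are kept, and that protein N-terminal peptides are also emitted without the initiator Met. These settings were chosen because they reproduce the 11 sequences listed above.

```python
def digest(sequence, max_missed_cleavages=2, min_len=7, max_len=35):
    """Hypothetical tryptic digest: cleave after K/R, allow missed cleavages."""
    # Split the protein into fully cleaved pieces.
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in 'KR':
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])

    peptides = set()
    for i in range(len(pieces)):
        for n_miss in range(max_missed_cleavages + 1):
            if i + n_miss >= len(pieces):
                break
            pep = ''.join(pieces[i:i + n_miss + 1])
            candidates = [pep]
            # Assumption: also keep the N-terminal peptide without its leading Met.
            if i == 0 and pep.startswith('M'):
                candidates.append(pep[1:])
            for cand in candidates:
                if min_len <= len(cand) <= max_len:
                    peptides.add(cand)
    return peptides


peptides = digest('MACDESTYKxKFGHIKLMNPQRST') | digest('FGHIKLMNPQR')
for pep in sorted(peptides, key=lambda p: (len(p), p)):
    print(pep, len(pep))
```

Returning a set mirrors the fact that FGHIKLMNPQR, which occurs in both proteins, appears only once in precursor_df (with protein_idxes 0;1).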

Append decoy sequences#

This step depends on self.decoy (a str), whose value can be:

protein_reverse: Reverse on target protein sequences
pseudo_reverse: Pseudo-reverse on target peptide sequences
diann: DiaNN-like decoy
None: no decoy.

Let’s take protein_reverse as an example:

[7]:

fasta_lib.decoy = 'protein_reverse'
fasta_lib.append_decoy_sequence()
fasta_lib.precursor_df.sample(5, random_state=0)

[7]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	nAA	decoy
20	ACDESTYKxKFGHIK	1	2	True	False	15	0
10	QPNMLKIHGF	2	1	False	True	10	1
14	FGHIKLMNPQR	0;1	1	True	True	11	0
13	MACDESTYKxK	1	1	True	False	11	0
1	IHGFKxK	3	1	False	False	7	1

As protein_reverse is a protein-level decoy, the protein_df is changed too:

[8]:

fasta_lib.protein_df

[8]:

	protein_id	full_name	gene_name	sequence	decoy
0	yy	yy_yy	y_y	FGHIKLMNPQR	0
1	xx	xx_xx	x_x	MACDESTYKxKFGHIKLMNPQRST	0
2	REV_yy	REV_yy_yy	REV_y_y	RQPNMLKIHGF	1
3	REV_xx	REV_xx_xx	REV_x_x	TSRQPNMLKIHGFKxKYTSEDCAM	1
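
The two reversal strategies can be illustrated without alphabase. The sketch below is an assumption-laden illustration rather than the library's code: `reverse_protein` and `pseudo_reverse` are hypothetical helpers, the `REV_` prefix is taken from the table above, and keeping the last residue fixed in the pseudo-reverse is a common convention that may differ from alphabase's exact behavior.

```python
def reverse_protein(entry, prefix='REV_'):
    """Protein-level decoy: reverse the whole sequence and prefix the names."""
    return {
        'protein_id': prefix + entry['protein_id'],
        'full_name': prefix + entry['full_name'],
        'gene_name': prefix + entry['gene_name'],
        'sequence': entry['sequence'][::-1],
    }


def pseudo_reverse(peptide, n_fixed_cterm=1):
    """Peptide-level decoy: reverse everything except the last residues.

    Assumes n_fixed_cterm >= 1, so the cleavage-site residue stays in place.
    """
    head, tail = peptide[:-n_fixed_cterm], peptide[-n_fixed_cterm:]
    return head[::-1] + tail


yy = {
    'protein_id': 'yy',
    'full_name': 'yy_yy',
    'gene_name': 'y_y',
    'sequence': 'FGHIKLMNPQR',
}
decoy = reverse_protein(yy)
print(decoy['protein_id'], decoy['sequence'])  # REV_yy RQPNMLKIHGF
print(pseudo_reverse('ACDESTYK'))              # YTSEDCAK
```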

Add modifications#

add_modifications() will add fixed and variable modifications.

[9]:

fasta_lib.add_modifications()
fasta_lib.precursor_df.sample(5, random_state=0)

[9]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy
35	MACDESTYKxK	1	1	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	3;0	11	0
34	MACDESTYKxK	1	1	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1	11	0
42	xKFGHIKLMNPQR	1	2	False	False	Oxidation@M	9	13	0
27	QPNMLKIHGFK	3	1	False	False			11	1
11	YTSEDCAM	3	0	False	True	Carbamidomethyl@C	6	8	1
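
The mods and mod_sites columns are parallel, semicolon-separated lists; as the output above suggests, site 0 denotes the peptide N-terminus and residues are numbered from 1. The following sketch enumerates modification forms under that convention. It is not alphabase's algorithm: `enumerate_mod_forms` and `max_var_mods` are hypothetical names, and the modification set is hard-coded to the fix_mods and var_mods used in this notebook.

```python
from itertools import combinations


def enumerate_mod_forms(sequence, is_prot_nterm, max_var_mods=2):
    """Return (mods, mod_sites) strings for every modification form."""
    # Fixed modification: always applied to every C (1-based positions).
    fixed = [('Carbamidomethyl@C', i + 1)
             for i, aa in enumerate(sequence) if aa == 'C']

    # Variable modifications: each candidate site may or may not be modified.
    candidates = []
    if is_prot_nterm:
        candidates.append(('Acetyl@Protein N-term', 0))
    candidates += [('Oxidation@M', i + 1)
                   for i, aa in enumerate(sequence) if aa == 'M']

    forms = []
    for k in range(min(max_var_mods, len(candidates)) + 1):
        for combo in combinations(candidates, k):
            mods = fixed + list(combo)
            forms.append((';'.join(name for name, _ in mods),
                          ';'.join(str(site) for _, site in mods)))
    return forms


for mods, sites in enumerate_mod_forms('MACDESTYK', is_prot_nterm=True):
    print(repr(mods), repr(sites))
```

For MACDESTYK this yields four forms, including the one with mods `Carbamidomethyl@C;Acetyl@Protein N-term;Oxidation@M` and mod_sites `3;0;1`.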

Add special modifications#

Special modifications here refer to PTMs that we want finer control over:

We may only need the modified forms of a peptide, without its unmodified form
GlyGly@K cannot occur at the peptide C-term because trypsin cannot cleave after a Lys carrying GlyGly
For some special modifications such as Phospho@S and HexNAc@S, we would like to limit the number of peptidoforms to control memory usage.

[10]:

fasta_lib.special_mods = ['GlyGly@K']
fasta_lib.special_mods_cannot_modify_pep_c_term = True
fasta_lib.min_special_mod_num = 1 # exclude the unmodified forms
fasta_lib.max_special_mod_num = 1 # limit the number of special modifications per peptide
fasta_lib.add_special_modifications()
fasta_lib.precursor_df.sample(10, random_state=0)

[10]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy
28	QPNMLKIHGFKxK	3	2	False	False	Oxidation@M;GlyGly@K	4;6	13	1
37	IHGFKxKYTSEDCAM	3	2	False	True	Carbamidomethyl@C;Oxidation@M;GlyGly@K	13;15;7	15	1
11	RQPNMLKIHGF	2	2	True	True	GlyGly@K	7	11	1
34	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term;Oxidation@M;GlyGly@K	0;7;9	14	1
2	ACDESTYKxK	1	1	True	False	Carbamidomethyl@C;GlyGly@K	2;8	10	0
30	QPNMLKIHGFKxK	3	2	False	False	GlyGly@K	6	13	1
40	ACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;GlyGly@K	2;8	15	0
32	TSRQPNMLKIHGFK	3	2	True	False	Oxidation@M;GlyGly@K	7;9	14	1
26	xKFGHIKLMNPQR	1	2	False	False	GlyGly@K	2	13	0
4	xKYTSEDCAM	3	1	False	True	Carbamidomethyl@C;Oxidation@M;GlyGly@K	8;10;2	10	1
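
The effect of the three settings can be sketched as a site enumeration. The helper below is hypothetical (its name and parameters only echo the attributes set above) and shows one plausible reading of the rules: collect all target residues, optionally skip the peptide C-terminal residue, and emit every combination of between min_num and max_num sites.

```python
from itertools import combinations


def special_mod_site_sets(sequence, target_aa='K', min_num=1, max_num=1,
                          cannot_modify_pep_c_term=True):
    """Return all allowed tuples of 1-based sites for one special modification."""
    last_pos = len(sequence)
    sites = [
        i + 1 for i, aa in enumerate(sequence)
        if aa == target_aa
        and not (cannot_modify_pep_c_term and i + 1 == last_pos)
    ]
    site_sets = []
    for k in range(min_num, min(max_num, len(sites)) + 1):
        site_sets.extend(combinations(sites, k))
    return site_sets


print(special_mod_site_sets('ACDESTYKxK'))     # [(8,)]
print(special_mod_site_sets('xKFGHIKLMNPQR'))  # [(2,), (7,)]
print(special_mod_site_sets('LMNPQRST'))       # []
```

With min_num=1, a peptide without an eligible K yields no forms at all, which corresponds to excluding the unmodified forms; with min_num=0 the empty tuple (the unmodified form) would be kept.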

Add peptide labeling#

For example, Dimethyl labeling:

[11]:

fasta_lib.labeling_channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any N-term'],
}
fasta_lib.add_peptide_labeling()
fasta_lib.precursor_df.sample(10, random_state=0)

[11]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy	labeling_channel
26	xKFGHIKLMNPQR	1	2	False	False	GlyGly@K;Dimethyl@Any N-term;Dimethyl@K;Dimeth...	2;0;2;7	13	0	0
61	QPNMLKIHGFK	3	1	False	False	GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H...	6;0;6;11	11	1	4
2	ACDESTYKxK	1	1	True	False	Carbamidomethyl@C;GlyGly@K;Dimethyl@Any N-term...	2;8;0;8;10	10	0	0
62	RQPNMLKIHGF	2	2	True	True	Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any N-term...	5;7;0;7	11	1	4
85	TSRQPNMLKIHGFK	3	2	True	False	GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H...	9;0;9;14	14	1	4
48	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1;9;9;11;16	16	0	0
16	MACDESTYKxK	1	1	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1;9;9;11	11	0	0
99	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;GlyGly@K;Dimethyl:2H(4)@Any ...	3;11;0;9;11;16	16	0	4
56	xKYTSEDCAM	3	1	False	True	Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy...	8;10;2;0;2	10	1	4
45	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy...	3;1;11;0;9;11;16	16	0	0
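
Labeling appends the channel's modifications to the existing mods and mod_sites of every peptide. A minimal sketch of that bookkeeping follows; `label_peptide` is a hypothetical helper, and the ordering (N-terminal label first, then residue labels) is inferred from the outputs in this notebook rather than from alphabase's source.

```python
def label_peptide(sequence, mods, mod_sites, channel_mods):
    """Append labeling modifications to existing mods/mod_sites strings."""
    names = mods.split(';') if mods else []
    sites = mod_sites.split(';') if mod_sites else []

    # N-terminal labels first; the N-terminus is encoded as site 0.
    for mod in channel_mods:
        if mod.endswith('@Any N-term'):
            names.append(mod)
            sites.append('0')

    # Then residue-specific labels, one entry per matching residue (1-based).
    for mod in channel_mods:
        target = mod.split('@')[1]
        if len(target) == 1:  # a single amino acid such as K
            for pos, aa in enumerate(sequence, start=1):
                if aa == target:
                    names.append(mod)
                    sites.append(str(pos))
    return ';'.join(names), ';'.join(sites)


channels = {
    0: ['Dimethyl@K', 'Dimethyl@Any N-term'],
    4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any N-term'],
}
for channel, channel_mods in channels.items():
    print(channel, label_peptide('xKFGHIKLMNPQR', 'GlyGly@K', '2', channel_mods))
```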

Add charge states#

[12]:

fasta_lib.add_charge()
fasta_lib.precursor_df.sample(5, random_state=0)

[12]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy	labeling_channel	charge
65	FGHIKLMNPQR	0;1	1	True	True	Acetyl@Protein N-term;GlyGly@K;Dimethyl@K	0;5;5	11	0	0	4
225	FGHIKLMNPQRST	1	2	False	True	GlyGly@K;Dimethyl:2H(4)@Any N-term;Dimethyl:2H...	5;0;5	13	0	4	2
188	RQPNMLKIHGF	2	2	True	True	Oxidation@M;GlyGly@K;Dimethyl:2H(4)@Any N-term...	5;7;0;7	11	1	4	4
200	MACDESTYKxK	1	1	True	False	Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy...	3;1;9;0;9;11	11	0	4	4
108	IHGFKxKYTSEDCAM	3	2	False	True	Carbamidomethyl@C;Oxidation@M;GlyGly@K;Dimethy...	13;15;5;0;5;7	15	1	0	2

`import_and_process_protein_dict()` combines all steps#

Use import_and_process_fasta() instead when starting from fasta files.

[13]:

fasta_lib.special_mods = []
fasta_lib.labeling_channels = None
fasta_lib.import_and_process_protein_dict(protein_dict)
fasta_lib.protein_df

[13]:

	protein_id	full_name	gene_name	sequence	decoy
0	yy	yy_yy	y_y	FGHIKLMNPQR	0
1	xx	xx_xx	x_x	MACDESTYKxKFGHIKLMNPQRST	0
2	REV_yy	REV_yy_yy	REV_y_y	RQPNMLKIHGF	1
3	REV_xx	REV_xx_xx	REV_x_x	TSRQPNMLKIHGFKxKYTSEDCAM	1

[14]:

fasta_lib.precursor_df

[14]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy	charge
0	xKFGHIK	1	1	False	False			7	0	2
1	xKFGHIK	1	1	False	False			7	0	3
2	xKFGHIK	1	1	False	False			7	0	4
3	IHGFKxK	3	1	False	False			7	1	2
4	IHGFKxK	3	1	False	False			7	1	3
...	...	...	...	...	...	...	...	...	...	...
169	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1	16	0	3
170	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1	16	0	4
171	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	3;0	16	0	2
172	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	3;0	16	0	3
173	MACDESTYKxKFGHIK	1	2	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	3;0	16	0	4

174 rows × 10 columns

Start from peptides instead of proteins#

The modular design of SpecLibFasta allows us to start from arbitrary types of peptide input, so neither fasta files nor a protein_dict is required.

For example, suppose we have a list of sequences and want to add modifications using SpecLibFasta functionality:

[15]:

import pandas as pd
pep_lib = SpecLibFasta(
    charged_frag_types=['b_z1','y_z1'],
    fix_mods=['Carbamidomethyl@C'],
    var_mods=['Acetyl@Protein N-term','Oxidation@M'],
    labeling_channels={
        0: ['Dimethyl@K', 'Dimethyl@Any N-term'],
        4: ['Dimethyl:2H(4)@K', 'Dimethyl:2H(4)@Any N-term'],
    },
    decoy=None,
)

pep_lib.precursor_df = pd.DataFrame({
    'sequence': ['ABCDEFG','HIJKLMN','OPQRST','UVWXYZ']
})
pep_lib.process_from_naked_peptide_seqs()
pep_lib.precursor_df

[15]:

	sequence	nAA	is_prot_nterm	is_prot_cterm	mods	mod_sites	labeling_channel	charge
0	OPQRST	6	False	False	Dimethyl@Any N-term	0	0	2
1	OPQRST	6	False	False	Dimethyl@Any N-term	0	0	3
2	OPQRST	6	False	False	Dimethyl@Any N-term	0	0	4
3	UVWXYZ	6	False	False	Dimethyl@Any N-term	0	0	2
4	UVWXYZ	6	False	False	Dimethyl@Any N-term	0	0	3
5	UVWXYZ	6	False	False	Dimethyl@Any N-term	0	0	4
6	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl@Any N-term	3;0	0	2
7	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl@Any N-term	3;0	0	3
8	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl@Any N-term	3;0	0	4
9	HIJKLMN	7	False	False	Oxidation@M;Dimethyl@Any N-term;Dimethyl@K	6;0;4	0	2
10	HIJKLMN	7	False	False	Oxidation@M;Dimethyl@Any N-term;Dimethyl@K	6;0;4	0	3
11	HIJKLMN	7	False	False	Oxidation@M;Dimethyl@Any N-term;Dimethyl@K	6;0;4	0	4
12	HIJKLMN	7	False	False	Dimethyl@Any N-term;Dimethyl@K	0;4	0	2
13	HIJKLMN	7	False	False	Dimethyl@Any N-term;Dimethyl@K	0;4	0	3
14	HIJKLMN	7	False	False	Dimethyl@Any N-term;Dimethyl@K	0;4	0	4
15	OPQRST	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	2
16	OPQRST	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	3
17	OPQRST	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	4
18	UVWXYZ	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	2
19	UVWXYZ	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	3
20	UVWXYZ	6	False	False	Dimethyl:2H(4)@Any N-term	0	4	4
21	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term	3;0	4	2
22	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term	3;0	4	3
23	ABCDEFG	7	False	False	Carbamidomethyl@C;Dimethyl:2H(4)@Any N-term	3;0	4	4
24	HIJKLMN	7	False	False	Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl...	6;0;4	4	2
25	HIJKLMN	7	False	False	Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl...	6;0;4	4	3
26	HIJKLMN	7	False	False	Oxidation@M;Dimethyl:2H(4)@Any N-term;Dimethyl...	6;0;4	4	4
27	HIJKLMN	7	False	False	Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K	0;4	4	2
28	HIJKLMN	7	False	False	Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K	0;4	4	3
29	HIJKLMN	7	False	False	Dimethyl:2H(4)@Any N-term;Dimethyl:2H(4)@K	0;4	4	4

Calculate masses#

Calculate precursor m/z#

[16]:

fasta_lib.calc_precursor_mz()
# fasta_lib.calc_precursor_isotope()
fasta_lib.precursor_df

[16]:

	sequence	protein_idxes	miss_cleavage	is_prot_nterm	is_prot_cterm	mods	mod_sites	nAA	decoy	charge	precursor_mz
0	RQPNMLK	2	1	True	False	Oxidation@M	5	7	1	2	451.747462
1	RQPNMLK	2	1	True	False			7	1	2	443.750005
2	RQPNMLK	2	1	True	False	Acetyl@Protein N-term;Oxidation@M	0;5	7	1	2	472.752744
3	RQPNMLK	2	1	True	False	Acetyl@Protein N-term	0	7	1	2	464.755287
4	LMNPQRST	1	1	False	True	Oxidation@M	2	8	0	2	481.739834
5	LMNPQRST	1	1	False	True			8	0	2	473.742377
6	ACDESTYK	1	0	True	False	Carbamidomethyl@C	2	8	0	2	487.200207
7	ACDESTYK	1	0	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	2;0	8	0	2	508.205490
8	YTSEDCAM	3	0	False	True	Carbamidomethyl@C;Oxidation@M	6;8	8	1	2	496.670426
9	YTSEDCAM	3	0	False	True	Carbamidomethyl@C	6	8	1	2	488.672968
10	MACDESTYK	1	0	True	False	Carbamidomethyl@C;Oxidation@M	3;1	9	0	2	560.717907
11	MACDESTYK	1	0	True	False	Carbamidomethyl@C	3	9	0	2	552.720450
12	MACDESTYK	1	0	True	False	Carbamidomethyl@C;Acetyl@Protein N-term;Oxidat...	3;0;1	9	0	2	581.723190
13	MACDESTYK	1	0	True	False	Carbamidomethyl@C;Acetyl@Protein N-term	3;0	9	0	2	573.725732
14	TSRQPNMLK	3	1	True	False	Oxidation@M	7	9	1	2	545.787316
15	TSRQPNMLK	3	1	True	False			9	1	2	537.789858
16	TSRQPNMLK	3	1	True	False	Acetyl@Protein N-term;Oxidation@M	0;7	9	1	2	566.792598
17	TSRQPNMLK	3	1	True	False	Acetyl@Protein N-term	0	9	1	2	558.795141
18	QPNMLKIHGF	2	1	False	True	Oxidation@M	4	10	1	2	600.813333
19	QPNMLKIHGF	2	1	False	True	Oxidation@M	4	10	1	3	400.877981
20	QPNMLKIHGF	2	1	False	True			10	1	2	592.815876
21	QPNMLKIHGFK	3	1	False	False	Oxidation@M	4	11	1	2	664.860815
22	QPNMLKIHGFK	3	1	False	False	Oxidation@M	4	11	1	3	443.576302
23	QPNMLKIHGFK	3	1	False	False			11	1	2	656.863357
24	QPNMLKIHGFK	3	1	False	False			11	1	3	438.244664
25	RQPNMLKIHGF	2	2	True	True	Oxidation@M	5	11	1	2	678.863889
26	RQPNMLKIHGF	2	2	True	True	Oxidation@M	5	11	1	3	452.911685
27	RQPNMLKIHGF	2	2	True	True			11	1	2	670.866431
28	RQPNMLKIHGF	2	2	True	True			11	1	3	447.580046
29	RQPNMLKIHGF	2	2	True	True	Acetyl@Protein N-term;Oxidation@M	0;5	11	1	2	699.869171
30	RQPNMLKIHGF	2	2	True	True	Acetyl@Protein N-term;Oxidation@M	0;5	11	1	3	466.915206
31	RQPNMLKIHGF	2	2	True	True	Acetyl@Protein N-term	0	11	1	2	691.871714
32	RQPNMLKIHGF	2	2	True	True	Acetyl@Protein N-term	0	11	1	3	461.583568
33	FGHIKLMNPQR	0;1	1	True	True	Oxidation@M	7	11	0	2	678.863889
34	FGHIKLMNPQR	0;1	1	True	True	Oxidation@M	7	11	0	3	452.911685
35	FGHIKLMNPQR	0;1	1	True	True			11	0	2	670.866431
36	FGHIKLMNPQR	0;1	1	True	True			11	0	3	447.580046
37	FGHIKLMNPQR	0;1	1	True	True	Acetyl@Protein N-term;Oxidation@M	0;7	11	0	2	699.869171
38	FGHIKLMNPQR	0;1	1	True	True	Acetyl@Protein N-term;Oxidation@M	0;7	11	0	3	466.915206
39	FGHIKLMNPQR	0;1	1	True	True	Acetyl@Protein N-term	0	11	0	2	691.871714
40	FGHIKLMNPQR	0;1	1	True	True	Acetyl@Protein N-term	0	11	0	3	461.583568
41	FGHIKLMNPQRST	1	2	False	True	Oxidation@M	7	13	0	2	772.903742
42	FGHIKLMNPQRST	1	2	False	True	Oxidation@M	7	13	0	3	515.604920
43	FGHIKLMNPQRST	1	2	False	True			13	0	2	764.906285
44	FGHIKLMNPQRST	1	2	False	True			13	0	3	510.273282
45	TSRQPNMLKIHGFK	3	2	True	False	Oxidation@M	7	14	1	2	836.951224
46	TSRQPNMLKIHGFK	3	2	True	False	Oxidation@M	7	14	1	3	558.303241
47	TSRQPNMLKIHGFK	3	2	True	False	Oxidation@M	7	14	1	4	418.979250
48	TSRQPNMLKIHGFK	3	2	True	False			14	1	2	828.953766
49	TSRQPNMLKIHGFK	3	2	True	False			14	1	3	552.971603
50	TSRQPNMLKIHGFK	3	2	True	False			14	1	4	414.980521
51	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term;Oxidation@M	0;7	14	1	2	857.956506
52	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term;Oxidation@M	0;7	14	1	3	572.306763
53	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term;Oxidation@M	0;7	14	1	4	429.481891
54	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term	0	14	1	2	849.959049
55	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term	0	14	1	3	566.975125
56	TSRQPNMLKIHGFK	3	2	True	False	Acetyl@Protein N-term	0	14	1	4	425.483163

After calc_precursor_mz(), all sequences containing x are removed: the mass of x is set to a very large value, so the precursor m/z of these peptides falls outside the range from fasta_lib.min_precursor_mz to fasta_lib.max_precursor_mz.

[17]:

from alphabase.constants.aa import AA_ASCII_MASS
(
    fasta_lib.min_precursor_mz, fasta_lib.max_precursor_mz,
    f"mass of 'x' is {AA_ASCII_MASS[ord('x')]}"
)

[17]:

(400.0, 2000.0, "mass of 'x' is 100000000.0")

Calculate fragment m/z#

[18]:

fasta_lib.calc_fragment_mz_df()
fasta_lib.fragment_mz_df

[18]:

	b_z1	y_z1
0	157.108387	746.386537
1	285.166965	618.327959
2	382.219729	521.275195
3	496.262656	407.232268
4	643.298056	260.196868
...	...	...
556	1098.572440	601.345658
557	1211.656504	488.261594
558	1348.715416	351.202682
559	1405.736879	294.181218
560	1552.805293	147.112804

561 rows × 2 columns
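
Each row of fragment_mz_df corresponds to one cleavage position of one peptide, so a peptide with nAA residues contributes nAA - 1 rows. As a hedged illustration (hypothetical names, unmodified peptides only, standard monoisotopic masses), singly charged b and y ions can be computed like this:

```python
# Monoisotopic residue masses in Da.
MASS = {
    'G': 57.021464, 'A': 71.037114, 'S': 87.032028, 'P': 97.052764,
    'V': 99.068414, 'T': 101.047679, 'C': 103.009185, 'L': 113.084064,
    'I': 113.084064, 'N': 114.042927, 'D': 115.026943, 'Q': 128.058578,
    'K': 128.094963, 'E': 129.042593, 'M': 131.040485, 'H': 137.058912,
    'F': 147.068414, 'R': 156.101111, 'Y': 163.063329, 'W': 186.079313,
}
H2O = 18.010565
PROTON = 1.007276


def by_ions(sequence):
    """Return [(b_z1, y_z1), ...] with one pair per cleavage position."""
    masses = [MASS[aa] for aa in sequence]
    total = sum(masses)
    rows, prefix = [], 0.0
    for residue_mass in masses[:-1]:
        prefix += residue_mass
        b_mz = prefix + PROTON                # N-terminal fragment
        y_mz = total - prefix + H2O + PROTON  # complementary C-terminal fragment
        rows.append((b_mz, y_mz))
    return rows


for b_mz, y_mz in by_ions('LMNPQRST'):
    print(f'{b_mz:.6f}\t{y_mz:.6f}')
```

Row i pairs the b ion of the first i residues with the y ion of the remaining residues, which is why the b column increases while the y column decreases within a peptide.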

Use frag_start_idx and frag_stop_idx in precursor_df to locate the fragments that belong to a given precursor:

[19]:

ith_pep = 5
frag_start, frag_stop = fasta_lib.precursor_df.loc[ith_pep,['frag_start_idx','frag_stop_idx']].values
fasta_lib.fragment_mz_df.iloc[frag_start:frag_stop]

[19]:

	b_z1	y_z1
31	114.091340	833.393413
32	245.131826	702.352928
33	359.174753	588.310000
34	456.227517	491.257237
35	584.286094	363.198659
36	740.387205	207.097548
37	827.419234	120.065520
