alphabase.peptide.precursor

Functions:

`calc_precursor_isotope`(precursor_df[, ...])	Calculate isotope intensity values for precursor_df inplace.
`calc_precursor_isotope_info`(precursor_df[, ...])	Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.
`calc_precursor_isotope_info_mp`(precursor_df)	Isotope pattern calculation is slow for large dataframes, so this function uses multiprocessing to speed it up.
`calc_precursor_isotope_intensity`(precursor_df)	Calculate isotope intensity values for precursor_df inplace.
`calc_precursor_isotope_intensity_mp`(precursor_df)	Calculate isotope intensity values for precursor_df using multiprocessing.
`calc_precursor_isotope_mp`(precursor_df[, ...])	Calculate isotope intensity values for precursor_df using multiprocessing.
`calc_precursor_mz`(precursor_df[, batch_size])	Calculate precursor_mz inplace in the precursor_df
`get_mod_seq_charge_hash`(sequence, mods, ...)	Get hash code value for a precursor: (sequence, mods, mod_sites, charge)
`get_mod_seq_formula`(seq, mods)	'PEPTIDE','Acetyl@Any N-term' --> [('C',n), ('H',m), ...]
`get_mod_seq_hash`(sequence, mods, mod_sites, *)	Get hash code value for a peptide: (sequence, mods, mod_sites)
`get_mod_seq_isotope_distribution`(seq_mods, ...)	Get isotope abundance distribution by IsotopeDistribution.
`get_right_most_isotope_offset`(intensities, ...)	Get right-most isotope index
`hash_mod_seq_charge_df`(precursor_df, *[, seed])	Internal function
`hash_mod_seq_df`(precursor_df, *[, seed])	Internal function
`hash_precursor_df`(precursor_df, *[, seed])	Add columns 'mod_seq_hash' and 'mod_seq_charge_hash' into precursor_df (inplace).
`is_precursor_refined`(precursor_df)
`is_precursor_sorted`(precursor_df)
`refine_precursor_df`(df[, drop_frag_idx, ...])	Refine df inplace for faster precursor/fragment calculation.
`reset_precursor_df`(df[, drop_frag_idx, ...])	Refine df inplace for faster precursor/fragment calculation.
`update_precursor_mz`(precursor_df[, batch_size])	Calculate precursor_mz inplace in the precursor_df

alphabase.peptide.precursor.calc_precursor_isotope(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') → DataFrame

Calculate isotope intensity values for precursor_df inplace.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame
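
The `normalize` argument is not described above. As a rough illustration, assuming 'sum' scales the isotope intensities so that they add up to 1 and 'mono' scales them so that the monoisotopic peak equals 1 (an interpretation of the option names, not confirmed by this page), the two modes could look like the sketch below. `normalize_isotopes` is a hypothetical helper, not part of AlphaBase.

```python
from typing import List, Literal


def normalize_isotopes(
    intensities: List[float],
    normalize: Literal["mono", "sum"] = "sum",
) -> List[float]:
    """Hypothetical helper: scale an isotope pattern by its sum or by its mono peak."""
    if normalize == "sum":
        denom = sum(intensities)
    elif normalize == "mono":
        denom = intensities[0]  # index 0 is assumed to be the monoisotopic peak
    else:
        raise ValueError(f"unknown normalize mode: {normalize!r}")
    return [value / denom for value in intensities]


pattern = [50.0, 30.0, 15.0, 5.0]
by_sum = normalize_isotopes(pattern, "sum")    # values add up to 1
by_mono = normalize_isotopes(pattern, "mono")  # first value becomes 1
```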

alphabase.peptide.precursor.calc_precursor_isotope_info(precursor_df: DataFrame, min_right_most_intensity: float = 0.2)

Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.

Parameters:

precursor_df (pd.DataFrame) – precursor_df to calculate
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

Returns:

precursor_df with additional columns:

isotope_m1_intensity: relative intensity of M1 to mono peak
isotope_m1_mz: mz of M1
isotope_apex_intensity: relative intensity of the apex peak
isotope_apex_mz: mz of the apex peak
isotope_apex_offset: position offset of the apex peak to mono peak
isotope_right_most_intensity: relative intensity of the right-most peak
isotope_right_most_mz: mz of the right-most peak
isotope_right_most_offset: position offset of the right-most peak

Return type:

pd.DataFrame
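
To make the relation between offsets, relative intensities and mz values concrete, here is a minimal sketch that derives the M1 and apex values from a single isotope pattern. It assumes index 0 is the monoisotopic peak and that neighbouring isotope peaks are spaced by the 13C - 12C mass difference divided by the charge; AlphaBase may use a slightly different spacing constant. `isotope_info` and `ISOTOPE_SPACING` are hypothetical names used for illustration only.

```python
from typing import Dict, List

# Assumed spacing between neighbouring isotope peaks in Da (13C - 12C mass
# difference). AlphaBase may use a slightly different constant.
ISOTOPE_SPACING = 1.0033548


def isotope_info(
    intensities: List[float], mono_mz: float, charge: int
) -> Dict[str, float]:
    """Hypothetical helper: M1 and apex values relative to the mono peak (index 0)."""
    mono = intensities[0]
    # Offset of the most intense peak, counted from the mono peak.
    apex_offset = max(range(len(intensities)), key=intensities.__getitem__)
    return {
        "isotope_m1_intensity": intensities[1] / mono,
        "isotope_m1_mz": mono_mz + ISOTOPE_SPACING / charge,
        "isotope_apex_intensity": intensities[apex_offset] / mono,
        "isotope_apex_mz": mono_mz + apex_offset * ISOTOPE_SPACING / charge,
        "isotope_apex_offset": apex_offset,
    }


info = isotope_info([0.30, 0.36, 0.22, 0.09, 0.03], mono_mz=900.0, charge=2)
```

In this made-up pattern the apex is the M1 peak, so the apex and M1 entries coincide.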

alphabase.peptide.precursor.calc_precursor_isotope_info_mp(precursor_df: DataFrame, processes: int = 8, mp_batch_size: int = 10000, progress_bar=None, min_right_most_intensity: float = 0.2, min_precursor_num_to_run_mp: int = 10000) → DataFrame

Isotope pattern calculation is slow for large dataframes, so this function uses multiprocessing to speed it up. With 8 processes, 21M precursors take about 3.8 min.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df to calculate
processes (int) – Process number. Optional, by default 8
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 10000.
progress_bar (Callable) – The tqdm-based callback function to check multiprocessing. Defaults to None.
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

Returns:

DataFrame with isotope_* columns, see calc_precursor_isotope_info().

Return type:

pd.DataFrame
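
The batching idea behind `mp_batch_size` can be sketched without any AlphaBase code. The example below only plans the row ranges. It assumes, from the parameter name alone, that inputs with fewer than `min_precursor_num_to_run_mp` rows are processed in a single batch without multiprocessing. `batch_ranges` and `plan_batches` are hypothetical helpers.

```python
from typing import Iterator, List, Tuple


def batch_ranges(n_rows: int, mp_batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges that cover n_rows in chunks of mp_batch_size."""
    for start in range(0, n_rows, mp_batch_size):
        yield start, min(start + mp_batch_size, n_rows)


def plan_batches(
    n_rows: int, mp_batch_size: int = 10000, min_precursor_num_to_run_mp: int = 10000
) -> List[Tuple[int, int]]:
    """Hypothetical planner: one batch for small inputs, otherwise many batches."""
    if n_rows < min_precursor_num_to_run_mp:
        # Assumed behaviour: small inputs are not worth the process start-up cost.
        return [(0, n_rows)]
    return list(batch_ranges(n_rows, mp_batch_size))


small_plan = plan_batches(2_500)   # a single range covering all rows
large_plan = plan_batches(25_000)  # several ranges of at most 10000 rows
```

Each range could then be handed to a worker, for example through `multiprocessing.Pool.imap`, and the per-batch results concatenated in order.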

alphabase.peptide.precursor.calc_precursor_isotope_intensity(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') → DataFrame

Calculate isotope intensity values for precursor_df inplace.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_isotope_intensity_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) → DataFrame

Calculate isotope intensity values for precursor_df using multiprocessing.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.
mp_process_num (int) – Process number. Optional, by default 8
progress_bar (bool) – Whether to show progress bar. Optional, by default True

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_isotope_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) → DataFrame

Calculate isotope intensity values for precursor_df using multiprocessing.

Parameters:

precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.
mp_process_num (int) – Process number. Optional, by default 8
progress_bar (bool) – Whether to show progress bar. Optional, by default True

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_mz(precursor_df: DataFrame, batch_size=500000) → DataFrame

Calculate precursor_mz inplace in the precursor_df

Parameters:

precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column

Returns:

precursor_df with ‘precursor_mz’

Return type:

pd.DataFrame
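
For orientation, the m/z of a protonated precursor follows from its neutral monoisotopic mass M and charge z as (M + z * m_proton) / z. The sketch below applies that general formula to a single value; it is not the AlphaBase implementation, which works on whole dataframes in batches. `precursor_mz` and `MASS_PROTON` are names chosen for this example, and the neutral mass is made up.

```python
# Mass of a proton in Da (CODATA value, rounded).
MASS_PROTON = 1.007276467


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of a protonated precursor: (M + z * m_proton) / z."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return neutral_mass / charge + MASS_PROTON


# Hypothetical neutral monoisotopic mass, used only for illustration.
mz_2plus = precursor_mz(1000.0, 2)
```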

alphabase.peptide.precursor.get_mod_seq_charge_hash(sequence: str, mods: str, mod_sites: str, charge: int, *, seed=0)

Get hash code value for a precursor: (sequence, mods, mod_sites, charge)

Parameters:

sequence (str) – Amino acid sequence
mods (str) – Modification names in AlphaBase format
mod_sites (str) – Modification sites in AlphaBase format
charge (int) – Precursor charge state
seed (int) – Seed for hashing. Optional, by default 0

Returns:

64-bit hash code value

Return type:

np.uint64
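
The idea of a seeded 64-bit hash over (sequence, mods, mod_sites, charge) can be sketched with the standard library alone. The example below uses `hashlib.blake2b` as a stand-in for xxhash, and the way the fields are joined is an assumption, so the resulting values will not match those produced by AlphaBase. `sketch_precursor_hash` is a hypothetical name, and the argument values are illustrative.

```python
import hashlib


def sketch_precursor_hash(
    sequence: str, mods: str, mod_sites: str, charge: int, *, seed: int = 0
) -> int:
    """Hypothetical 64-bit hash of (sequence, mods, mod_sites, charge)."""
    # Join the fields with a separator that is assumed not to occur inside them.
    key = "\x1f".join([sequence, mods, mod_sites, str(charge)]).encode("utf-8")
    # blake2b with an 8-byte digest stands in for xxhash.xxh64; the seed is
    # passed as the salt so that different seeds give different hash values.
    digest = hashlib.blake2b(
        key, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


h2 = sketch_precursor_hash("PEPTIDE", "Acetyl@Any N-term", "0", 2)
h3 = sketch_precursor_hash("PEPTIDE", "Acetyl@Any N-term", "0", 3)
```

Because the charge is part of the hashed key, the same modified sequence at two charge states gets two different hash values, which is the distinction between a peptide hash and a precursor hash.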

alphabase.peptide.precursor.get_mod_seq_formula(seq: str, mods: str) → list

'PEPTIDE','Acetyl@Any N-term' --> [('C',n), ('H',m), ...]

alphabase.peptide.precursor.get_mod_seq_hash(sequence: str, mods: str, mod_sites: str, *, seed: int = 0) → uint64

Get hash code value for a peptide: (sequence, mods, mod_sites)

Parameters:

sequence (str) – Amino acid sequence
mods (str) – Modification names in AlphaBase format
mod_sites (str) – Modification sites in AlphaBase format
seed (int) – Seed for hashing. Optional, by default 0

Returns:

64-bit hash code value

Return type:

np.uint64

alphabase.peptide.precursor.get_mod_seq_isotope_distribution(seq_mods: tuple, isotope_dist: IsotopeDistribution, min_right_most_intensity: float = 0.2) → tuple

Get isotope abundance distribution by IsotopeDistribution. This function is designed for multiprocessing.

Parameters:

seq_mods (tuple) – (sequence, mods)
isotope_dist (IsotopeDistribution) – See IsotopeDistribution in alphabase.constants.isotope
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

Returns:

float - Abundance of mono+1 / mono
float - Abundance of apex / mono
int - Apex isotope position relative to mono, i.e. apex index - mono index; 0 refers to the position of mono itself
float - Abundance of the right-most peak that has at least min_right_most_intensity intensity relative to the apex peak
int - Right-most position relative to mono, i.e. right-most index - mono index

Return type:

tuple

alphabase.peptide.precursor.get_right_most_isotope_offset(intensities: ndarray, apex_idx: int, min_right_most_intensity: float) → int

Get right-most isotope index

Parameters:

intensities (np.ndarray) – Isotope intensities
apex_idx (int) – The index or position of apex peak
min_right_most_intensity (float) – Minimal intensity to consider for right-most peak relative to apex

Returns:

Index or position of the right-most peak

Return type:

int
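
As an illustration of the rule described above, the following sketch scans an isotope pattern to the right of the apex and keeps the last index whose intensity is still at least `min_right_most_intensity` times the apex intensity. It uses a plain list instead of an `np.ndarray` and is an interpretation of the parameter descriptions, not the library code. `right_most_offset` is a hypothetical name.

```python
from typing import Sequence


def right_most_offset(
    intensities: Sequence[float], apex_idx: int, min_right_most_intensity: float
) -> int:
    """Hypothetical scan: last index at or after the apex whose intensity is
    at least min_right_most_intensity times the apex intensity."""
    threshold = intensities[apex_idx] * min_right_most_intensity
    right_most = apex_idx  # fall back to the apex if no later peak qualifies
    for idx in range(apex_idx + 1, len(intensities)):
        if intensities[idx] >= threshold:
            right_most = idx
    return right_most


pattern = [0.30, 0.36, 0.22, 0.09, 0.03]
offset = right_most_offset(pattern, apex_idx=1, min_right_most_intensity=0.2)
# offset is 3: the threshold is 0.072, so the last peak (0.03) is excluded
```

With a much lower threshold such as 0.001, every peak in this pattern qualifies and the last index is returned.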

alphabase.peptide.precursor.hash_mod_seq_charge_df(precursor_df: DataFrame, *, seed=0)

Internal function

alphabase.peptide.precursor.hash_mod_seq_df(precursor_df: DataFrame, *, seed=0)

Internal function

alphabase.peptide.precursor.hash_precursor_df(precursor_df: DataFrame, *, seed: int = 0) → DataFrame

Add columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’ into precursor_df (inplace). The 64-bit hash function is from xxhash (xxhash.xxh64).

Parameters:

precursor_df (pd.DataFrame) – precursor_df
seed (int) – Seed for xxhash.xxh64. Optional, by default 0

Returns:

DataFrame with columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’

Return type:

pd.DataFrame

alphabase.peptide.precursor.is_precursor_refined(precursor_df: DataFrame)

alphabase.peptide.precursor.is_precursor_sorted(precursor_df: DataFrame)

alphabase.peptide.precursor.refine_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) → DataFrame

Refine df inplace for faster precursor/fragment calculation.

alphabase.peptide.precursor.reset_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) → DataFrame

Refine df inplace for faster precursor/fragment calculation.

alphabase.peptide.precursor.update_precursor_mz(precursor_df: DataFrame, batch_size=500000) → DataFrame

Calculate precursor_mz inplace in the precursor_df

Parameters:

precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column

Returns:

precursor_df with ‘precursor_mz’

Return type:

pd.DataFrame