alphabase.peptide.precursor

Functions:

calc_precursor_isotope(precursor_df[, ...])

Calculate isotope intensity values for precursor_df inplace.

calc_precursor_isotope_info(precursor_df[, ...])

Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.

calc_precursor_isotope_info_mp(precursor_df)

Multiprocessing version of calc_precursor_isotope_info() for faster isotope pattern calculation on large dataframes.

calc_precursor_isotope_intensity(precursor_df)

Calculate isotope intensity values for precursor_df inplace.

calc_precursor_isotope_intensity_mp(precursor_df)

Calculate isotope intensity values for precursor_df using multiprocessing.

calc_precursor_isotope_mp(precursor_df[, ...])

Calculate isotope intensity values for precursor_df using multiprocessing.

calc_precursor_mz(precursor_df[, batch_size])

Calculate precursor_mz inplace in the precursor_df

get_mod_seq_charge_hash(sequence, mods, ...)

Get hash code value for a precursor: (sequence, mods, mod_sites, charge)

get_mod_seq_formula(seq, mods)

'PEPTIDE','Acetyl@Any N-term' --> [('C',n), ('H',m), ...]

get_mod_seq_hash(sequence, mods, mod_sites, *)

Get hash code value for a peptide: (sequence, mods, mod_sites)

get_mod_seq_isotope_distribution(seq_mods, ...)

Get isotope abundance distribution by IsotopeDistribution.

get_right_most_isotope_offset(intensities, ...)

Get right-most isotope index

hash_mod_seq_charge_df(precursor_df, *[, seed])

Internal function

hash_mod_seq_df(precursor_df, *[, seed])

Internal function

hash_precursor_df(precursor_df, *[, seed])

Add columns 'mod_seq_hash' and 'mod_seq_charge_hash' into precursor_df (inplace).

is_precursor_refined(precursor_df)

is_precursor_sorted(precursor_df)

refine_precursor_df(df[, drop_frag_idx, ...])

Refine df inplace for faster precursor/fragment calculation.

reset_precursor_df(df[, drop_frag_idx, ...])

Refine df inplace for faster precursor/fragment calculation.

update_precursor_mz(precursor_df[, batch_size])

Calculate precursor_mz inplace in the precursor_df

alphabase.peptide.precursor.calc_precursor_isotope(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') -> DataFrame

Calculate isotope intensity values for precursor_df inplace.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity

  • max_isotope (int) – Max isotope number to calculate. Optional, by default 6

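  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

  • normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' normalizes to the monoisotopic peak, 'sum' to the sum of all peaks. Optional, by default 'sum'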

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_isotope_info(precursor_df: DataFrame, min_right_most_intensity: float = 0.2)

Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.

Parameters:
  • precursor_df (pd.DataFrame) – precursor_df to calculate

  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

Returns:

precursor_df with additional columns:

  • isotope_m1_intensity: relative intensity of M1 to mono peak

  • isotope_m1_mz: mz of M1

  • isotope_apex_intensity: relative intensity of the apex peak

  • isotope_apex_mz: mz of the apex peak

  • isotope_apex_offset: position offset of the apex peak to mono peak

  • isotope_right_most_intensity: relative intensity of the right-most peak

  • isotope_right_most_mz: mz of the right-most peak

  • isotope_right_most_offset: position offset of the right-most peak

Return type:

pd.DataFrame
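The relation between the *_offset and *_mz columns above can be illustrated with a minimal sketch. It assumes that neighbouring isotope peaks are spaced by roughly the 13C-12C mass difference divided by the charge; the function name and the constant are illustrative and not taken from AlphaBase, whose internal constant may be rounded differently.

```python
# Assumed mass difference between 13C and 12C in Da (illustrative constant,
# not read from AlphaBase).
ISOTOPE_SPACING = 1.0033548378


def isotope_mz_sketch(mono_mz: float, charge: int, offset: int) -> float:
    """Return the m/z of the peak `offset` positions right of the mono peak."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # Each additional neutron shifts the mass by ~1.00335 Da, which shows up
    # as 1.00335 / charge on the m/z axis.
    return mono_mz + offset * ISOTOPE_SPACING / charge


m1_mz = isotope_mz_sketch(500.0, charge=2, offset=1)
mono_mz = isotope_mz_sketch(500.0, charge=2, offset=0)
```

With an offset of 0 the function returns the mono m/z unchanged, which matches the convention that offsets are counted from the mono peak.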

alphabase.peptide.precursor.calc_precursor_isotope_info_mp(precursor_df: DataFrame, processes: int = 8, mp_batch_size: int = 10000, progress_bar=None, min_right_most_intensity: float = 0.2, min_precursor_num_to_run_mp: int = 10000) -> DataFrame

calc_precursor_isotope_info() is slow for large dataframes, so this function uses multiprocessing for faster isotope pattern calculation. The speed is acceptable with multiprocessing (3.8 min for 21M precursors, 8 processes).

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df to calculate

  • processes (int) – Process number. Optional, by default 8

  • mp_batch_size (int) – Multiprocessing batch size. Optional, by default 10000

  • progress_bar (Callable) – A tqdm-based callback function for monitoring multiprocessing progress. Optional, by default None

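  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

  • min_precursor_num_to_run_mp (int) – The minimal number of precursors required to run with multiprocessing. Optional, by default 10000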

Returns:

DataFrame with isotope_* columns, see calc_precursor_isotope_info().

Return type:

pd.DataFrame
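The roles of mp_batch_size and min_precursor_num_to_run_mp can be sketched as follows. This is not the AlphaBase implementation: process_batch is a hypothetical stand-in for the per-batch isotope calculation, and the single-process fallback below the threshold is an assumption inferred from the parameter name.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_batch(batch: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the per-batch isotope calculation."""
    return [x * 2.0 for x in batch]


def run_batched(
    items: Sequence[float],
    batch_size: int = 10000,
    min_items_for_mp: int = 10000,
    processes: int = 8,
    map_fn: Callable = map,
) -> List[float]:
    """Process items in batches, skipping the batching overhead for small inputs."""
    # Assumed behaviour: small inputs are handled in a single call.
    if processes <= 1 or len(items) < min_items_for_mp:
        return process_batch(items)
    results: List[float] = []
    # map_fn defaults to the built-in map; an order-preserving parallel map
    # such as multiprocessing.Pool(processes).imap could be passed instead.
    for part in map_fn(process_batch, iter_batches(items, batch_size)):
        results.extend(part)
    return results


values = [float(i) for i in range(25)]
batch_sizes = [len(b) for b in iter_batches(values, 10)]
small = run_batched(values, batch_size=10, min_items_for_mp=100)
large = run_batched(values, batch_size=10, min_items_for_mp=5)
```

Keeping the mapping function injectable lets the same batching logic run serially in tests and in parallel in production.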

alphabase.peptide.precursor.calc_precursor_isotope_intensity(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') -> DataFrame

Calculate isotope intensity values for precursor_df inplace.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity

  • max_isotope (int) – Max isotope number to calculate. Optional, by default 6

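  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

  • normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' normalizes to the monoisotopic peak, 'sum' to the sum of all peaks. Optional, by default 'sum'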

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame
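The two values accepted by normalize can be illustrated with a small sketch. It assumes that 'mono' divides every intensity by the monoisotopic peak and 'sum' divides by the total intensity, and that the mono peak sits at index 0; the function name is hypothetical and the library's exact behaviour should be checked against its source.

```python
from typing import List, Literal


def normalize_isotopes_sketch(
    intensities: List[float],
    normalize: Literal["mono", "sum"] = "sum",
) -> List[float]:
    """Scale isotope intensities by the mono peak or by their sum."""
    if normalize == "mono":
        denominator = intensities[0]  # assumes the mono peak is at index 0
    elif normalize == "sum":
        denominator = sum(intensities)
    else:
        raise ValueError("normalize must be 'mono' or 'sum'")
    return [value / denominator for value in intensities]


raw = [2.0, 1.0, 1.0]
by_sum = normalize_isotopes_sketch(raw, "sum")
by_mono = normalize_isotopes_sketch(raw, "mono")
```

Under these assumptions, 'sum' gives values that add up to 1, while 'mono' fixes the first value at 1.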

alphabase.peptide.precursor.calc_precursor_isotope_intensity_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) -> DataFrame

Calculate isotope intensity values for precursor_df using multiprocessing.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity

  • max_isotope (int) – Max isotope number to calculate. Optional, by default 6

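  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

  • normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' normalizes to the monoisotopic peak, 'sum' to the sum of all peaks. Optional, by default 'sum'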

  • mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.

  • mp_process_num (int) – Process number. Optional, by default 8

  • progress_bar (bool) – Whether to show progress bar. Optional, by default True

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_isotope_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) -> DataFrame

Calculate isotope intensity values for precursor_df using multiprocessing.

Parameters:
  • precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity

  • max_isotope (int) – Max isotope number to calculate. Optional, by default 6

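  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.001

  • normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' normalizes to the monoisotopic peak, 'sum' to the sum of all peaks. Optional, by default 'sum'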

  • mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.

  • mp_process_num (int) – Process number. Optional, by default 8

  • progress_bar (bool) – Whether to show progress bar. Optional, by default True

Returns:

precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}

Return type:

pd.DataFrame

alphabase.peptide.precursor.calc_precursor_mz(precursor_df: DataFrame, batch_size=500000) -> DataFrame

Calculate precursor_mz inplace in the precursor_df

Parameters:

precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column

Returns:

precursor_df with ‘precursor_mz’

Return type:

pd.DataFrame
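How the 'charge' column enters precursor_mz can be shown with the standard relation for positively charged ions, m/z = M / z + proton mass. The sketch below is a plain-Python illustration with a hypothetical function name; it takes the neutral mass as given and does not reproduce how AlphaBase derives that mass from the sequence and modifications.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da


def precursor_mz_sketch(neutral_mass: float, charge: int) -> float:
    """Return the m/z of a peptide ion carrying `charge` protons."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # (M + z * m_proton) / z simplifies to M / z + m_proton.
    return neutral_mass / charge + PROTON_MASS


mz_2plus = precursor_mz_sketch(1000.0, 2)
mz_3plus = precursor_mz_sketch(1000.0, 3)
```

The same neutral mass therefore appears at a lower m/z for higher charge states.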

alphabase.peptide.precursor.get_mod_seq_charge_hash(sequence: str, mods: str, mod_sites: str, charge: int, *, seed=0)

Get hash code value for a precursor: (sequence, mods, mod_sites, charge)

Parameters:
  • sequence (str) – Amino acid sequence

  • mods (str) – Modification names in AlphaBase format

  • mod_sites (str) – Modification sites in AlphaBase format

  • charge (int) – Precursor charge state

  • seed (int) – Seed for hashing. Optional, by default 0

Returns:

64-bit hash code value

Return type:

np.uint64
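The idea of a seeded 64-bit hash over (sequence, mods, mod_sites, charge) can be sketched with the standard library alone. Note the substitution: this example uses hashlib.blake2b with an 8-byte digest rather than xxhash, so its values will not match those produced by AlphaBase. The function name, the '|' separator, and the example mod_sites value "0" are illustrative assumptions.

```python
import hashlib


def mod_seq_charge_hash_sketch(
    sequence: str, mods: str, mod_sites: str, charge: int, *, seed: int = 0
) -> int:
    """Return a seeded 64-bit integer hash of a precursor description."""
    # Join the fields with a separator so that ("AB", "C") and ("A", "BC")
    # do not collapse into the same key.
    key = "|".join([sequence, mods, mod_sites, str(charge)]).encode("utf-8")
    # BLAKE2b accepts a salt of up to 16 bytes; the seed is used as the salt.
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


h_charge2 = mod_seq_charge_hash_sketch("PEPTIDE", "Acetyl@Any N-term", "0", 2)
h_charge3 = mod_seq_charge_hash_sketch("PEPTIDE", "Acetyl@Any N-term", "0", 3)
h_seeded = mod_seq_charge_hash_sketch("PEPTIDE", "Acetyl@Any N-term", "0", 2, seed=1)
```

The hash is deterministic for a given seed, so it can serve as a compact key for matching precursors across tables.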

alphabase.peptide.precursor.get_mod_seq_formula(seq: str, mods: str) -> list

'PEPTIDE', 'Acetyl@Any N-term' -> [('C', n), ('H', m), ...]
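The mapping above can be worked through by hand for this one example. The sketch below sums residue compositions, one water for the termini, and the modification's composition. The lookup tables are hard-coded for the residues in 'PEPTIDE' and for Acetyl only, the ';' separator for multiple modifications is an assumption, and the element order of the real function's output is not specified here.

```python
from collections import Counter
from typing import List, Tuple

# Elemental compositions of the amino acid residues used in the example.
RESIDUE_FORMULAS = {
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
}
# Composition added by the modification (illustrative table with one entry).
MOD_FORMULAS = {"Acetyl@Any N-term": {"C": 2, "H": 2, "O": 1}}
WATER = {"H": 2, "O": 1}  # H on the N-terminus plus OH on the C-terminus


def mod_seq_formula_sketch(seq: str, mods: str) -> List[Tuple[str, int]]:
    """Return the element counts of a modified peptide, sorted by element."""
    total = Counter(WATER)
    for residue in seq:
        total.update(RESIDUE_FORMULAS[residue])
    # Assumed format: modification names separated by ';', empty if none.
    for mod in filter(None, mods.split(";")):
        total.update(MOD_FORMULAS[mod])
    return sorted(total.items())


formula = mod_seq_formula_sketch("PEPTIDE", "Acetyl@Any N-term")
# formula == [('C', 36), ('H', 55), ('N', 7), ('O', 16)]
```

Without the modification the same function gives C34H53N7O15 for 'PEPTIDE'.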

alphabase.peptide.precursor.get_mod_seq_hash(sequence: str, mods: str, mod_sites: str, *, seed: int = 0) -> uint64

Get hash code value for a peptide: (sequence, mods, mod_sites)

Parameters:
  • sequence (str) – Amino acid sequence

  • mods (str) – Modification names in AlphaBase format

  • mod_sites (str) – Modification sites in AlphaBase format

  • seed (int) – Seed for hashing. Optional, by default 0

Returns:

64-bit hash code value

Return type:

np.uint64

alphabase.peptide.precursor.get_mod_seq_isotope_distribution(seq_mods: tuple, isotope_dist: IsotopeDistribution, min_right_most_intensity: float = 0.2) -> tuple

Get isotope abundance distribution by IsotopeDistribution. This function is designed for multiprocessing.

Parameters:
  • seq_mods (tuple) – (sequence, mods)

  • isotope_dist (IsotopeDistribution) – See IsotopeDistribution in alphabase.constants.isotope

  • min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2

Returns:

A tuple of five values:

  • float: abundance of mono+1 / mono

  • float: abundance of apex / mono

  • int: apex isotope position relative to mono, i.e. apex index - mono index, where 0 refers to the position of mono itself

  • float: abundance of the right-most peak that has at least min_right_most_intensity intensity relative to the apex peak

  • int: right-most position relative to mono, i.e. right-most index - mono index

Return type:

tuple
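The returned values can be derived from a plain list of isotope abundances, as in the following sketch. It assumes the mono peak index is known (0 by default here) and that the right-most abundance is also reported relative to mono; the function name is hypothetical and the real function obtains the distribution from IsotopeDistribution rather than taking it as an argument.

```python
from typing import Sequence, Tuple


def isotope_summary_sketch(
    dist: Sequence[float],
    mono_idx: int = 0,
    min_right_most_intensity: float = 0.2,
) -> Tuple[float, float, int, float, int]:
    """Summarize an isotope distribution relative to its mono peak."""
    mono = dist[mono_idx]
    # Apex: index of the most abundant peak.
    apex_idx = max(range(len(dist)), key=dist.__getitem__)
    # Right-most: last peak that still reaches the threshold relative to apex.
    threshold = dist[apex_idx] * min_right_most_intensity
    right_idx = max(i for i, value in enumerate(dist) if value >= threshold)
    return (
        dist[mono_idx + 1] / mono,  # M1 relative to mono
        dist[apex_idx] / mono,      # apex relative to mono
        apex_idx - mono_idx,        # apex offset
        dist[right_idx] / mono,     # right-most relative to mono (assumed)
        right_idx - mono_idx,       # right-most offset
    )


summary = isotope_summary_sketch([0.5, 1.0, 0.6, 0.15, 0.05])
```

For this example distribution the apex is the second peak (offset 1) and the right-most peak above 20% of the apex is the third (offset 2).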

alphabase.peptide.precursor.get_right_most_isotope_offset(intensities: ndarray, apex_idx: int, min_right_most_intensity: float) -> int

Get the index of the right-most isotope peak whose intensity is at least min_right_most_intensity relative to the apex peak.

Parameters:
  • intensities (np.ndarray) – Isotope intensities

  • apex_idx (int) – The index or position of apex peak

  • min_right_most_intensity (float) – Minimal intensity to consider for right-most peak relative to apex

Returns:

Index or position of the right-most peak

Return type:

int
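One way to implement this rule is to scan the intensities from the right and stop at the first peak that reaches the threshold. The sketch below follows that approach with plain lists; the function name and the fallback to apex_idx when no peak qualifies are assumptions, not confirmed details of the library.

```python
from typing import Sequence


def right_most_offset_sketch(
    intensities: Sequence[float],
    apex_idx: int,
    min_right_most_intensity: float,
) -> int:
    """Return the index of the last peak reaching the relative threshold."""
    threshold = intensities[apex_idx] * min_right_most_intensity
    # Walk from the right end towards the left; the first hit is right-most.
    for idx in range(len(intensities) - 1, -1, -1):
        if intensities[idx] >= threshold:
            return idx
    # Assumed fallback, reachable only if the threshold exceeds the apex.
    return apex_idx


peaks = [1.0, 0.8, 0.3, 0.1, 0.02]
loose = right_most_offset_sketch(peaks, apex_idx=0, min_right_most_intensity=0.001)
strict = right_most_offset_sketch(peaks, apex_idx=0, min_right_most_intensity=0.2)
```

A smaller threshold keeps more of the isotope envelope, while a larger one trims low-abundance peaks on the right.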

alphabase.peptide.precursor.hash_mod_seq_charge_df(precursor_df: DataFrame, *, seed=0)

Internal function

alphabase.peptide.precursor.hash_mod_seq_df(precursor_df: DataFrame, *, seed=0)

Internal function

alphabase.peptide.precursor.hash_precursor_df(precursor_df: DataFrame, *, seed: int = 0) -> DataFrame

Add columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’ into precursor_df (inplace). The 64-bit hash function is from xxhash (xxhash.xxh64).

Parameters:
  • precursor_df (pd.DataFrame) – precursor_df

  • seed (int) – Seed for xxhash.xxh64. Optional, by default 0

Returns:

DataFrame with columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’

Return type:

pd.DataFrame

alphabase.peptide.precursor.is_precursor_refined(precursor_df: DataFrame)

alphabase.peptide.precursor.is_precursor_sorted(precursor_df: DataFrame)

alphabase.peptide.precursor.refine_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) -> DataFrame

Refine df inplace for faster precursor/fragment calculation.

alphabase.peptide.precursor.reset_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) -> DataFrame

Refine df inplace for faster precursor/fragment calculation.

alphabase.peptide.precursor.update_precursor_mz(precursor_df: DataFrame, batch_size=500000) -> DataFrame

Calculate precursor_mz inplace in the precursor_df

Parameters:

precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column

Returns:

precursor_df with ‘precursor_mz’

Return type:

pd.DataFrame