alphabase.peptide.precursor
Functions:
- calc_precursor_isotope – Calculate isotope intensity values for precursor_df inplace.
- calc_precursor_isotope_info – Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.
- calc_precursor_isotope_info_mp – Calculate isotope info for precursor_df using multiprocessing, since calc_precursor_isotope_info is not fast for large dataframes.
- calc_precursor_isotope_intensity – Calculate isotope intensity values for precursor_df inplace.
- calc_precursor_isotope_intensity_mp – Calculate isotope intensity values for precursor_df using multiprocessing.
- calc_precursor_isotope_mp – Calculate isotope intensity values for precursor_df using multiprocessing.
- calc_precursor_mz – Calculate precursor_mz inplace in the precursor_df.
- get_mod_seq_charge_hash – Get hash code value for a precursor: (sequence, mods, mod_sites, charge).
- get_mod_seq_formula – Get the chemical formula of a modified sequence, e.g. 'PEPTIDE', 'Acetyl@Any N-term' --> [('C', n), ('H', m), ...].
- get_mod_seq_hash – Get hash code value for a peptide: (sequence, mods, mod_sites).
- get_mod_seq_isotope_distribution – Get isotope abundance distribution by IsotopeDistribution.
- get_right_most_isotope_offset – Get the index of the right-most isotope peak.
- hash_mod_seq_charge_df – Internal function.
- hash_mod_seq_df – Internal function.
- hash_precursor_df – Add columns 'mod_seq_hash' and 'mod_seq_charge_hash' to precursor_df (inplace).
- refine_precursor_df – Refine df inplace for faster precursor/fragment calculation.
- reset_precursor_df – Refine df inplace for faster precursor/fragment calculation.
- update_precursor_mz – Calculate precursor_mz inplace in the precursor_df.
- alphabase.peptide.precursor.calc_precursor_isotope(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') → DataFrame
Calculate isotope intensity values for precursor_df inplace.
- Parameters:
precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
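min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to the apex peak. Optional, by default 0.001
normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' scales them relative to the mono peak, 'sum' relative to the sum of all calculated isotope peaks. Optional, by default 'sum'
- Returns:
precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}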
- Return type:
pd.DataFrame
- alphabase.peptide.precursor.calc_precursor_isotope_info(precursor_df: DataFrame, min_right_most_intensity: float = 0.2)
Calculate isotope mz values and relative (to M0) intensity values for precursor_df inplace.
- Parameters:
precursor_df (pd.DataFrame) – precursor_df to calculate
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2
- Returns:
precursor_df with additional columns:
isotope_m1_intensity: relative intensity of M1 to mono peak
isotope_m1_mz: mz of M1
isotope_apex_intensity: relative intensity of the apex peak
isotope_apex_mz: mz of the apex peak
isotope_apex_offset: position offset of the apex peak to mono peak
isotope_right_most_intensity: relative intensity of the right-most peak
isotope_right_most_mz: mz of the right-most peak
isotope_right_most_offset: position offset of the right-most peak
- Return type:
pd.DataFrame
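The mz columns above are tied to precursor_mz by the isotope spacing: on the m/z axis, neighbouring isotope peaks of a z-charged ion sit roughly Δ/z apart. The sketch below assumes Δ is the 13C-12C mass difference (1.0033548378 Da), which may differ from the constant alphabase uses, and `isotope_info_mz` is a hypothetical helper, not part of the API.

```python
from typing import Dict

# Assumption: isotope peaks are spaced by the 13C-12C mass difference (Da).
ISOTOPE_SPACING = 1.0033548378


def isotope_mz(precursor_mz: float, charge: int, offset: int) -> float:
    """m/z of the isotope peak `offset` positions to the right of the mono peak."""
    return precursor_mz + offset * ISOTOPE_SPACING / charge


def isotope_info_mz(
    precursor_mz: float, charge: int, apex_offset: int, right_most_offset: int
) -> Dict[str, float]:
    """Hypothetical helper mirroring the isotope_*_mz columns; M1 is offset 1."""
    return {
        "isotope_m1_mz": isotope_mz(precursor_mz, charge, 1),
        "isotope_apex_mz": isotope_mz(precursor_mz, charge, apex_offset),
        "isotope_right_most_mz": isotope_mz(precursor_mz, charge, right_most_offset),
    }


info = isotope_info_mz(500.0, 2, apex_offset=1, right_most_offset=3)
```

For a 2+ precursor at m/z 500.0, each isotope step adds about 0.5017 on the m/z axis.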
- alphabase.peptide.precursor.calc_precursor_isotope_info_mp(precursor_df: DataFrame, processes: int = 8, mp_batch_size: int = 10000, progress_bar=None, min_right_most_intensity: float = 0.2, min_precursor_num_to_run_mp: int = 10000) → DataFrame
calc_precursor_isotope_info is not fast for large dataframes, so this function uses multiprocessing to speed up the isotope pattern calculation. With 8 processes, 21M precursors take about 3.8 min.
- Parameters:
precursor_df (pd.DataFrame) – Precursor_df to calculate
processes (int) – Number of processes. Optional, by default 8
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 10000.
progress_bar (Callable) – The tqdm-based callback function used to report multiprocessing progress. Defaults to None.
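min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to the apex peak. Optional, by default 0.2
min_precursor_num_to_run_mp (int) – Minimal number of precursors required to use multiprocessing; smaller dataframes are processed in a single process. Optional, by default 10000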
- Returns:
DataFrame with isotope_* columns, see calc_precursor_isotope_info().
- Return type:
pd.DataFrame
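The batching strategy described above can be sketched with the standard library. This is a minimal sketch, not alphabase's implementation: `make_batches`, `run_in_batches` and `square_all` are hypothetical names, the worker operates on plain lists instead of DataFrame slices, and progress reporting is left out.

```python
from multiprocessing import Pool
from typing import Callable, List, Sequence


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive slices of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(
    items: Sequence,
    func: Callable[[Sequence], List],
    processes: int = 8,
    mp_batch_size: int = 10000,
    min_items_to_run_mp: int = 10000,
) -> List:
    """Apply func to items batch-wise, in parallel only when the input is large."""
    # Small inputs: starting worker processes costs more than it saves.
    if len(items) < min_items_to_run_mp or processes <= 1:
        return list(func(items))
    results: List = []
    with Pool(processes) as pool:
        # imap yields batch results in input order, so rows stay aligned.
        for batch_result in pool.imap(func, make_batches(items, mp_batch_size)):
            results.extend(batch_result)
    return results


def square_all(batch: Sequence[int]) -> List[int]:
    """Toy per-batch worker standing in for the isotope calculation."""
    return [x * x for x in batch]
```

On platforms that start workers with `spawn` (Windows, macOS), call `run_in_batches` from inside an `if __name__ == "__main__":` guard and define the worker at module level so it can be pickled.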
- alphabase.peptide.precursor.calc_precursor_isotope_intensity(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum') → DataFrame
Calculate isotope intensity values for precursor_df inplace.
- Parameters:
precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
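min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to the apex peak. Optional, by default 0.001
normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' scales them relative to the mono peak, 'sum' relative to the sum of all calculated isotope peaks. Optional, by default 'sum'
- Returns:
precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}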
- Return type:
pd.DataFrame
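A minimal sketch of how the i_k columns relate to the normalize option. It assumes the input pattern starts at the mono peak and that normalization is applied after truncating to max_isotope peaks; `isotope_intensity_columns` is a hypothetical helper, and the min_right_most_intensity filtering is not modeled.

```python
from typing import Dict, List, Literal


def isotope_intensity_columns(
    intensities: List[float],
    max_isotope: int = 6,
    normalize: Literal["mono", "sum"] = "sum",
) -> Dict[str, float]:
    """Map an isotope pattern (mono peak first) to i_0 ... i_{max_isotope-1}."""
    # Keep the mono peak and the next max_isotope-1 peaks.
    kept = list(intensities[:max_isotope])
    kept += [0.0] * (max_isotope - len(kept))  # pad short patterns with zeros
    if normalize == "mono":
        denom = kept[0]  # i_0 becomes 1.0
    elif normalize == "sum":
        denom = sum(kept)  # all i_k add up to 1.0
    else:
        raise ValueError(f"unknown normalize mode: {normalize!r}")
    return {f"i_{k}": value / denom for k, value in enumerate(kept)}
```

With 'mono' the intensities are directly comparable to the mono peak; with 'sum' they read as fractions of the total calculated isotope signal.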
- alphabase.peptide.precursor.calc_precursor_isotope_intensity_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) → DataFrame
Calculate isotope intensity values for precursor_df using multiprocessing.
- Parameters:
precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
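min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to the apex peak. Optional, by default 0.001
normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' scales them relative to the mono peak, 'sum' relative to the sum of all calculated isotope peaks. Optional, by default 'sum'
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.
mp_process_num (int) – Number of processes. Optional, by default 8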
progress_bar (bool) – Whether to show progress bar. Optional, by default True
- Returns:
precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}
- Return type:
pd.DataFrame
- alphabase.peptide.precursor.calc_precursor_isotope_mp(precursor_df, max_isotope=6, min_right_most_intensity=0.001, normalize: Literal['mono', 'sum'] = 'sum', mp_batch_size=1000, mp_process_num=8, progress_bar=True) → DataFrame
Calculate isotope intensity values for precursor_df using multiprocessing.
- Parameters:
precursor_df (pd.DataFrame) – Precursor_df to calculate isotope intensity
max_isotope (int) – Max isotope number to calculate. Optional, by default 6
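min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to the apex peak. Optional, by default 0.001
normalize (Literal['mono', 'sum']) – How to normalize the isotope intensities: 'mono' scales them relative to the mono peak, 'sum' relative to the sum of all calculated isotope peaks. Optional, by default 'sum'
mp_batch_size (int) – Multiprocessing batch size. Optional, by default 1000.
mp_process_num (int) – Number of processes. Optional, by default 8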
progress_bar (bool) – Whether to show progress bar. Optional, by default True
- Returns:
precursor_df with additional columns i_0, i_1, i_2, … i_{max_isotope-1}
- Return type:
pd.DataFrame
- alphabase.peptide.precursor.calc_precursor_mz(precursor_df: DataFrame, batch_size=500000) → DataFrame
Calculate precursor_mz inplace in the precursor_df.
- Parameters:
precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column
batch_size (int) – Number of precursors processed per batch. Optional, by default 500000
- Returns:
precursor_df with ‘precursor_mz’
- Return type:
pd.DataFrame
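The value written to precursor_mz follows the standard relation for a protonated ion, m/z = (M + z·m_proton)/z, where M is the neutral monoisotopic mass. The sketch below computes it for a single precursor from an element-count formula; `neutral_mass` and `precursor_mz` are hypothetical helpers, not the library's vectorized, batched code.

```python
from typing import Dict

# Monoisotopic element masses and the proton mass, in Da.
ELEMENT_MASS: Dict[str, float] = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON_MASS = 1.00727646688


def neutral_mass(formula: Dict[str, int]) -> float:
    """Monoisotopic neutral mass from element counts."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def precursor_mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion: add z protons, then divide by z."""
    return (mass + charge * PROTON_MASS) / charge


# Unmodified PEPTIDE, C34H53N7O15
mass = neutral_mass({"C": 34, "H": 53, "N": 7, "O": 15})
mz_2plus = precursor_mz(mass, 2)
```

For unmodified PEPTIDE this gives M ≈ 799.3600 Da and a 2+ precursor m/z ≈ 400.6873.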
- alphabase.peptide.precursor.get_mod_seq_charge_hash(sequence: str, mods: str, mod_sites: str, charge: int, *, seed=0)
Get hash code value for a precursor: (sequence, mods, mod_sites, charge)
- Parameters:
sequence (str) – Amino acid sequence
mods (str) – Modification names in AlphaBase format
mod_sites (str) – Modification sites in AlphaBase format
charge (int) – Precursor charge state
seed (int) – Seed for hashing. Optional, by default 0
- Returns:
64-bit hash code value
- Return type:
np.uint64
- alphabase.peptide.precursor.get_mod_seq_formula(seq: str, mods: str) → list
Get the chemical formula of a modified sequence as a list of (element, count) tuples, e.g. 'PEPTIDE', 'Acetyl@Any N-term' --> [('C', n), ('H', m), ...]
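A minimal sketch of the idea behind a sequence-to-formula function: sum per-residue element counts, add one water for the peptide termini, then add each modification's composition. The lookup tables cover only the residues in the example and are hypothetical, as is the assumption that mods is a semicolon-separated list; the real function relies on alphabase's own amino acid and modification tables.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lookup tables: residue (not free amino acid) compositions.
RESIDUE_FORMULA: Dict[str, Dict[str, int]] = {
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
}
MOD_FORMULA: Dict[str, Dict[str, int]] = {
    "Acetyl@Any N-term": {"C": 2, "H": 2, "O": 1},
}
WATER = {"H": 2, "O": 1}


def mod_seq_formula(seq: str, mods: str) -> List[Tuple[str, int]]:
    """Return [(element, count), ...] sorted by element symbol."""
    counts: Counter = Counter(WATER)  # peptide termini add one H2O
    for aa in seq:
        counts.update(RESIDUE_FORMULA[aa])  # Counter.update adds counts
    for mod in filter(None, mods.split(";")):  # assumed ';'-separated mod names
        counts.update(MOD_FORMULA[mod])
    return sorted(counts.items())
```

For example, mod_seq_formula("PEPTIDE", "Acetyl@Any N-term") yields C36 H55 N7 O16, i.e. PEPTIDE (C34H53N7O15) plus the acetyl group (C2H2O).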
- alphabase.peptide.precursor.get_mod_seq_hash(sequence: str, mods: str, mod_sites: str, *, seed: int = 0) → uint64
Get hash code value for a peptide: (sequence, mods, mod_sites)
- Parameters:
sequence (str) – Amino acid sequence
mods (str) – Modification names in AlphaBase format
mod_sites (str) – Modification sites in AlphaBase format
seed (int) – Seed for hashing. Optional, by default 0
- Returns:
64-bit hash code value
- Return type:
np.uint64
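A sketch of how seeded 64-bit hashes for a peptide and a precursor can be built. alphabase uses xxhash.xxh64; to stay dependency-free this example swaps in hashlib.blake2b truncated to 8 bytes and keyed by the seed, so the values will not match alphabase's. The function names and the tab-joined key layout are illustrative assumptions.

```python
import hashlib


def _hash64(text: str, seed: int) -> int:
    """Stand-in for xxhash.xxh64: 8-byte blake2b keyed by the seed."""
    digest = hashlib.blake2b(
        text.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def mod_seq_hash(sequence: str, mods: str, mod_sites: str, *, seed: int = 0) -> int:
    # A separator keeps ("AB", "C") distinct from ("A", "BC").
    return _hash64("\t".join((sequence, mods, mod_sites)), seed)


def mod_seq_charge_hash(
    sequence: str, mods: str, mod_sites: str, charge: int, *, seed: int = 0
) -> int:
    # Same key plus the charge, so each charge state gets its own hash.
    return _hash64("\t".join((sequence, mods, mod_sites, str(charge))), seed)
```

The result fits in an unsigned 64-bit integer, so it can be stored in a np.uint64 column; changing the seed changes every hash.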
- alphabase.peptide.precursor.get_mod_seq_isotope_distribution(seq_mods: tuple, isotope_dist: IsotopeDistribution, min_right_most_intensity: float = 0.2) → tuple
Get isotope abundance distribution by IsotopeDistribution. This function is designed for multiprocessing.
- Parameters:
seq_mods (tuple) – (sequence, mods)
isotope_dist (IsotopeDistribution) – See IsotopeDistribution in alphabase.constants.isotope
min_right_most_intensity (float) – The minimal intensity value of the right-most peak relative to apex peak. Optional, by default 0.2
- Returns:
A tuple of:
float - Abundance of mono+1 / mono
float - Abundance of apex / mono
int - Apex isotope position relative to mono, i.e. apex index - mono index; 0 refers to the position of mono itself
float - Abundance of the right-most peak that has at least min_right_most_intensity intensity relative to the apex peak
int - Right-most position relative to mono, i.e. right-most index - mono index
- Return type:
tuple
- alphabase.peptide.precursor.get_right_most_isotope_offset(intensities: ndarray, apex_idx: int, min_right_most_intensity: float) → int
Get the index of the right-most isotope peak.
- Parameters:
intensities (np.ndarray) – Isotope intensities
apex_idx (int) – The index or position of apex peak
min_right_most_intensity (float) – Minimal intensity to consider for right-most peak relative to apex
- Returns:
Index or position of the right-most peak
- Return type:
int
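A minimal sketch of the right-most peak search and of how the five values returned by get_mod_seq_isotope_distribution can be derived from an intensity array. It assumes the search walks inward from the right end, that the fourth value is expressed relative to the mono peak like the first two, and that intensities is a plain list; both function names are hypothetical.

```python
from typing import List, Tuple


def right_most_isotope_index(
    intensities: List[float], apex_idx: int, min_right_most_intensity: float
) -> int:
    """Index of the right-most peak reaching min_right_most_intensity * apex."""
    threshold = intensities[apex_idx] * min_right_most_intensity
    # Walk from the right end back towards the apex; the first peak that
    # reaches the threshold is the right-most qualifying one.
    for idx in range(len(intensities) - 1, apex_idx, -1):
        if intensities[idx] >= threshold:
            return idx
    return apex_idx  # the apex qualifies when min_right_most_intensity <= 1


def isotope_summary(
    intensities: List[float],
    mono_idx: int = 0,
    min_right_most_intensity: float = 0.2,
) -> Tuple[float, float, int, float, int]:
    """Assemble the five values described for get_mod_seq_isotope_distribution."""
    mono = intensities[mono_idx]
    apex_idx = max(range(len(intensities)), key=intensities.__getitem__)
    right_idx = right_most_isotope_index(intensities, apex_idx, min_right_most_intensity)
    return (
        intensities[mono_idx + 1] / mono,  # M1 / mono
        intensities[apex_idx] / mono,      # apex / mono
        apex_idx - mono_idx,               # apex offset
        intensities[right_idx] / mono,     # right-most / mono (assumed)
        right_idx - mono_idx,              # right-most offset
    )
```

For the pattern [0.5, 1.0, 0.6, 0.25, 0.1] with a 0.2 threshold, the apex is at offset 1 and the right-most qualifying peak at offset 3, since 0.1 falls below 0.2 × 1.0.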
- alphabase.peptide.precursor.hash_mod_seq_charge_df(precursor_df: DataFrame, *, seed=0)
Internal function.
- alphabase.peptide.precursor.hash_mod_seq_df(precursor_df: DataFrame, *, seed=0)
Internal function.
- alphabase.peptide.precursor.hash_precursor_df(precursor_df: DataFrame, *, seed: int = 0) → DataFrame
Add columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’ to precursor_df (inplace). The 64-bit hash function is from xxhash (xxhash.xxh64).
- Parameters:
precursor_df (pd.DataFrame) – precursor_df
seed (int) – Seed for xxhash.xxh64. Optional, by default 0
- Returns:
DataFrame with columns ‘mod_seq_hash’ and ‘mod_seq_charge_hash’
- Return type:
pd.DataFrame
- alphabase.peptide.precursor.refine_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) → DataFrame
Refine df inplace for faster precursor/fragment calculation.
- alphabase.peptide.precursor.reset_precursor_df(df: DataFrame, drop_frag_idx=False, ensure_data_validity=False) → DataFrame
Refine df inplace for faster precursor/fragment calculation.
- alphabase.peptide.precursor.update_precursor_mz(precursor_df: DataFrame, batch_size=500000) → DataFrame
Calculate precursor_mz inplace in the precursor_df.
- Parameters:
precursor_df (pd.DataFrame) – precursor_df with the ‘charge’ column
batch_size (int) – Number of precursors processed per batch. Optional, by default 500000
- Returns:
precursor_df with ‘precursor_mz’
- Return type:
pd.DataFrame