alphabase.scoring.ml_scoring

Classes:

Percolator()

SupervisedPercolator()

DIA-NN-like scoring.

class alphabase.scoring.ml_scoring.Percolator

Bases: object

Methods:

__init__()

extract_features(psm_df, *args, **kwargs)

Extract features for rescoring.

rescore(df)

Estimate ML scores and then FDRs (q-values)

run_rerank_workflow(top_k_psm_df[, ...])

Run percolator workflow with reranking the peptides for each spectrum.

run_rescore_workflow(psm_df, *args, **kwargs)

Run percolator workflow: extract_features(), then rescore().

Attributes:

feature_extractor

The feature extractor inherited from BaseFeatureExtractor

feature_list

Get extracted feature_list.

ml_model

ML model in Percolator.

__init__()

extract_features(psm_df: DataFrame, *args, **kwargs) → DataFrame

Extract features for rescoring.

*args and **kwargs are passed on to self.feature_extractor.extract_features().

Parameters:

psm_df (pd.DataFrame) – PSM DataFrame

Returns:

psm_df with feature columns appended inplace.

Return type:

pd.DataFrame

property feature_extractor: BaseFeatureExtractor

The feature extractor inherited from BaseFeatureExtractor

property feature_list: list

The list of extracted features. Read-only property.

property ml_model

ML model in Percolator. It can be an sklearn model or any other model that implements fit() and decision_function() (or predict_proba()) with the same behavior as the sklearn methods.
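As a rough illustration of the interface described above, the sketch below defines a small model with fit() and decision_function() using only NumPy. The class name CentroidScorer and its scoring rule are hypothetical and not part of alphabase. Whether such an object can be assigned to ml_model directly is an assumption that should be checked against the source.

```python
import numpy as np


class CentroidScorer:
    """Hypothetical model exposing the sklearn-style fit()/decision_function() pair."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y).astype(bool)
        target_centroid = X[y].mean(axis=0)
        decoy_centroid = X[~y].mean(axis=0)
        # Direction pointing from the decoy centroid (y == 0) to the target centroid (y == 1).
        self.direction_ = target_centroid - decoy_centroid
        # The midpoint between the two centroids defines the decision boundary.
        midpoint = (target_centroid + decoy_centroid) / 2.0
        self.offset_ = float(midpoint @ self.direction_)
        return self

    def decision_function(self, X):
        # Larger values mean "more target-like"; 0 lies on the boundary.
        return np.asarray(X, dtype=float) @ self.direction_ - self.offset_


# Toy feature matrix (invented values): two decoy-like rows, two target-like rows.
X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]])
y = np.array([0, 0, 1, 1])

model = CentroidScorer().fit(X, y)
scores = model.decision_function(X)
```

Here the decoy-like rows receive negative scores and the target-like rows positive ones, which is the ordering a downstream FDR estimate relies on.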

rescore(df: DataFrame) → DataFrame

Estimate ML scores and then FDRs (q-values)

Parameters:

df (pd.DataFrame) – psm_df

Returns:

psm_df with ml_score and fdr columns updated inplace

Return type:

pd.DataFrame
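To make "FDRs (q-values)" concrete, here is a minimal sketch of a target-decoy q-value estimate on toy data. It is not alphabase's implementation: the decoy column name, the decoys/targets ratio, and the helper name estimate_q_values are assumptions for illustration. Only the ml_score and fdr column names come from the description above.

```python
import pandas as pd


def estimate_q_values(df: pd.DataFrame, score_col: str = "ml_score") -> pd.DataFrame:
    """Illustrative target-decoy q-value estimate; not alphabase's implementation."""
    # Sort so that the best-scoring PSMs come first.
    df = df.sort_values(score_col, ascending=False).reset_index(drop=True)
    n_decoy = df["decoy"].cumsum()
    n_target = (1 - df["decoy"]).cumsum()
    # Raw FDR at each score threshold: decoys accepted / targets accepted.
    raw_fdr = n_decoy / n_target.clip(lower=1)
    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum taken from the worst score upwards.
    df["fdr"] = raw_fdr[::-1].cummin()[::-1]
    return df


# Toy PSM table (invented values); decoy == 1 marks a decoy hit.
psms = pd.DataFrame({
    "ml_score": [0.9, 0.8, 0.7, 0.6, 0.5],
    "decoy": [0, 0, 1, 0, 1],
})
scored = estimate_q_values(psms)
```

The running minimum makes the fdr column monotonic in the score, so filtering with a single threshold such as fdr <= 0.01 keeps a contiguous block of top-scoring PSMs.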

run_rerank_workflow(top_k_psm_df: DataFrame, rerank_column: str = 'spec_idx', *args, **kwargs) → DataFrame

Run percolator workflow with reranking the peptides for each spectrum.

  • self.extract_features()

  • self.rescore()

*args and **kwargs are passed on to self.feature_extractor.extract_features().

Parameters:
  • top_k_psm_df (pd.DataFrame) – PSM DataFrame

  • rerank_column (str) –

    The column used to rerank PSMs.

    For example, use the following code to select the top-ranked peptide for each spectrum:

    ```
    rerank_column = 'spec_idx' # scan_num
    idx = top_k_psm_df.groupby(['raw_name',rerank_column])['ml_score'].idxmax()
    psm_df = top_k_psm_df.loc[idx].copy()
    ```

Returns:

Only the top-scored PSM is returned for each group defined by rerank_column.

Return type:

pd.DataFrame
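The selection step described for rerank_column can be tried on a hand-made table. This is only a sketch: the sequence column and all values are invented for illustration, and only raw_name, spec_idx and ml_score are taken from the description above.

```python
import pandas as pd

# Toy top-k table: two candidate peptides for each of two spectra (values invented).
top_k_psm_df = pd.DataFrame({
    "raw_name": ["run1", "run1", "run1", "run1"],
    "spec_idx": [0, 0, 1, 1],
    "sequence": ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"],
    "ml_score": [0.2, 0.7, 0.9, 0.4],
})

rerank_column = "spec_idx"
# Row label of the best-scoring candidate within each (raw file, spectrum) group.
idx = top_k_psm_df.groupby(["raw_name", rerank_column])["ml_score"].idxmax()
# Keep one row per group; copy() avoids modifying a view of the original table.
psm_df = top_k_psm_df.loc[idx].copy()
```

Grouping by raw_name as well as the rerank column keeps spectra from different raw files apart when they share the same spec_idx.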

run_rescore_workflow(psm_df: DataFrame, *args, **kwargs) → DataFrame

Run percolator workflow:

  • self.extract_features()

  • self.rescore()

*args and **kwargs are passed on to self.feature_extractor.extract_features().

Parameters:

psm_df (pd.DataFrame) – PSM DataFrame

Returns:

psm_df with feature columns appended inplace and, after the rescore() step, ml_score and fdr columns updated.

Return type:

pd.DataFrame

class alphabase.scoring.ml_scoring.SupervisedPercolator

Bases: Percolator

DIA-NN-like scoring.

Methods:

__init__()

rescore(psm_df)

Estimate ML scores and then FDRs (q-values)

__init__()

rescore(psm_df: DataFrame) → DataFrame

Estimate ML scores and then FDRs (q-values)

Parameters:

psm_df (pd.DataFrame) – PSM DataFrame

Returns:

psm_df with ml_score and fdr columns updated inplace

Return type:

pd.DataFrame