Hi there,
I am trying to do custom hyperparameter optimization and model evaluation. I tried using one of the provided code samples (AUC-PR) to see how it works:
```python
from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
          'needs probas' is true
    - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
    - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
      NB: this option requires a variable set as "Sample weights"
    """
    # Data to plot the precision-recall curve
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    # Use the AUC function to calculate the area under the precision-recall curve
    auc_precision_recall = auc(recall, precision)
    return auc_precision_recall
```
This is leading to errors for all models: "Failed to train: <class 'ValueError'>: Custom scoring function failed: too many indices for array".
I would like to get this working, and I would also like to adapt the function below for my existing problem. How can I adjust it?
```python
from sklearn.metrics import precision_recall_fscore_support

def f_beta_score(y_true, y_pred, sample_weight=None, beta=1.0):
    """
    Custom scoring function using the F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities or class predictions
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is the beta parameter for the F-beta score
    """
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
Hi!
For the AUC-PR code sample, have you enabled the `Needs Probability` option in the custom metric options? It should look something like the screenshot I've attached, and it is necessary for this specific example.
For the F-Beta metric you've described, the following should do the trick:
```python
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """
    Custom scoring function using the F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is fixed inside the function, since the expected signature
      does not accept extra parameters
    """
    beta = 1.0
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
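To sanity-check the function outside the platform, you can call it directly on a small synthetic validation set (the array values below are made up for illustration; the second column of `y_pred` is the positive-class probability):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """F-beta custom metric; expects y_pred as (nb_records, nb_classes) probabilities."""
    beta = 1.0
    # Threshold the positive-class probability at 0.5 to get class predictions
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta

# Synthetic validation data: 4 records, 2 classes
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.3, 0.7],
])
print(score(y_true, y_pred))  # → 0.5 (precision 0.5, recall 0.5)
```

This is just a local sketch to confirm the signature and shapes behave as expected before plugging the function into the custom metric field.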
What I've done here is modify your function so that its signature matches what we expect.
Again, this will require the `Needs Probability` option set to true.
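As a quick illustration of why that option matters (a minimal sketch with made-up arrays): when `Needs Probability` is off, `y_pred` arrives as a 1-D array of class indexes, so `y_pred[:, 1]` raises the "too many indices for array" error you saw; with it on, `y_pred` is 2-D with one column per class and the indexing works:

```python
import numpy as np

# With `Needs Probability` off, y_pred is 1-D (class indexes)
y_pred_classes = np.array([0, 1, 1, 0])
try:
    y_pred_classes[:, 1]
except IndexError as e:
    print(e)  # numpy reports that a 1-dimensional array was indexed with 2 indices

# With it on, y_pred is 2-D (one column per class), so [:, 1] works
y_pred_probas = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
print(y_pred_probas[:, 1])  # positive-class probabilities
```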
Let me know if this helps!
Tom