Hi there,
I am trying to do custom hyperparameter optimization and model evaluation. I tried using one of the provided code samples (AUC-PR) to see how it works:
```python
from sklearn.metrics import auc, precision_recall_curve

def score(y_valid, y_pred):
    """
    Custom scoring function.
    Must return a float quantifying the estimator prediction quality.
    - y_valid is a pandas Series
    - y_pred is a numpy ndarray with shape:
        - (nb_records,) for regression problems and classification problems
          where 'needs probas' (see below) is false
          (for classification, the values are the numeric class indexes)
        - (nb_records, nb_classes) for classification problems where
          'needs probas' is true
    - [optional] X_valid is a dataframe with shape (nb_records, nb_input_features)
    - [optional] sample_weight is a numpy ndarray with shape (nb_records,)
      NB: this option requires a variable set as "Sample weights"
    """
    # Data to plot the precision-recall curve
    precision, recall, thresholds = precision_recall_curve(y_valid, y_pred[:, 1])
    # Use the AUC function to calculate the area under the precision-recall curve
    auc_precision_recall = auc(recall, precision)
    return auc_precision_recall
```
This is leading to errors for all models: "Failed to train: <class 'ValueError'>: Custom scoring function failed: too many indices for array".
I would like to get this working, and I would also like to adapt the function below for my existing problem. How can I adjust it?
```python
from sklearn.metrics import precision_recall_fscore_support

def f_beta_score(y_true, y_pred, sample_weight=None, beta=1.0):
    """
    Custom scoring function using the F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities or class predictions
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is the beta parameter for the F-beta score
    """
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
Hi!
For the AUC-PR code sample, have you enabled the `Needs Probability` option in the custom metric options? It should look something like the screenshot I've attached, and it is necessary for this specific example.
For the F-Beta metric you've described, the following should do the trick:
```python
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """
    Custom scoring function using the F-beta score.
    Must return a float quantifying the estimator prediction quality.
    - y_true is a numpy ndarray or pandas Series with true labels
    - y_pred is a numpy ndarray with predicted probabilities
    - sample_weight is a numpy ndarray with shape (nb_records,) representing sample weights
    - beta is fixed inside the function, since the expected signature
      does not accept extra parameters
    """
    beta = 1.0
    # Convert probabilities to class predictions (binary classification)
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    # Calculate precision, recall, and F-beta score
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta
```
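To sanity-check the function outside the platform, you can call it directly on a small synthetic validation set (the array values below are made up for illustration; the second column of `y_pred` is the positive-class probability):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def score(y_true, y_pred, sample_weight=None):
    """F-beta custom metric; expects y_pred as (nb_records, nb_classes) probabilities."""
    beta = 1.0
    # Threshold the positive-class probability at 0.5 to get class predictions
    y_pred_class = (y_pred[:, 1] > 0.5).astype(int)
    precision, recall, f_beta, _ = precision_recall_fscore_support(
        y_true, y_pred_class, beta=beta, average='binary', sample_weight=sample_weight
    )
    return f_beta

# Synthetic validation data: 4 records, 2 classes
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.3, 0.7],
])
print(score(y_true, y_pred))  # → 0.5 (precision 0.5, recall 0.5)
```

This is just a local sketch to confirm the signature and shapes behave as expected before plugging the function into the custom metric field.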
What I've done here is modify your function so that its signature matches what we expect.
Again, this will require the `Needs Probability` option set to true.
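As a quick illustration of why that option matters (a minimal sketch with made-up arrays): when `Needs Probability` is off, `y_pred` arrives as a 1-D array of class indexes, so `y_pred[:, 1]` raises the "too many indices for array" error you saw; with it on, `y_pred` is 2-D with one column per class and the indexing works:

```python
import numpy as np

# With `Needs Probability` off, y_pred is 1-D (class indexes)
y_pred_classes = np.array([0, 1, 1, 0])
try:
    y_pred_classes[:, 1]
except IndexError as e:
    print(e)  # numpy reports that a 1-dimensional array was indexed with 2 indices

# With it on, y_pred is 2-D (one column per class), so [:, 1] works
y_pred_probas = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])
print(y_pred_probas[:, 1])  # positive-class probabilities
```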
Let me know if this helps!
Tom