AutoML compatibility w/ sklearn cross-validation & roc_auc #466

username725 · 2022-02-22T23:30:37Z

To perform nested cross-validation, I pass an AutoML instance to cross_val_score:

sklearn.model_selection.cross_val_score(automl, X, y, cv=2)

However, that requires AutoML to have a score() method. So let's explicitly give sklearn a scoring method:

sklearn.model_selection.cross_val_score(automl, X, y, scoring='roc_auc', cv=2)

This ends up in sklearn's _BaseScorer._select_proba_binary(), which requires classes_ to be a NumPy ndarray. AutoML explicitly converts classes_ to a list, so the call fails.

Full example:

import numpy as np
import sklearn.model_selection  # import the submodule explicitly
from flaml import AutoML

# Tiny random binary-classification dataset
X = np.random.random(size=(10, 1))
y = np.random.choice([False, True], size=10)
automl = AutoML(time_budget=5)  # time_budget is in seconds
sklearn.model_selection.cross_val_score(automl, X, y, scoring='roc_auc', cv=2)

This leads to the error:

    col_idx = np.flatnonzero(classes == pos_label)[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
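The same IndexError can be reproduced with NumPy alone, which suggests the cause: comparing a Python list to a scalar with == yields a single False instead of an elementwise mask, so no index is found. A minimal sketch, assuming pos_label is True as in the example above; the helper name positive_column is hypothetical and only mimics the line from the traceback:

```python
import numpy as np


def positive_column(classes, pos_label):
    """Mimic the lookup from the traceback: index of pos_label in classes."""
    return np.flatnonzero(classes == pos_label)[0]


# ndarray: the comparison is elementwise, giving [False, True],
# so the positive label is found at index 1.
as_array = positive_column(np.array([False, True]), True)

# list: `[False, True] == True` is the single bool False,
# so flatnonzero returns an empty array and [0] raises IndexError.
try:
    positive_column([False, True], True)
    list_error = None
except IndexError as exc:
    list_error = str(exc)
```

This is why returning np.array(...) from classes_ is enough to avoid the failure.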

A workaround is to override classes_ to have it return an array:

class MyAutoML(AutoML):
    @property
    def classes_(self):
        return np.array(super().classes_)

Since a workaround was found, this isn't high priority, but I wonder:

Does a decision_function() make sense for AutoML?
Does a score() function make sense?
Are there compatibility reasons for calling .tolist() on classes_?

FLAML 0.9.6, scikit-learn 1.0.2


sonichi · 2022-02-23T00:38:28Z

Does a decision_function() make sense for AutoML?

Not sure because it is not applicable to all learners and tasks.

Does a score() function make sense?

Yes, it makes sense. Would you like to add it?

Are there compatibility reasons for calling .tolist() on classes_?

We used this to make it work for automlbenchmark. Let me try converting it to np.array. If it works, we should make it compatible.
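For illustration only, a score() method following scikit-learn's convention for classifiers (mean accuracy of predict() on the given data) might look like the sketch below. This is not FLAML's implementation: AccuracyScoreMixin and MajorityClassifier are hypothetical names, and the stand-in estimator exists only so the mixin can be exercised without running an AutoML search.

```python
import numpy as np


class AccuracyScoreMixin:
    """Hypothetical mixin: adds score() as the mean accuracy of predict()."""

    def score(self, X, y):
        predictions = np.asarray(self.predict(X))
        return float(np.mean(predictions == np.asarray(y)))


class MajorityClassifier(AccuracyScoreMixin):
    """Hypothetical stand-in estimator, used only to exercise the mixin."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]  # most frequent label
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X = np.zeros((4, 1))
y = np.array([True, True, True, False])
accuracy = MajorityClassifier().fit(X, y).score(X, y)  # 3 of 4 labels match
```

Under the same assumption, mixing AccuracyScoreMixin into an AutoML subclass would give cross_val_score a default scorer for classification; regression tasks would need a different metric.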

username725 · 2022-03-01T00:53:31Z

Thanks for the quick turnaround.
We can consider this issue closed, and I can open a PR for score() if I get a chance.

sonichi mentioned this issue Feb 23, 2022

make AutoML.classes_ an array #467

Merged

sonichi closed this as completed Mar 1, 2022

sonichi mentioned this issue Mar 22, 2022

adding evaluation #495

Merged
