AutoML compatibility w/ sklearn cross-validation & roc_auc #466

Closed
username725 opened this issue Feb 22, 2022 · 2 comments
username725 commented Feb 22, 2022

To perform nested cross-validation:

sklearn.model_selection.cross_val_score(automl, X, y, cv=2)

However, that requires AutoML to have a score() method. Okay, let's explicitly give sklearn a scoring method:

sklearn.model_selection.cross_val_score(automl, X, y, scoring='roc_auc', cv=2)

This ends up in sklearn's _BaseScorer._select_proba_binary(), which requires classes_ to be a NumPy ndarray. AutoML explicitly converts classes_ to a list, so the comparison classes == pos_label yields a single False instead of an element-wise mask, and the lookup fails.

Full example:

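import numpy as np
import sklearn.model_selection  # import the submodule explicitly
from flaml import AutoML

# Tiny random binary-classification problem
X = np.random.random(size=(10, 1))
y = np.random.choice([False, True], size=10)
automl = AutoML(time_budget=5)
# Fails with the IndexError below because automl.classes_ is a list
sklearn.model_selection.cross_val_score(automl, X, y, scoring='roc_auc', cv=2)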

Leads to error:

    col_idx = np.flatnonzero(classes == pos_label)[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
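
The failing line can be reproduced without FLAML or sklearn. The sketch below is only an illustration of the comparison in the traceback; the variable names are made up for this example, and it assumes pos_label is True, as it would be for the boolean labels above.

```python
import numpy as np

pos_label = True
classes_as_list = [False, True]              # what AutoML.classes_ returns
classes_as_array = np.array([False, True])   # what the scorer expects

# A list compared to a scalar gives one Python bool, not a mask.
list_mask = classes_as_list == pos_label
# An ndarray compared to a scalar gives an element-wise mask.
array_mask = classes_as_array == pos_label

list_hits = np.flatnonzero(list_mask)    # empty, so [0] raises IndexError
array_hits = np.flatnonzero(array_mask)  # index of the positive class

print(list_mask, list_hits.size)
print(array_mask, array_hits)
```

With the list, there is no matching index to pick, which is why indexing with [0] raises.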

A workaround is to override classes_ to have it return an array:

class MyAutoML(AutoML):
    @property
    def classes_(self):
        return np.array(super().classes_)

Since a workaround was found, this isn't high priority, but I wonder:

  • Does a decision_function() make sense for AutoML?
  • Does a score() function make sense?
  • Are there compatibility reasons to call .tolist() on classes_?

FLAML 0.9.6, scikit-learn 1.0.2

sonichi (Contributor) commented Feb 23, 2022

  • Does a decision_function() make sense for AutoML?

Not sure because it is not applicable to all learners and tasks.

  • Does a score() function make sense?

Yes, it makes sense. Would you like to add it?

  • Are there compatibility reasons to call .tolist() on classes_?

We used this to make it work for automlbenchmark. Let me try converting it to np.array. If it works, we should make it compatible.
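
For reference, a score() method in the sklearn style could look roughly like the sketch below. It is not FLAML's implementation: AccuracyScoreMixin and ThresholdClassifier are hypothetical names invented for this example, and it only covers mean accuracy, which is what sklearn classifiers report by default. A real version would also need to handle regression and the other tasks AutoML supports.

```python
import numpy as np


class AccuracyScoreMixin:
    """Hypothetical mixin: adds a sklearn-style score() on top of predict()."""

    def score(self, X, y):
        # Mean accuracy: fraction of predictions equal to the true labels.
        y_pred = np.asarray(self.predict(X))
        return float(np.mean(y_pred == np.asarray(y)))


class ThresholdClassifier(AccuracyScoreMixin):
    """Toy stand-in for a fitted estimator; not part of FLAML."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)  # kept as an ndarray
        return self

    def predict(self, X):
        return np.asarray(X)[:, 0] > 0.5


X_demo = np.array([[0.1], [0.9], [0.7], [0.2]])
y_demo = np.array([False, True, False, False])
demo_score = ThresholdClassifier().fit(X_demo, y_demo).score(X_demo, y_demo)
print(demo_score)
```

With a score() like this, cross_val_score(automl, X, y, cv=2) would no longer need an explicit scoring argument.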


username725 (Author) commented Mar 1, 2022

Thanks for the quick turnaround.
We can consider this issue closed, and I can open a PR for score() if I get a chance.
