refactor: Refactoring of scikit-learn based surrogate models #699
Conversation
self.ridge.fit(X, y.ravel())
return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))

def main():
    # Hyperparameter configuration space

if __name__ == "__main__":
Make this look like the other examples
searcher=searcher,
search_options={"debug_log": False},
If you pass an object for `searcher`, there is no `search_options`.
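For illustration, the distinction looks roughly like this. The scheduler arguments follow the public Syne Tune API, while `MySKLearnSearcher` is a placeholder for the searcher object built in this example script, not a real class:

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {"x": uniform(0.0, 1.0)}

# Searcher given as a string: search_options is forwarded to the searcher.
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options={"debug_log": False},
    metric="loss",
    mode="min",
)

# Searcher given as an object: configure it in its own constructor instead;
# search_options is not used in this case. ``MySKLearnSearcher`` is a placeholder.
searcher = MySKLearnSearcher(config_space, metric="loss", debug_log=False)
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,
    metric="loss",
    mode="min",
)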
OK, makes sense
Codecov Report
Patch coverage:
@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
- Coverage   65.85%   65.83%   -0.02%
==========================================
  Files         400      401       +1
  Lines       27868    27877       +9
==========================================
+ Hits        18352    18354       +2
- Misses       9516     9523       +7
☔ View full report in Codecov by Sentry.
@@ -40,6 +40,7 @@ def load_benchmark_requirements():
     required_aws = load_requirements("requirements-aws.txt")
     required_moo = load_requirements("requirements-moo.txt")
     required_visual = load_requirements("requirements-visual.txt")
+    required_sklearn = load_requirements("requirements-sklearn.txt")
This will collect the requirements of the scikit-learn based surrogate models.
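For context, a rough sketch of what "collect" amounts to; the aggregation line is an assumption for illustration, not the actual setup.py:

def load_benchmark_requirements():
    required_aws = load_requirements("requirements-aws.txt")
    required_moo = load_requirements("requirements-moo.txt")
    required_visual = load_requirements("requirements-visual.txt")
    required_sklearn = load_requirements("requirements-sklearn.txt")
    # assumed: the benchmark extras are the de-duplicated union of all of these
    return sorted(set(required_aws + required_moo + required_visual + required_sklearn))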
That's a nice change, thanks.
I have been planning to make a similar change in preparation for the ICML code coming in. See comments on `SKLearnPredictorWrapper` for details.
allow_duplicates=allow_duplicates,
resource_attr=None,  # TODO This needs to be passed for multi-fidelity
What about the resource attribute, shall we never pass it?
What you do in the ICML paper can easily be implemented with a searcher which does not know about multi-fidelity at all. Namely, we can use a specific `StateForModelConverter`, which converts the multi-fidelity data to single-fidelity data before the searcher even sees it.
That is why, at least for now, I'd keep it simple here. I am rather a fan of keeping something simple and making it more complex when needed, because at that later point you know exactly what is needed.
I think this is a good rule in general. For now this was just a TODO placeholder, so I am asking why we remove it since nothing changed. I guess it just sets a default, so we might not have it here.
BTW: Before you implement something complex for the multi-fidelity case for ICML, let's talk. This reduction via `state_converter` is very simple and elegant, and allows the user to decouple the reduction multi_fid -> single_fid from the choice of the single_fid searcher/model.
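For illustration, the reduction can be pictured as follows. This is a made-up sketch of the idea only, using plain types and an invented function name, not the actual `StateForModelConverter` API:

from typing import Dict, List, Tuple

def reduce_to_single_fidelity(
    observations: List[Tuple[dict, int, float]]
) -> List[Tuple[dict, float]]:
    """
    ``observations`` holds (config, resource_level, metric_value) triples from a
    multi-fidelity run. Keep only the observation at the highest resource level
    per config, so that a single-fidelity searcher/model can consume the data
    without knowing anything about fidelities.
    """
    best: Dict[str, Tuple[int, dict, float]] = {}
    for config, resource, metric in observations:
        key = str(sorted(config.items()))
        if key not in best or resource > best[key][0]:
            best[key] = (resource, config, metric)
    return [(config, metric) for _, config, metric in best.values()]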
 def keys_predict(self) -> Set[str]:
     if self.sklearn_predictor.returns_std():
         return {"mean", "std"}
     else:
         return {"mean"}

 def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
     """
     :param inputs: Input points, shape ``(n, d)``
     :return: List of ``dict`` with keys :meth:`keys_predict`, of length 1
     """
-    mean, std = self.contributed_predictor.predict(inputs)
-    outputs = {"mean": mean}
-    if std is not None:
-        outputs["std"] = std
+    prediction = self.sklearn_predictor.predict(X=inputs)
+    if self.sklearn_predictor.returns_std():
+        outputs = {"mean": prediction[0], "std": prediction[1]}
+    else:
+        outputs = {"mean": prediction}
I have been thinking about a change like this a lot recently, and I believe those functions should be implemented directly in the `SKLearnPredictor`, for the following reasons:
1/ Most importantly, the `SKLearnPredictor` could provide outputs that are not only means and stds; for example, the ICML surrogates use quantiles to determine the predicted distribution. We should make the `SKLearnPredictor` flexible enough to account for that.
2/ The predictor already needs to provide the implementation of `returns_std()`, therefore it already needs to know about different keys and whether to predict means, stds, etc.
3/ We add the complexity of having different outputs (mean, std, quantiles, etc.) and should handle it all in one place (here `SKLearnPredictor`) rather than distributed between two classes. If we move those functions to the `SKLearnPredictor`, the wrapper does not need to know what is being predicted as long as it has a known format `Dict[str, np.ndarray]`.
I would recommend something like:
from typing import Dict, List, Optional, Set

import numpy as np


class SKLearnPredictor:
    """
    Base class for the sklearn predictors.
    """

    @staticmethod
    def keys_predict() -> Set[str]:
        """
        :return: The quantities predicted by ``predict``, most commonly "mean" and "std"
        """
        raise NotImplementedError

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        """
        Returns signals which are statistics of the predictive distribution at
        input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Dictionary ``{quantity_name: quantity_array}`` of predictive outputs.
            All quantity values are arrays of shape ``(n,)``. Most commonly this
            dictionary holds ``{"mean": predictive means, "std": predictive stddevs}``.
        """
        raise NotImplementedError


class SKLearnPredictorWrapper(BasePredictor):
    """
    Wrapper class for sklearn predictors returned by ``fit_from_state`` of
    :class:`~syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_estimator.SKLearnEstimatorWrapper`.
    """

    def __init__(
        self,
        sklearn_predictor: SKLearnPredictor,
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.sklearn_predictor = sklearn_predictor

    def keys_predict(self) -> Set[str]:
        return self.sklearn_predictor.keys_predict()

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        prediction = self.sklearn_predictor.predict(X=inputs)
        return [prediction]
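For illustration only (not part of this PR): a concrete predictor under this proposed interface could wrap an already fitted scikit-learn `BayesianRidge`, whose `predict(X, return_std=True)` returns means and stddevs. The class and attribute names below are made up.

import numpy as np
from typing import Dict, Set
from sklearn.linear_model import BayesianRidge


class BayesianRidgeDictPredictor(SKLearnPredictor):
    """Hypothetical example: returns the prediction as a dict of signals."""

    def __init__(self, ridge: BayesianRidge):
        self.ridge = ridge  # assumed to be fitted by the corresponding estimator

    @staticmethod
    def keys_predict() -> Set[str]:
        return {"mean", "std"}

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        mean, std = self.ridge.predict(X, return_std=True)
        return {"mean": mean, "std": std}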
We can do that, but then I think you have not really simplified things for the user.
My idea is that code in `bayesopt/sklearn` is simpler and closer to scikit-learn than code in `bayesopt/models`. Here, "simpler" means that it looks more like scikit-learn, and less like the specific API we need further above.
You started this way, and I like it (except that whether a predictor returns std or not cannot be an argument, but has to be a fixed property of the predictor).
If another case comes up, I'd rather create a different class for predictor and estimator in `bayesopt/sklearn`.
Up to you, but just pushing all methods through to `bayesopt/sklearn` does not make things easier there.
I think this is a good point, but then we should commit to making it as simple as possible and only accept mean + std, like in the original code. Adding an option to return std optionally makes it more contrived and defeats the purpose of making it as simple as possible. We should either have flexibility and return a dict, or have simplicity and return mean + std.
If we decide to go for only mean + std in the basic contributed surrogate, we would add different wrappers/predictors for different outputs like quantiles or mean-only. Ultimately, it is very hard to do BO with just mean predictions, so we probably need some extra code to make it work anyway (like computing std on a held-out validation set).
Hmm, but scikit-learn itself does this, right? If `return_std=True`, `BayesianRidge` returns means and stds, and otherwise it only returns means.
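A minimal snippet illustrating that scikit-learn behaviour:

import numpy as np
from sklearn.linear_model import BayesianRidge

X = np.random.rand(20, 3)
y = np.random.rand(20)
model = BayesianRidge().fit(X, y)

means = model.predict(X)                         # means only
means, stds = model.predict(X, return_std=True)  # means and stddevs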
Note that this is not an option, but a fixed property of the object (or even of the class). A predictor either always returns (mean, std), or it always only returns mean.
I can see your point for improved flexibility, but I think that with the scikit-learn interface the basic class should just be as simple to use as possible. This means that just implementing the `fit()` and `predict()` methods should be enough. The current implementation supports providing `std=None`, so non-std functions are still supported if needed; the interface is just designed to make the default use case as easy as possible, and I think that is something we should retain.
Note that we can also add a more advanced `SKLearnPredictor` that has higher flexibility at the cost of complexity, it just won't be the default that we showcase as easy to use.
Well, so how do we proceed?
The current implementation is simply not correct. It is essential for things to work that `keys_predict` and the return value of `predict` are consistent. As it stands, `keys_predict` says: I'll give you mean and std, but then maybe `predict` does not. An AF like EI thinks everything is fine, and at the first call, things blow up.
I am all for simplicity, but things just have to be correct.
If you want `keys_predict` to remain as it is, then `predict` has to always return mean and std, and so a deterministic model cannot be covered.
I don't really understand why being explicit about whether you return std or not makes things complicated. This seems pretty basic to me, that you can make a statement about that.
> The current implementation is simply not correct. It is essential for things to work that keys_predict and the return of predict are consistent. As it stands, keys_predict says: I'll give you mean and std, but then maybe predict does not. An AF like EI thinks everything is fine, and the first call, things blow up.

Aha, I see, I missed this point before, but this is indeed a good callout. I think in this case I would opt for the simplest way to fix the bug and unblock the tutorial: assert that both mean and std are returned by the default surrogate. This is the surrogate you can use for the tutorials, and it will be simple to understand, albeit non-flexible.
Going beyond that, I have noted some ideas below. We can discuss them offline (tomorrow) if needed to align on the best strategy.
We generally need a good way to tackle predictors that return mean, mean+std, quantiles or anything else. I see the main issue is that a user implementing their own `SKLearnPredictor` needs to somehow encode the information about the returns of the predictor inside the class.
1/ One way is to have `SKLearnPredictor` return a dictionary, thus moving all the logic about different outputs into `SKLearnPredictor` (that's my previous proposal). I see your points against it, as it lacks simplicity.
2/ The second is to allow multiple options for `SKLearnPredictor` and move the construction of output dictionaries into `SKLearnPredictorWrapper`, such as below. Users can choose which predictor to subclass and the wrapper will figure out the rest, which I hope will be simple in the default case but also flexible for advanced users.
3/ What you propose is a mix of the two; it can work too, but I prefer to have a sklearn surrogate which just needs one function to implement. Otherwise we risk non-advanced users getting lost and, for example, implementing mismatched versions of `predict()` and `returns_std()`. Maybe I am overly cautious, but I hope options 1 or 2 will work.
from typing import Dict, List, Optional, Set, Tuple, Union

import numpy as np


class SKLearnPredictor:
    """
    Base class for sklearn predictors that return means and stddevs.
    """

    def predict(
        self, X: np.ndarray, return_std: bool = True
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns statistics of the predictive distribution at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Tuple with the following entries:
            * "mean": Predictive means, shape ``(n,)``
            * "std": Predictive stddevs, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnDeterministicPredictor:
    """
    Base class for sklearn predictors that return means only.
    """

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Returns the predictive means at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Predictive means, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnPredictorWrapper(BasePredictor):
    def __init__(
        self,
        contributed_predictor: Union[SKLearnPredictor, SKLearnDeterministicPredictor],
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.contributed_predictor = contributed_predictor
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            self.keys = {"mean", "std"}
        elif isinstance(self.contributed_predictor, SKLearnDeterministicPredictor):
            self.keys = {"mean"}
        else:
            raise AttributeError

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        predictions = self.contributed_predictor.predict(inputs)
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            assert isinstance(
                predictions, tuple
            ), "SKLearnPredictor must return a tuple with mean and std"
            assert (
                len(predictions) == 2
            ), "SKLearnPredictor must return a tuple with mean and std"
            outputs = {"mean": predictions[0], "std": predictions[1]}
        elif isinstance(self.contributed_predictor, SKLearnDeterministicPredictor):
            assert isinstance(
                predictions, np.ndarray
            ), "SKLearnDeterministicPredictor must return a single array with the means"
            outputs = {"mean": predictions}
        else:
            raise AttributeError
        return [outputs]

    def keys_predict(self) -> Set[str]:
        return self.keys
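For illustration only (class names and model choices are made up, not from this PR): under option 2, users would subclass whichever base matches their model, e.g. a probabilistic `BayesianRidge` or a deterministic `RandomForestRegressor`.

import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import RandomForestRegressor


class ExampleRidgePredictor(SKLearnPredictor):
    """Probabilistic predictor: always returns (mean, std)."""

    def __init__(self, ridge: BayesianRidge):
        self.ridge = ridge  # assumed to be fitted already

    def predict(self, X: np.ndarray, return_std: bool = True):
        return self.ridge.predict(X, return_std=True)


class ExampleForestPredictor(SKLearnDeterministicPredictor):
    """Deterministic predictor: returns means only."""

    def __init__(self, forest: RandomForestRegressor):
        self.forest = forest  # assumed to be fitted already

    def predict(self, X: np.ndarray):
        return self.forest.predict(X)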
Your quantile case will be interesting indeed.
There is also the need to refactor stuff in
LGTM
Refactorings needed before I can include this in the BO dev tutorial:
- sklearn
- contributed -> sklearn
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.