
refactor: Refactoring of scikit-learn based surrogate models #699

Merged: 6 commits merged into main from refactor_sklearn on May 30, 2023

Conversation

mseeger (Collaborator, Author) commented May 29, 2023:

Refactorings needed before I can include this in the BO dev tutorial:

  • Clean up whether a predictor returns (means, stds) or only means
  • New dependency tag sklearn
  • Rename some more contributed -> sklearn

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mseeger requested a review from jgolebiowski on May 29, 2023 at 21:13
        self.ridge.fit(X, y.ravel())
        return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))


def main():
    # Hyperparameter configuration space
    ...


if __name__ == "__main__":
mseeger (Collaborator, Author):

Make this look like the other examples

searcher=searcher,
search_options={"debug_log": False},
mseeger (Collaborator, Author):

If you pass an object for searcher, there is no search_options

jgolebiowski (Collaborator):

OK, makes sense

codecov bot commented May 29, 2023:

Codecov Report

Patch coverage: 76.19% and project coverage change: -0.02 ⚠️

Comparison is base (51cfe94) 65.85% compared to head (d3a736a) 65.83%.

❗ Current head d3a736a differs from pull request most recent head ae1c41c. Consider uploading reports for the commit ae1c41c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
- Coverage   65.85%   65.83%   -0.02%     
==========================================
  Files         400      401       +1     
  Lines       27868    27877       +9     
==========================================
+ Hits        18352    18354       +2     
- Misses       9516     9523       +7     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
...optimizer/schedulers/searchers/gp_fifo_searcher.py 100.00% <ø> (ø)
...rs/searchers/sklearn/sklearn_surrogate_searcher.py 75.00% <ø> (ø)
...schedulers/searchers/bayesopt/sklearn/estimator.py 76.92% <50.00%> (-10.58%) ⬇️
...schedulers/searchers/bayesopt/sklearn/predictor.py 71.42% <50.00%> (ø)
syne_tune/try_import.py 46.87% <50.00%> (+0.20%) ⬆️
...ers/searchers/bayesopt/models/sklearn_estimator.py 90.00% <80.00%> (-10.00%) ⬇️
...ers/searchers/bayesopt/models/sklearn_predictor.py 86.66% <80.00%> (-7.46%) ⬇️
...archers/bayesopt/tuning_algorithms/base_classes.py 84.52% <100.00%> (-0.19%) ⬇️
...optimizer/schedulers/searchers/sklearn/__init__.py 100.00% <100.00%> (ø)
... and 1 more

☔ View full report in Codecov by Sentry.

@@ -40,6 +40,7 @@ def load_benchmark_requirements():
required_aws = load_requirements("requirements-aws.txt")
required_moo = load_requirements("requirements-moo.txt")
required_visual = load_requirements("requirements-visual.txt")
required_sklearn = load_requirements("requirements-sklearn.txt")
mseeger (Collaborator, Author):

This will collect the requirements of the scikit-learn based surrogate models.
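For context, a hypothetical sketch (not the actual setup.py) of how a requirements list collected this way typically feeds an extras_require entry, so that the new sklearn dependency tag works with pip; the file content and parsing are simulated here:

```python
# Hypothetical sketch: how requirements collected from a file like
# requirements-sklearn.txt typically become an extras_require entry.
from typing import List


def load_requirements(text: str) -> List[str]:
    # Stand-in for the repo's load_requirements(): one requirement per
    # line, skipping blank lines and comments.
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


# Simulated content of requirements-sklearn.txt
required_sklearn = load_requirements("# surrogate models\nscikit-learn>=1.0\n")

# The dependency tag maps to its requirement list in extras_require,
# enabling installs like: pip install "syne-tune[sklearn]"
extras_require = {"sklearn": required_sklearn}
print(extras_require)
```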

jgolebiowski (Collaborator):

That's a nice change, thanks.

jgolebiowski (Collaborator) left a comment:

I have been planning to make a similar change in preparation for the ICML code coming in. See comments on SKLearnPredictorWrapper for details.

allow_duplicates=allow_duplicates,
resource_attr=None, # TODO This needs to be passed for multi-fidelity
jgolebiowski (Collaborator):

What about the resource attribute, shall we never pass it?

mseeger (Collaborator, Author):

What you do in the ICML paper can easily be implemented with a searcher which does not know about multi-fidelity at all. Namely, we can use a specific StateForModelConverter, which converts the multi-fidelity data to single-fidelity data before the searcher even sees it.
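A minimal sketch of that reduction idea (hypothetical names and plain dicts; the real StateForModelConverter operates on Syne Tune's TuningJobState): keep only the highest-resource observation per trial, so the downstream searcher sees single-fidelity data.

```python
# Hypothetical sketch of the multi-fidelity -> single-fidelity reduction
# described above. Real code would work on Syne Tune's TuningJobState;
# observations are plain dicts here for illustration only.
from typing import Dict, List


def to_single_fidelity(observations: List[Dict]) -> List[Dict]:
    # Keep, per trial, only the observation at the largest resource level
    # (e.g. the last epoch evaluated), then drop the resource attribute.
    best: Dict[int, Dict] = {}
    for obs in observations:
        trial = obs["trial_id"]
        if trial not in best or obs["resource"] > best[trial]["resource"]:
            best[trial] = obs
    return [
        {"trial_id": o["trial_id"], "metric": o["metric"]}
        for o in best.values()
    ]


multi_fid = [
    {"trial_id": 0, "resource": 1, "metric": 0.9},
    {"trial_id": 0, "resource": 4, "metric": 0.5},
    {"trial_id": 1, "resource": 2, "metric": 0.7},
]
print(to_single_fidelity(multi_fid))
```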

mseeger (Collaborator, Author):

That is why, at least for now, I'd keep it simple here. I am rather a fan of keeping something simple and making it more complex when needed, because at that later point, you know exactly what is needed.

jgolebiowski (Collaborator):

I think this is a good rule in general. For now this was just a TODO placeholder, so I am asking why we remove it, since nothing changed. I guess it just sets a default, so we might not need it here.

mseeger (Collaborator, Author):

BTW: Before you implement something complex for the multi-fidelity case for ICML, let's talk. This reduction via state_converter is very simple and elegant, and it allows the user to decouple the multi_fid -> single_fid reduction from the choice of the single_fid searcher/model.


Comment on lines 43 to 54
New code:

    def keys_predict(self) -> Set[str]:
        if self.sklearn_predictor.returns_std():
            return {"mean", "std"}
        else:
            return {"mean"}

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        """
        :param inputs: Input points, shape ``(n, d)``
        :return: List of ``dict`` with keys :meth:`keys_predict`, of length 1
        """
        prediction = self.sklearn_predictor.predict(X=inputs)
        if self.sklearn_predictor.returns_std():
            outputs = {"mean": prediction[0], "std": prediction[1]}
        else:
            outputs = {"mean": prediction}

Old code (removed):

        mean, std = self.contributed_predictor.predict(inputs)
        outputs = {"mean": mean}
        if std is not None:
            outputs["std"] = std
jgolebiowski (Collaborator):

I have been thinking about a change like this a lot recently, and I believe those functions should be implemented directly in the SKLearnPredictor, for the following reasons:
1/ Most importantly, the SKLearnPredictor could provide outputs that are not only means and stds; for example, the ICML surrogates use quantiles to determine the predicted distribution. We should make the SKLearnPredictor flexible enough to account for that.
2/ The predictor already needs to provide the implementation of returns_std(), therefore it already needs to know about the different keys and whether to predict means, stds, etc.
3/ We add the complexity of having different outputs (mean, std, quantiles, etc.) and should do it all in one place (here SKLearnPredictor) rather than distributing it between two classes. If we move those functions to the SKLearnPredictor, the wrapper does not need to know about what is being predicted, as long as it has a known format Dict[str, np.ndarray].

I would recommend something like:

class SKLearnPredictorWrapper(BasePredictor):
    """
    Wrapper class for sklearn predictors returned by ``fit_from_state`` of
    :class:`~syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_estimator.SKLearnEstimatorWrapper`.
    """

    def __init__(
        self,
        sklearn_predictor: SKLearnPredictor,
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.sklearn_predictor = sklearn_predictor

    def keys_predict(self) -> Set[str]:
        return self.sklearn_predictor.keys_predict()

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        prediction = self.sklearn_predictor.predict(X=inputs)
        return [prediction]

class SKLearnPredictor:
    """
    Base class for the sklearn predictors.
    """

    @staticmethod
    def keys_predict() -> Set[str]:
        """
        :return: The quantities predicted by ``predict``, most commonly
            "mean" and "std"
        """
        raise NotImplementedError

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        """
        Returns signals which are statistics of the predictive distribution
        at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Dictionary {quantity_name: quantity_array} of predictive
            outputs. All quantity values are arrays of shape ``(n,)``.
            Most commonly, this dictionary holds
            {
                "mean": predictive means,
                "std": predictive stddevs,
            }
        """
        raise NotImplementedError

mseeger (Collaborator, Author):

We can do that, but then I think you have not really simplified things for the user.

My idea is that code in bayesopt/sklearn is simpler and closer to scikit-learn than code in bayesopt/models. Here, "simpler" means that it looks more like scikit-learn, and less like the specific API we need further above.

You started this way, and I like it (except that whether a predictor returns std or not cannot be an argument; it must be a fixed property of the predictor).

If another case comes up, I'd rather create a different class for predictor and estimator in bayesopt/sklearn.

mseeger (Collaborator, Author):

Up to you, but just pushing through all methods to bayesopt/sklearn does not make things easier there.

jgolebiowski (Collaborator) commented May 30, 2023:

I think this is a good point, but then we should commit to making it as simple as possible and only accept mean + std, like in the original code. Adding options to return std optionally makes it more contrived and defeats the purpose of making it as simple as possible. We should either have flexibility and return a dict, or have simplicity and return mean + std.

If we decide to go for only mean + std in the basic contributed surrogate, we would add different wrappers/predictors for different outputs, like quantiles or mean-only. Ultimately, it is very hard to do BO with just mean predictions, so we probably need some extra code to make it work anyway (like computing std on held-out validation data).

mseeger (Collaborator, Author):

Hmm, but scikit-learn itself does this, right? If return_std=True, BayesianRidge returns means and stds, and otherwise it only returns means.
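For reference, a minimal runnable check of that scikit-learn behavior: BayesianRidge.predict returns a (means, stds) tuple with return_std=True, and a single means array otherwise.

```python
# Minimal check of the scikit-learn behavior referenced above:
# BayesianRidge.predict returns (means, stds) with return_std=True,
# and only means otherwise.
import numpy as np
from sklearn.linear_model import BayesianRidge

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.1, 1.9, 3.2])

model = BayesianRidge().fit(X, y)

mean_only = model.predict(X)                   # single array, shape (n,)
mean, std = model.predict(X, return_std=True)  # two arrays, shape (n,) each

print(mean_only.shape, mean.shape, std.shape)
```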

mseeger (Collaborator, Author):

Note that this is not an option, but a fixed property of the object (or even of the class). A predictor either always returns (mean, std), or it always only returns mean.

jgolebiowski (Collaborator):

I can see your point about improved flexibility, but I think that with the scikit-learn interface, the basic class should just be as simple to use as possible. This means just implementing the fit() and predict() methods should be enough. The current implementation supports providing std=None, so non-std functions are still supported if needed; the interface is just designed to make the default use case as easy as possible, and I think that is something we should retain.

Note that we can also add a more advanced SKLearnPredictor that has higher flexibility at the cost of complexity; it just won't be the default that we showcase as easy to use.

mseeger (Collaborator, Author):

Well, so how do we proceed?

The current implementation is simply not correct. It is essential for things to work that keys_predict and the return of predict are consistent. As it stands, keys_predict says: I'll give you mean and std, but then maybe predict does not. An AF like EI thinks everything is fine, and at the first call, things blow up.

I am all for simplicity, but things just have to be correct.

If you want keys_predict to remain as it is, then predict has to always return mean and std, and so a deterministic model cannot be covered.

I don't really understand why being explicit about whether you return std or not makes things complicated. This seems pretty basic to me, that you can make a statement about that.
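The failure mode described above can be sketched in a few lines (hypothetical minimal classes, not the actual Syne Tune ones): a consumer trusts keys_predict, and a predictor whose predict does not actually return "std" fails only at call time.

```python
# Hypothetical sketch of the inconsistency being discussed: keys_predict
# advertises {"mean", "std"}, but predict only delivers "mean", so an
# acquisition-function-like consumer fails at the first call.
from typing import Dict, Set
import numpy as np


class BrokenPredictor:
    def keys_predict(self) -> Set[str]:
        return {"mean", "std"}  # promises std ...

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        return {"mean": np.zeros(len(X))}  # ... but never returns it


def expected_improvement_like(predictor, X: np.ndarray) -> np.ndarray:
    # An EI-style consumer checks the advertised keys up front ...
    assert "std" in predictor.keys_predict()
    outputs = predictor.predict(X)
    # ... and only discovers the mismatch here, with a KeyError.
    return outputs["mean"] - outputs["std"]


X = np.zeros((3, 2))
try:
    expected_improvement_like(BrokenPredictor(), X)
except KeyError as err:
    print("blew up at first call:", err)
```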

jgolebiowski (Collaborator) commented May 30, 2023:

> The current implementation is simply not correct. It is essential for things to work that keys_predict and the return of predict are consistent. As it stands, keys_predict says: I'll give you mean and std, but then maybe predict does not. An AF like EI thinks everything is fine, and at the first call, things blow up.

Aha I see, I missed this point before, but this is indeed a good callout. I think in this case I would opt for the simplest way to fix the bug and unblock the tutorial: assert that both mean and std are returned by the default surrogate. This is the surrogate you can use for the tutorials, and it will be simple to understand, albeit non-flexible.

Going beyond that, I have noted some ideas below. We can discuss them offline (tomorrow) if needed to align on the best strategy.

We generally need a good way to tackle predictors that return mean, mean + std, quantiles, or anything else. I see the main issue is that a user implementing their own SKLearnPredictor needs to somehow encode the information about the returns of the predictor inside the class.
1/ One way is to have SKLearnPredictor return a dictionary, thus moving all the logic about different outputs into SKLearnPredictor (that's my previous proposal). I see your points against it, as it lacks simplicity.
2/ The second one is to assume multiple options for SKLearnPredictor and move the construction of output dictionaries into SKLearnPredictorWrapper, such as below. Users can choose which predictor to subclass, and the wrapper will figure out the rest, which I hope will be simple in the default case but also flexible for advanced users.
3/ What you propose, which is a mix of the two; it can work too, but I prefer to have a sklearn surrogate which just needs one function to implement. Otherwise we risk non-advanced users getting lost and, for example, implementing mismatched versions of predict() and returns_std(). Maybe I am overly cautious, but I hope options 1 or 2 will work.


class SKLearnPredictor:
    """
    Base class for sklearn predictors returning means and stddevs.
    """

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns statistics of the predictive distribution at input
        points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Tuple with the following entries:
            * "mean": Predictive means, shape ``(n,)``
            * "std": Predictive stddevs, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnDeterministicPredictor:
    """
    Base class for sklearn predictors returning means only.
    """

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Returns predictive means at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Predictive means, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnPredictorWrapper(BasePredictor):
    def __init__(
        self,
        contributed_predictor: Union[SKLearnPredictor, SKLearnDeterministicPredictor],
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.contributed_predictor = contributed_predictor
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            self.keys = {"mean", "std"}
        elif isinstance(self.contributed_predictor, SKLearnDeterministicPredictor):
            self.keys = {"mean"}
        else:
            raise TypeError(
                "contributed_predictor must be SKLearnPredictor or "
                "SKLearnDeterministicPredictor"
            )

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        predictions = self.contributed_predictor.predict(inputs)
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            assert isinstance(predictions, tuple) and len(predictions) == 2, \
                "SKLearnPredictor must return a tuple (mean, std)"
            outputs = {"mean": predictions[0], "std": predictions[1]}
        else:
            assert isinstance(predictions, np.ndarray), \
                "SKLearnDeterministicPredictor must return a single array with the means"
            outputs = {"mean": predictions}
        return [outputs]

    def keys_predict(self) -> Set[str]:
        return self.keys

mseeger (Collaborator, Author):

Your quantile case will be interesting indeed.

mseeger (Collaborator, Author) commented May 30, 2023:

There is also the need to refactor the stuff in bayesopt/models/cost with your API, but cost models are deterministic, so we need to clarify how this is going to look. Also, I think you need this for your applications.

jgolebiowski (Collaborator) left a comment:

LGTM

mseeger merged commit 1359817 into main on May 30, 2023
mseeger deleted the refactor_sklearn branch on May 30, 2023 at 15:54