
refactor: Refactoring of scikit-learn based surrogate models #699

Merged: 6 commits merged into main from refactor_sklearn on May 30, 2023

Conversation

mseeger (Collaborator, Author) commented May 29, 2023:

Refactorings needed before I can include this in the BO dev tutorial:

  • Clean up whether a predictor returns (means, stds) or only means
  • New dependency tag sklearn
  • Rename some more contributed -> sklearn

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mseeger requested a review from jgolebiowski on May 29, 2023 at 21:13
        self.ridge.fit(X, y.ravel())
        return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))


def main():
    # Hyperparameter configuration space
    ...


if __name__ == "__main__":
mseeger (Collaborator, Author):

Make this look like the other examples

searcher=searcher,
search_options={"debug_log": False},
mseeger (Collaborator, Author):

If you pass an object for searcher, there is no search_options

jgolebiowski (Collaborator):

OK, makes sense

codecov bot commented May 29, 2023:

Codecov Report

Patch coverage: 76.19% and project coverage change: -0.02 ⚠️

Comparison is base (51cfe94) 65.85% compared to head (d3a736a) 65.83%.

❗ Current head d3a736a differs from pull request most recent head ae1c41c. Consider uploading reports for the commit ae1c41c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
- Coverage   65.85%   65.83%   -0.02%     
==========================================
  Files         400      401       +1     
  Lines       27868    27877       +9     
==========================================
+ Hits        18352    18354       +2     
- Misses       9516     9523       +7     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
...optimizer/schedulers/searchers/gp_fifo_searcher.py 100.00% <ø> (ø)
...rs/searchers/sklearn/sklearn_surrogate_searcher.py 75.00% <ø> (ø)
...schedulers/searchers/bayesopt/sklearn/estimator.py 76.92% <50.00%> (-10.58%) ⬇️
...schedulers/searchers/bayesopt/sklearn/predictor.py 71.42% <50.00%> (ø)
syne_tune/try_import.py 46.87% <50.00%> (+0.20%) ⬆️
...ers/searchers/bayesopt/models/sklearn_estimator.py 90.00% <80.00%> (-10.00%) ⬇️
...ers/searchers/bayesopt/models/sklearn_predictor.py 86.66% <80.00%> (-7.46%) ⬇️
...archers/bayesopt/tuning_algorithms/base_classes.py 84.52% <100.00%> (-0.19%) ⬇️
...optimizer/schedulers/searchers/sklearn/__init__.py 100.00% <100.00%> (ø)
... and 1 more

☔ View full report in Codecov by Sentry.

@@ -40,6 +40,7 @@ def load_benchmark_requirements():
required_aws = load_requirements("requirements-aws.txt")
required_moo = load_requirements("requirements-moo.txt")
required_visual = load_requirements("requirements-visual.txt")
required_sklearn = load_requirements("requirements-sklearn.txt")
mseeger (Collaborator, Author):

This will collect the requirements of the scikit-learn based surrogate models.
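For context, a hypothetical sketch (not the actual setup.py) of how a requirements list collected this way typically feeds an extras_require entry, so that the new sklearn dependency tag works with pip; the file content and parsing are simulated here:

```python
# Hypothetical sketch: how requirements collected from a file like
# requirements-sklearn.txt typically become an extras_require entry.
from typing import List


def load_requirements(text: str) -> List[str]:
    # Stand-in for the repo's load_requirements(): one requirement per
    # line, skipping blank lines and comments.
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


# Simulated content of requirements-sklearn.txt
required_sklearn = load_requirements("# surrogate models\nscikit-learn>=1.0\n")

# The dependency tag maps to its requirement list in extras_require,
# enabling installs like: pip install "syne-tune[sklearn]"
extras_require = {"sklearn": required_sklearn}
print(extras_require)
```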

jgolebiowski (Collaborator):

That's a nice change, thanks.

jgolebiowski (Collaborator) left a comment:

I have been planning to make a similar change in preparation for the ICML code coming in. See comments on SKLearnPredictorWrapper for details.

allow_duplicates=allow_duplicates,
resource_attr=None, # TODO This needs to be passed for multi-fidelity
jgolebiowski (Collaborator):

What about the resource attribute, shall we never pass it?

mseeger (Collaborator, Author):

What you do in the ICML paper can easily be implemented with a searcher which does not know about multi-fidelity at all. Namely, we can use a specific StateForModelConverter, which converts the multi-fidelity data to single-fidelity data before the searcher even sees it.
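A minimal sketch of that reduction idea (hypothetical names and plain dicts; the real StateForModelConverter operates on Syne Tune's TuningJobState): keep only the highest-resource observation per trial, so the downstream searcher sees single-fidelity data.

```python
# Hypothetical sketch of the multi-fidelity -> single-fidelity reduction
# described above. Real code would work on Syne Tune's TuningJobState;
# observations are plain dicts here for illustration only.
from typing import Dict, List


def to_single_fidelity(observations: List[Dict]) -> List[Dict]:
    # Keep, per trial, only the observation at the largest resource level
    # (e.g. the last epoch evaluated), then drop the resource attribute.
    best: Dict[int, Dict] = {}
    for obs in observations:
        trial = obs["trial_id"]
        if trial not in best or obs["resource"] > best[trial]["resource"]:
            best[trial] = obs
    return [
        {"trial_id": o["trial_id"], "metric": o["metric"]}
        for o in best.values()
    ]


multi_fid = [
    {"trial_id": 0, "resource": 1, "metric": 0.9},
    {"trial_id": 0, "resource": 4, "metric": 0.5},
    {"trial_id": 1, "resource": 2, "metric": 0.7},
]
print(to_single_fidelity(multi_fid))
```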

mseeger (Collaborator, Author):

That is why, at least for now, I'd keep it simple here. I am rather a fan of keeping something simple and making it more complex when needed, because at that later point, you know exactly what is needed.

jgolebiowski (Collaborator):

I think this is a good rule in general. For now this was just a TODO placeholder, so I am asking why we remove it, since nothing changed. I guess it just sets a default, so we might not need it here.

mseeger (Collaborator, Author):

BTW: Before you implement something complex for the multi-fidelity case for ICML, let's talk. This reduction via state_converter is very simple and elegant, and it allows the user to decouple the multi_fid -> single_fid reduction from the choice of the single_fid searcher/model.


Comment on lines 43 to 54
New code:

    def keys_predict(self) -> Set[str]:
        if self.sklearn_predictor.returns_std():
            return {"mean", "std"}
        else:
            return {"mean"}

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        """
        :param inputs: Input points, shape ``(n, d)``
        :return: List of ``dict`` with keys :meth:`keys_predict`, of length 1
        """
        prediction = self.sklearn_predictor.predict(X=inputs)
        if self.sklearn_predictor.returns_std():
            outputs = {"mean": prediction[0], "std": prediction[1]}
        else:
            outputs = {"mean": prediction}

Old code (removed):

        mean, std = self.contributed_predictor.predict(inputs)
        outputs = {"mean": mean}
        if std is not None:
            outputs["std"] = std
jgolebiowski (Collaborator):

I have been thinking about a change like this a lot recently, and I believe those functions should be implemented directly in the SKLearnPredictor, for the following reasons:
1/ Most importantly, the SKLearnPredictor could provide outputs that are not only means and stds; for example, the ICML surrogates use quantiles to determine the predicted distribution. We should make the SKLearnPredictor flexible enough to account for that.
2/ The predictor already needs to provide the implementation of returns_std(), therefore it already needs to know about the different keys and whether to predict means, stds, etc.
3/ We add the complexity of having different outputs (mean, std, quantiles, etc.) and should do it all in one place (here SKLearnPredictor) rather than distributing it between two classes. If we move those functions to the SKLearnPredictor, the wrapper does not need to know about what is being predicted, as long as it has a known format Dict[str, np.ndarray].

I would recommend something like:

class SKLearnPredictorWrapper(BasePredictor):
    """
    Wrapper class for sklearn predictors returned by ``fit_from_state`` of
    :class:`~syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_estimator.SKLearnEstimatorWrapper`.
    """

    def __init__(
        self,
        sklearn_predictor: SKLearnPredictor,
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.sklearn_predictor = sklearn_predictor

    def keys_predict(self) -> Set[str]:
        return self.sklearn_predictor.keys_predict()

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        prediction = self.sklearn_predictor.predict(X=inputs)
        return [prediction]

class SKLearnPredictor:
    """
    Base class for the sklearn predictors.
    """

    @staticmethod
    def keys_predict() -> Set[str]:
        """
        :return: The quantities predicted by ``predict``, most commonly
            "mean" and "std"
        """
        raise NotImplementedError

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        """
        Returns signals which are statistics of the predictive distribution
        at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Dictionary {quantity_name: quantity_array} of predictive
            outputs. All quantity values are arrays of shape ``(n,)``.
            Most commonly, this dictionary holds
            {
                "mean": predictive means,
                "std": predictive stddevs,
            }
        """
        raise NotImplementedError

mseeger (Collaborator, Author):

We can do that, but then I think you have not really simplified things for the user.

My idea is that code in bayesopt/sklearn is simpler and closer to scikit-learn than code in bayesopt/models. Here, "simpler" means that it looks more like scikit-learn, and less like the specific API we need further above.

You started this way, and I like it (except that whether a predictor returns std or not cannot be an argument; it must be a fixed property of the predictor).

If another case comes up, I'd rather create a different class for predictor and estimator in bayesopt/sklearn.

mseeger (Collaborator, Author):

Up to you, but just pushing through all methods to bayesopt/sklearn does not make things easier there.

jgolebiowski (Collaborator) commented May 30, 2023:

I think this is a good point, but then we should commit to making it as simple as possible and only accept mean + std, like in the original code. Adding options to return std optionally makes it more contrived and defeats the purpose of making it as simple as possible. We should either have flexibility and return a dict, or have simplicity and return mean + std.

If we decide to go for only mean + std in the basic contributed surrogate, we would add different wrappers/predictors for different outputs, like quantiles or mean-only. Ultimately, it is very hard to do BO with just mean predictions, so we probably need some extra code to make it work anyway (like computing std on held-out validation data).

mseeger (Collaborator, Author):

Hmm, but scikit-learn itself does this, right? If return_std=True, BayesianRidge returns means and stds, and otherwise it only returns means.
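For reference, a minimal runnable check of that scikit-learn behavior: BayesianRidge.predict returns a (means, stds) tuple with return_std=True, and a single means array otherwise.

```python
# Minimal check of the scikit-learn behavior referenced above:
# BayesianRidge.predict returns (means, stds) with return_std=True,
# and only means otherwise.
import numpy as np
from sklearn.linear_model import BayesianRidge

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 1.1, 1.9, 3.2])

model = BayesianRidge().fit(X, y)

mean_only = model.predict(X)                   # single array, shape (n,)
mean, std = model.predict(X, return_std=True)  # two arrays, shape (n,) each

print(mean_only.shape, mean.shape, std.shape)
```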

mseeger (Collaborator, Author):

Note that this is not an option, but a fixed property of the object (or even of the class). A predictor either always returns (mean, std), or it always only returns mean.

jgolebiowski (Collaborator):

I can see your point about improved flexibility, but I think that with the scikit-learn interface, the basic class should just be as simple to use as possible. This means just implementing the fit() and predict() methods should be enough. The current implementation supports providing std=None, so non-std functions are still supported if needed; the interface is just designed to make the default use case as easy as possible, and I think that is something we should retain.

Note that we can also add a more advanced SKLearnPredictor that has higher flexibility at the cost of complexity; it just won't be the default that we showcase as easy to use.

mseeger (Collaborator, Author):

Well, so how do we proceed?

The current implementation is simply not correct. It is essential for things to work that keys_predict and the return of predict are consistent. As it stands, keys_predict says: I'll give you mean and std, but then maybe predict does not. An AF like EI thinks everything is fine, and at the first call, things blow up.

I am all for simplicity, but things just have to be correct.

If you want keys_predict to remain as it is, then predict has to always return mean and std, and so a deterministic model cannot be covered.

I don't really understand why being explicit about whether you return std or not makes things complicated. This seems pretty basic to me, that you can make a statement about that.
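The failure mode described above can be sketched in a few lines (hypothetical minimal classes, not the actual Syne Tune ones): a consumer trusts keys_predict, and a predictor whose predict does not actually return "std" fails only at call time.

```python
# Hypothetical sketch of the inconsistency being discussed: keys_predict
# advertises {"mean", "std"}, but predict only delivers "mean", so an
# acquisition-function-like consumer fails at the first call.
from typing import Dict, Set
import numpy as np


class BrokenPredictor:
    def keys_predict(self) -> Set[str]:
        return {"mean", "std"}  # promises std ...

    def predict(self, X: np.ndarray) -> Dict[str, np.ndarray]:
        return {"mean": np.zeros(len(X))}  # ... but never returns it


def expected_improvement_like(predictor, X: np.ndarray) -> np.ndarray:
    # An EI-style consumer checks the advertised keys up front ...
    assert "std" in predictor.keys_predict()
    outputs = predictor.predict(X)
    # ... and only discovers the mismatch here, with a KeyError.
    return outputs["mean"] - outputs["std"]


X = np.zeros((3, 2))
try:
    expected_improvement_like(BrokenPredictor(), X)
except KeyError as err:
    print("blew up at first call:", err)
```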

jgolebiowski (Collaborator) commented May 30, 2023:

> The current implementation is simply not correct. It is essential for things to work that keys_predict and the return of predict are consistent. As it stands, keys_predict says: I'll give you mean and std, but then maybe predict does not. An AF like EI thinks everything is fine, and at the first call, things blow up.

Aha I see, I missed this point before, but this is indeed a good callout. I think in this case I would opt for the simplest way to fix the bug and unblock the tutorial: assert that both mean and std are returned by the default surrogate. This is the surrogate you can use for the tutorials, and it will be simple to understand, albeit non-flexible.

Going beyond that, I have noted some ideas below. We can discuss them offline (tomorrow) if needed to align on the best strategy.

We generally need a good way to tackle predictors that return mean, mean + std, quantiles, or anything else. I see the main issue is that a user implementing their own SKLearnPredictor needs to somehow encode the information about the returns of the predictor inside the class.
1/ One way is to have SKLearnPredictor return a dictionary, thus moving all the logic about different outputs into SKLearnPredictor (that's my previous proposal). I see your points against it, as it lacks simplicity.
2/ The second one is to assume multiple options for SKLearnPredictor and move the construction of output dictionaries into SKLearnPredictorWrapper, such as below. Users can choose which predictor to subclass, and the wrapper will figure out the rest, which I hope will be simple in the default case but also flexible for advanced users.
3/ What you propose, which is a mix of the two; it can work too, but I prefer to have a sklearn surrogate which just needs one function to implement. Otherwise we risk non-advanced users getting lost and, for example, implementing mismatched versions of predict() and returns_std(). Maybe I am overly cautious, but I hope options 1 or 2 will work.


class SKLearnPredictor:
    """
    Base class for sklearn predictors returning means and stddevs.
    """

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """
        Returns statistics of the predictive distribution at input
        points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Tuple with the following entries:
            * "mean": Predictive means, shape ``(n,)``
            * "std": Predictive stddevs, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnDeterministicPredictor:
    """
    Base class for sklearn predictors returning means only.
    """

    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Returns predictive means at input points ``X``.

        :param X: Input points, shape ``(n, d)``
        :return: Predictive means, shape ``(n,)``
        """
        raise NotImplementedError()


class SKLearnPredictorWrapper(BasePredictor):
    def __init__(
        self,
        contributed_predictor: Union[SKLearnPredictor, SKLearnDeterministicPredictor],
        state: TuningJobState,
        active_metric: Optional[str] = None,
    ):
        super().__init__(state, active_metric)
        self.contributed_predictor = contributed_predictor
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            self.keys = {"mean", "std"}
        elif isinstance(self.contributed_predictor, SKLearnDeterministicPredictor):
            self.keys = {"mean"}
        else:
            raise TypeError(
                "contributed_predictor must be SKLearnPredictor or "
                "SKLearnDeterministicPredictor"
            )

    def predict(self, inputs: np.ndarray) -> List[Dict[str, np.ndarray]]:
        predictions = self.contributed_predictor.predict(inputs)
        if isinstance(self.contributed_predictor, SKLearnPredictor):
            assert isinstance(predictions, tuple) and len(predictions) == 2, \
                "SKLearnPredictor must return a tuple (mean, std)"
            outputs = {"mean": predictions[0], "std": predictions[1]}
        else:
            assert isinstance(predictions, np.ndarray), \
                "SKLearnDeterministicPredictor must return a single array with the means"
            outputs = {"mean": predictions}
        return [outputs]

    def keys_predict(self) -> Set[str]:
        return self.keys

mseeger (Collaborator, Author):

Your quantile case will be interesting indeed.

mseeger (Collaborator, Author) commented May 30, 2023:

There is also the need to refactor the stuff in bayesopt/models/cost with your API, but cost models are deterministic, so we need to clarify how this is going to look. Also, I think you need this for your applications.

jgolebiowski (Collaborator) left a comment:

LGTM

mseeger merged commit 1359817 into main on May 30, 2023
mseeger deleted the refactor_sklearn branch on May 30, 2023 at 15:54