
Non GP model types in Botorch #1064

Closed · jduerholt opened this issue Jan 19, 2022 · 13 comments
Labels: enhancement, WIP

Comments

@jduerholt (Contributor)

Hi,

I was thinking about the possibility of using non-GP models within BoTorch, for example a GP for one objective and a neural network (ensemble) for another. Using a plain neural network should already be possible via GenericDeterministicModel

class GenericDeterministicModel(DeterministicModel):
by just hooking in a neural network written in torch as the callable f.
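For instance (a minimal sketch; the toy network and shapes below are made up purely for illustration):

```python
import torch
from botorch.models.deterministic import GenericDeterministicModel

# A toy regression network standing in for any torch model.
net = torch.nn.Sequential(
    torch.nn.Linear(3, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)

# GenericDeterministicModel wraps any callable mapping a
# `batch_shape x n x d` input tensor to `batch_shape x n x m` outputs.
model = GenericDeterministicModel(f=net, num_outputs=1)
posterior = model.posterior(torch.rand(5, 3))
print(posterior.mean.shape)  # torch.Size([5, 1])
```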

In this case, however, uncertainty estimates from an NN ensemble could not be used. My idea was to implement a new type of Posterior that also takes the variance from an NN ensemble and returns it as the variance of the posterior:

def variance(self) -> Tensor:

This should already allow using the whole BoTorch machinery of analytic acquisition functions. Of course, this assumes that the posterior is normally distributed. If one then also implements the rsample method of the posterior, one should be able to use the MC acquisition functions as well. A rough sketch follows.
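Something like this (a sketch only; the class name and Gaussian-approximation details are illustrative, and the device/dtype plumbing a full BoTorch Posterior subclass needs is omitted):

```python
import torch
from torch import Tensor

class GaussianEnsemblePosterior:
    """Treats the ensemble's sample mean/variance as a Gaussian posterior."""

    def __init__(self, samples: Tensor) -> None:
        # samples: `s x n x m` predictions of the s ensemble members
        self._mean = samples.mean(dim=0)
        self._variance = samples.var(dim=0)

    @property
    def mean(self) -> Tensor:
        return self._mean

    @property
    def variance(self) -> Tensor:
        return self._variance

    def rsample(self, sample_shape: torch.Size = torch.Size([1])) -> Tensor:
        # Reparameterized draws from N(mean, variance), so gradients
        # can be back-propagated through the samples.
        eps = torch.randn(*sample_shape, *self._mean.shape)
        return self._mean + self._variance.sqrt() * eps
```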

Do you see any obstacles in this?

The benefit I see is the possibility of using other model types in situations where they perform better than GPs, without having to reimplement the great machinery of acquisition functions and so forth that is already available in BoTorch.

Best,

Johannes

@wjmaddox (Contributor)

Yes, this is definitely possible (I have some research code doing exactly this with deep ensemble posteriors). In general, the mean and variance can be calculated as the sample mean and variance of the networks' outputs, producing a normal approximation to the "posterior", while sampling can be done by selecting a random item (or items) from the list of networks; a quick sketch is below.

The only gotcha is in dealing with multi-batched data, as some classes of NNs don't handle that well in torch (I'm thinking of things like RNNs and LSTMs on string inputs, for example).
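In tensor terms, the scheme described above might look like this (shapes and names are purely illustrative):

```python
import torch

# Stacked member predictions: 8 networks, 5 points, 1 output.
preds = torch.randn(8, 5, 1)

# Normal approximation from sample statistics across the ensemble.
mean, var = preds.mean(dim=0), preds.var(dim=0)

# "Posterior samples" obtained by picking random ensemble members.
idx = torch.randint(preds.shape[0], (4,))
samples = preds[idx]  # 4 x 5 x 1
```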

@Balandat (Contributor)

Yeah, I did think about this when designing the APIs, so this should be possible without too much trouble (if not, we should fix that). Basically, as long as you have a posterior object that implements rsample and allows back-propagating gradients through the samples, that should work. Would love to see more use cases of this, actually.

@jduerholt (Contributor, Author)

Thanks for the info. I will try to set up a simple MWE for this in the next month. Maybe I will come back with some questions then ;)

@jduerholt (Contributor, Author)

Hi,

I just tested using an NN ensemble within BoTorch and it works, both for analytic and for MC acqfs. I simply represented the posterior as a multivariate normal and used GPyTorchPosterior. Of course, this then also assumes a normal distribution in the sampling process, but doing it this way treats analytic and MC acqfs on the same footing.

Does this approach make sense to you?

Best,

Johannes

@Balandat (Contributor)

Hmm, could you elaborate a bit more on what exactly you mean by:

I just tested using an NN ensemble within BoTorch and it works, both for analytic and for MC acqfs. I simply represented the posterior as a multivariate normal and used GPyTorchPosterior.

Is this using Wesley's approach of taking the sample mean and variance of the NN outputs? If you do this for sampling, it seems a bit odd, since you use samples from the "true posterior" of the network to fit an MVN and then sample from that. Why not just use the NN outputs directly as "samples"? You could have a lightweight wrapper posterior object that just references the network internally, and where rsample just means computing the NN outputs. Or is the issue here that the NN ensemble is deterministic (conditional on the initially drawn ensemble), so that this sampling distribution would be discretely supported?
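Such a lightweight wrapper might look like this (a sketch of the idea only; the class name and details are made up, not an existing BoTorch API):

```python
import torch
from torch import Tensor

class LazyEnsemblePosterior:
    """Keeps a reference to the ensemble; rsample evaluates members directly."""

    def __init__(self, models: list, X: Tensor) -> None:
        self.models = models
        self.X = X

    def rsample(self, sample_shape: torch.Size = torch.Size([1])) -> Tensor:
        # Each "sample" is the output of a randomly chosen member, so
        # the sampling distribution is discretely supported.
        idx = torch.randint(len(self.models), (sample_shape.numel(),))
        out = torch.stack([self.models[i](self.X) for i in idx.tolist()])
        return out.view(*sample_shape, *out.shape[1:])
```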

@jduerholt (Contributor, Author)

Yes, I calculate the mean and variance over the predictions of the individual NNs in the ensemble. With the mean and variance alone, I can already use analytic acqfs like EI; of course, this assumes that the posterior is normally distributed. To also be able to use MC acqfs, I just used GPyTorchPosterior and parameterized the underlying MVN with the mean and variance of the ensemble prediction. With this I could use the already existing Posterior implementations. Of course, this also assumes a normally distributed posterior, but at least I get the same acqf values as with the analytic counterpart (see the sketch below). Is it clear what I mean?
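Concretely, the construction might look like this (a sketch; shapes are illustrative and an independent, diagonal covariance is assumed):

```python
import torch
from gpytorch.distributions import MultivariateNormal
from botorch.posteriors.gpytorch import GPyTorchPosterior

preds = torch.randn(8, 5)  # 8 ensemble members, 5 test points

# Parameterize an MVN with the ensemble's sample mean and variance.
mvn = MultivariateNormal(preds.mean(dim=0), torch.diag_embed(preds.var(dim=0)))
posterior = GPyTorchPosterior(mvn)
print(posterior.mean.shape, posterior.variance.shape)  # both torch.Size([5, 1])
```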

But I think I will also implement your suggestion of sampling the outputs directly; for BNNs one could then do the same.

@wjmaddox (Contributor)

In case you haven't already implemented it: I've managed to open-source a deep ensemble posterior class here that should be pretty generic and works with batching (the other code in the file is pretty tightly tied to our research codebase for that paper).

@Balandat I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

@jduerholt (Contributor, Author)

Thanks for sharing, this looks promising. I will also try to add my implementation based on the multivariate normal at some point in the next few weeks; then both options will be available.

@Balandat (Contributor) commented Apr 8, 2022

I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

@wjmaddox that would be awesome! Did you have an end-to-end example using this that you can point to?

@wjmaddox (Contributor)

Yeah, here's roughly the link to the overall model class (https://github.com/samuelstanton/lambo/blob/7b67684b884f75f7007501978c5299514d0efb75/lambo/optimizers/pymoo.py#L343). As I think I mentioned previously, we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete. @samuelstanton can walk you through more of the code if necessary.

It's probably best for us to just pull out a simple notebook outside of our research code.

@Balandat (Contributor)

we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete

cc @sdaulton this could be a good real-world test case for some of your discrete optimization work.

@jduerholt (Contributor, Author)

@wjmaddox @samuelstanton: I had a closer look at your implementation of the ensemble posterior. I like it! I would be willing to create a PR based on it to bring it directly into BoTorch.

I have one question; maybe @Balandat could help here as well:

From what I saw, the MC acqfs in the recent BoTorch implementations always use rsample_from_base_samples. How would one implement this method for an ensemble?

In @wjmaddox's and @samuelstanton's implementation, the rsample method just returns the outputs of the requested ensemble models, which have been randomly permuted beforehand; if the number of requested samples is larger than the number of models in the ensemble, an error is raised.
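One way to reconcile this with base samples might be to treat the base samples as integer indices into the ensemble, drawn once and then held fixed (a sketch of the idea only, not necessarily the design BoTorch ends up with):

```python
import torch
from torch import Tensor

def rsample_from_base_samples(values: Tensor, base_samples: Tensor) -> Tensor:
    # values: `s x n x m` stacked member predictions;
    # base_samples: `sample_shape`-shaped long tensor of member indices,
    # drawn once and held fixed so the acqf surface stays deterministic.
    return values[base_samples]  # `sample_shape x n x m`

values = torch.randn(8, 5, 1)          # 8 members, 5 points, 1 output
base_samples = torch.randint(8, (4,))  # 4 fixed member indices
samples = rsample_from_base_samples(values, base_samples)  # 4 x 5 x 1
```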

@esantorella added the enhancement and WIP labels on Jan 30, 2023
facebook-github-bot pushed a commit that referenced this issue Feb 11, 2023
Summary:

## Motivation

As discussed in #1064, this is an attempt to add an `EnsemblePosterior` to BoTorch that could be used, for example, by NN ensembles.

I have problems implementing `rsample` properly. I think my current implementation is not correct: it is based on `DeterministicPosterior`, but I think we should sample directly from the individual predictions of the ensemble. However, I do not know how to interpret `sample_shape` in this context.

As the sampler, I registered `StochasticSampler` for the new posterior class, but I am not sure whether this is correct either. Furthermore, I have another question regarding `StochasticSampler`: its docstring states that it should not be used in combination with `optimize_acqf`, yet `StochasticSampler` is assigned to `DeterministicPosterior`. Does that mean one cannot use a `ModelList` consisting of a `DeterministicModel` and GPs in combination with `optimize_acqf`?

Balandat: any suggestions on this?

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes.

Pull Request resolved: #1636

Test Plan: Unit tests. Not yet implemented/finished as it is still WIP.

Reviewed By: saitcakmak

Differential Revision: D43017184

Pulled By: Balandat

fbshipit-source-id: fd2ede2dbba82a40c466f8a178138ced0fcba5fe
@saitcakmak (Contributor)

Resolved by #1636
