
Non GP model types in Botorch #1064

Closed · jduerholt opened this issue Jan 19, 2022 · 13 comments
Labels: enhancement, WIP

Comments

@jduerholt (Contributor)

Hi,

I was thinking about the possibility of using non-GP models within BoTorch, for example a GP for one objective and a neural network (ensemble) for another. Using a plain neural network should already be possible via GenericDeterministicModel

class GenericDeterministicModel(DeterministicModel):
by just hooking in a neural network written in torch as the callable f.
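For instance (a minimal sketch; the toy network and shapes below are made up purely for illustration):

```python
import torch
from botorch.models.deterministic import GenericDeterministicModel

# A toy regression network standing in for any torch model.
net = torch.nn.Sequential(
    torch.nn.Linear(3, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)

# GenericDeterministicModel wraps any callable mapping a
# `batch_shape x n x d` input tensor to `batch_shape x n x m` outputs.
model = GenericDeterministicModel(f=net, num_outputs=1)
posterior = model.posterior(torch.rand(5, 3))
print(posterior.mean.shape)  # torch.Size([5, 1])
```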

In this case, however, uncertainty estimates from an NN ensemble could not be used. My idea was to implement a new type of Posterior that also takes the variance from an NN ensemble and returns it as the variance of the posterior:

def variance(self) -> Tensor:

This should already allow using the whole BoTorch machinery of analytic acquisition functions. Of course, this assumes that the posterior is normally distributed. If one then also implements the rsample method of the posterior, one should be able to use the MC acquisition functions as well. A rough sketch follows.
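Something like this (a sketch only; the class name and Gaussian-approximation details are illustrative, and the device/dtype plumbing a full BoTorch Posterior subclass needs is omitted):

```python
import torch
from torch import Tensor

class GaussianEnsemblePosterior:
    """Treats the ensemble's sample mean/variance as a Gaussian posterior."""

    def __init__(self, samples: Tensor) -> None:
        # samples: `s x n x m` predictions of the s ensemble members
        self._mean = samples.mean(dim=0)
        self._variance = samples.var(dim=0)

    @property
    def mean(self) -> Tensor:
        return self._mean

    @property
    def variance(self) -> Tensor:
        return self._variance

    def rsample(self, sample_shape: torch.Size = torch.Size([1])) -> Tensor:
        # Reparameterized draws from N(mean, variance), so gradients
        # can be back-propagated through the samples.
        eps = torch.randn(*sample_shape, *self._mean.shape)
        return self._mean + self._variance.sqrt() * eps
```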

Do you see any obstacles in this?

The benefit I see is the possibility of using other model types in situations where they perform better than GPs, without having to reimplement the great machinery of acquisition functions and so forth that is already available in BoTorch.

Best,

Johannes

@wjmaddox (Contributor)

Yes, this is definitely possible (I have some research code doing exactly this with deep ensemble posteriors). In general, the mean and variance can be calculated as the sample mean and variance of the networks' outputs, producing a normal approximation to the "posterior", while sampling can be done by selecting a random item (or items) from the list of networks; a quick sketch is below.

The only gotcha is in dealing with multi-batched data, as some classes of NNs don't handle that well in torch (I'm thinking of things like RNNs and LSTMs on string inputs, for example).
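In tensor terms, the scheme described above might look like this (shapes and names are purely illustrative):

```python
import torch

# Stacked member predictions: 8 networks, 5 points, 1 output.
preds = torch.randn(8, 5, 1)

# Normal approximation from sample statistics across the ensemble.
mean, var = preds.mean(dim=0), preds.var(dim=0)

# "Posterior samples" obtained by picking random ensemble members.
idx = torch.randint(preds.shape[0], (4,))
samples = preds[idx]  # 4 x 5 x 1
```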

@Balandat (Contributor)

Yeah, I did think about this when designing the APIs, so this should be possible without too much trouble (if not, we should fix that). Basically, as long as you have a posterior object that implements rsample and allows back-propagating gradients through the samples, that should work. Would love to see more use cases of this, actually.

@jduerholt (Contributor, Author)

Thanks for the info. I will try to set up a simple MWE for this in the next month. Maybe I will come back with some questions then ;)

@jduerholt (Contributor, Author)

Hi,

I just tested using an NN ensemble within BoTorch and it works, both for analytic and for MC acqfs. I simply represented the posterior as a multivariate normal and used GPyTorchPosterior. Of course, this then also assumes a normal distribution in the sampling process, but doing it this way treats analytic and MC acqfs on the same footing.

Does this approach make sense to you?

Best,

Johannes

@Balandat (Contributor)

Hmm, could you elaborate a bit more on what exactly you mean by:

I just tested using an NN ensemble within BoTorch and it works, both for analytic and for MC acqfs. I simply represented the posterior as a multivariate normal and used GPyTorchPosterior.

Is this using Wesley's approach of taking the sample mean and variance of the NN outputs? If you do this for sampling, it seems a bit odd, since you use samples from the "true posterior" of the network to fit an MVN and then sample from that. Why not just use the NN outputs directly as "samples"? You could have a lightweight wrapper posterior object that just references the network internally, and where rsample just means computing the NN outputs. Or is the issue here that the NN ensemble is deterministic (conditional on the initially drawn ensemble), so that this sampling distribution would be discretely supported?
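Such a lightweight wrapper might look like this (a sketch of the idea only; the class name and details are made up, not an existing BoTorch API):

```python
import torch
from torch import Tensor

class LazyEnsemblePosterior:
    """Keeps a reference to the ensemble; rsample evaluates members directly."""

    def __init__(self, models: list, X: Tensor) -> None:
        self.models = models
        self.X = X

    def rsample(self, sample_shape: torch.Size = torch.Size([1])) -> Tensor:
        # Each "sample" is the output of a randomly chosen member, so
        # the sampling distribution is discretely supported.
        idx = torch.randint(len(self.models), (sample_shape.numel(),))
        out = torch.stack([self.models[i](self.X) for i in idx.tolist()])
        return out.view(*sample_shape, *out.shape[1:])
```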

@jduerholt (Contributor, Author)

Yes, I calculate the mean and variance over the predictions of the individual NNs in the ensemble. With the mean and variance alone, I can already use analytic acqfs like EI; of course, this assumes that the posterior is normally distributed. To also be able to use MC acqfs, I just used GPyTorchPosterior and parameterized the underlying MVN with the mean and variance of the ensemble prediction. With this I could use the already existing Posterior implementations. Of course, this also assumes a normally distributed posterior, but at least I get the same acqf values as with the analytic counterpart (see the sketch below). Is it clear what I mean?
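Concretely, the construction might look like this (a sketch; shapes are illustrative and an independent, diagonal covariance is assumed):

```python
import torch
from gpytorch.distributions import MultivariateNormal
from botorch.posteriors.gpytorch import GPyTorchPosterior

preds = torch.randn(8, 5)  # 8 ensemble members, 5 test points

# Parameterize an MVN with the ensemble's sample mean and variance.
mvn = MultivariateNormal(preds.mean(dim=0), torch.diag_embed(preds.var(dim=0)))
posterior = GPyTorchPosterior(mvn)
print(posterior.mean.shape, posterior.variance.shape)  # both torch.Size([5, 1])
```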

But I think I will also implement your suggestion of sampling the outputs directly; for BNNs one could then do the same.

@wjmaddox (Contributor)

In case you haven't already implemented it: I've managed to open-source a deep ensemble posterior class here that should be pretty generic and works with batching (the other code in the file is pretty tightly tied to our research codebase for that paper).

@Balandat I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

@jduerholt (Contributor, Author)

Thanks for sharing, this looks promising. I will also try to add my implementation based on the multivariate normal at some point in the next few weeks; then both options will be available.

@Balandat (Contributor) commented Apr 8, 2022

I'm happy to try to write some variant of this up as a PR as well over the coming weeks if that'd be useful.

@wjmaddox that would be awesome! Did you have an end-to-end example using this that you can point to?

@wjmaddox (Contributor)

Yeah, here's roughly the link to the overall model class (https://github.com/samuelstanton/lambo/blob/7b67684b884f75f7007501978c5299514d0efb75/lambo/optimizers/pymoo.py#L343). As I think I mentioned previously, we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete. @samuelstanton can walk you through more of the code if necessary.

It's probably best for us to just pull out a simple notebook outside of our research code.

@Balandat (Contributor)

we were using genetic algorithms to optimize everything b/c the tasks we considered were discrete

cc @sdaulton this could be a good real-world test case for some of your discrete optimization work.

@jduerholt (Contributor, Author)

@wjmaddox @samuelstanton: I had a closer look at your implementation of the ensemble posterior. I like it! I would be willing to create a PR based on it to bring it directly into BoTorch.

I have one question; maybe @Balandat could help here as well:

From what I saw, the MC acqfs in the recent BoTorch implementations always use rsample_from_base_samples. How would one implement this method for an ensemble?

In @wjmaddox's and @samuelstanton's implementation, the rsample method just returns the outputs of the requested ensemble models, which have been randomly permuted beforehand; if the number of requested samples is larger than the number of models in the ensemble, an error is raised.
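One way to reconcile this with base samples might be to treat the base samples as integer indices into the ensemble, drawn once and then held fixed (a sketch of the idea only, not necessarily the design BoTorch ends up with):

```python
import torch
from torch import Tensor

def rsample_from_base_samples(values: Tensor, base_samples: Tensor) -> Tensor:
    # values: `s x n x m` stacked member predictions;
    # base_samples: `sample_shape`-shaped long tensor of member indices,
    # drawn once and held fixed so the acqf surface stays deterministic.
    return values[base_samples]  # `sample_shape x n x m`

values = torch.randn(8, 5, 1)          # 8 members, 5 points, 1 output
base_samples = torch.randint(8, (4,))  # 4 fixed member indices
samples = rsample_from_base_samples(values, base_samples)  # 4 x 5 x 1
```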

@esantorella added the enhancement and WIP labels on Jan 30, 2023
facebook-github-bot pushed a commit that referenced this issue Feb 11, 2023
Summary:

## Motivation

As discussed in #1064, this is an attempt to add an `EnsemblePosterior` to BoTorch that could be used, for example, by NN ensembles.

I have problems implementing `rsample` properly. I think my current implementation is not correct: it is based on `DeterministicPosterior`, but I think we should sample directly from the individual predictions of the ensemble. However, I do not know how to interpret `sample_shape` in this context.

As the sampler, I registered `StochasticSampler` for the new posterior class, but I am not sure whether this is correct either. Furthermore, I have another question regarding `StochasticSampler`: its docstring states that it should not be used in combination with `optimize_acqf`, yet `StochasticSampler` is assigned to `DeterministicPosterior`. Does that mean one cannot use a `ModelList` consisting of a `DeterministicModel` and GPs in combination with `optimize_acqf`?

Balandat: any suggestions on this?

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes.

Pull Request resolved: #1636

Test Plan: Unit tests. Not yet implemented/finished as it is still WIP.

Reviewed By: saitcakmak

Differential Revision: D43017184

Pulled By: Balandat

fbshipit-source-id: fd2ede2dbba82a40c466f8a178138ced0fcba5fe
@saitcakmak (Contributor)

Resolved by #1636
