
Ensemble Posterior #1636

Closed
wants to merge 64 commits

Conversation

jduerholt
Contributor

Motivation

As discussed in #1064, this is an attempt to add an EnsemblePosterior to botorch that could be used, for example, by NN ensembles.

I have problems implementing rsample properly. I think my current implementation, which is based on DeterministicPosterior, is not correct; I think we should instead sample directly from the individual predictions of the ensemble. But I do not know how to interpret sample_shape in this context.

As sampler, I registered StochasticSampler for the new posterior class, but I am not sure whether this is correct either. Furthermore, I have another question regarding StochasticSampler. Its docstring states that it should not be used in combination with optimize_acqf, yet StochasticSampler is assigned to DeterministicPosterior. Does that mean one cannot use a ModelList consisting of a DeterministicModel and GPs in combination with optimize_acqf?

@Balandat: any suggestions on this?

Have you read the Contributing Guidelines on pull requests?

Yes.

Test Plan

Unit tests. Not yet implemented/finished as it is still WIP.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Jan 20, 2023
@codecov

codecov bot commented Jan 20, 2023

Codecov Report

Merging #1636 (3e83ea1) into main (72e872a) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##              main    #1636      +/-   ##
===========================================
- Coverage   100.00%   99.99%   -0.01%     
===========================================
  Files          155      169      +14     
  Lines        13788    14444     +656     
===========================================
+ Hits         13788    14443     +655     
- Misses           0        1       +1     
Impacted Files Coverage Δ
botorch/models/deterministic.py 100.00% <100.00%> (ø)
botorch/models/ensemble.py 100.00% <100.00%> (ø)
botorch/posteriors/deterministic.py 100.00% <100.00%> (ø)
botorch/posteriors/ensemble.py 100.00% <100.00%> (ø)
botorch/sampling/get_sampler.py 100.00% <100.00%> (ø)
botorch/sampling/index_sampler.py 100.00% <100.00%> (ø)
botorch/optim/parameter_constraints.py 99.27% <0.00%> (-0.73%) ⬇️
botorch/optim/fit.py 100.00% <0.00%> (ø)
botorch/generation/gen.py 100.00% <0.00%> (ø)
botorch/optim/optimize.py 100.00% <0.00%> (ø)
... and 20 more


@Balandat
Contributor

Hmm, so one high-level question I have here is why the EnsemblePosterior operates on a finite set of values rather than keeping a reference to the model and then evaluating it the way @wjmaddox's code did here: https://github.com/samuelstanton/lambo/blob/dec2f303d847efefc4392ebfd2d78226d543388c/lambo/models/deep_ensemble.py#L177

Is the idea to just predict the output for all models in the model ensemble and then randomly sample from that (rather than re-evaluating a subset of the ensemble models)? I guess that seems reasonable if the ensemble models are explicitly defined and we don't have any scalability issues. But if the ensemble models are lazily defined by some parameters and only actually built upon evaluation, this could cause issues.

Assuming we're in the case where we can easily do this, then I think there may be some confusion about sample_shape. Basically, this being an ensemble of (I take it deterministic) models, the number of different samples that can possibly be drawn here is finite (s in your case). So it seems to me that rsample(sample_shape) should essentially grab sample_shape.numel() elements from your values tensor and then arrange them appropriately, something like this:

indcs = torch.randint(values.shape[-1], sample_shape)
samples = values[..., indcs]  # batch_shape x q x m x sample_shape
# move the sample dimensions to the front
samples = samples.permute(
    *range(values.ndim - 1, samples.ndim),
    *range(values.ndim - 1),
)
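Expanded into a minimal runnable example (the tensor shapes here are illustrative toy values, not the botorch API):

```python
import torch

# Toy ensemble predictions: batch_shape x q x m x s
# (2 batches, q=3 candidates, m=1 output, s=5 ensemble members).
values = torch.randn(2, 3, 1, 5)
sample_shape = torch.Size([4])

# Draw sample_shape.numel() random ensemble indices.
indcs = torch.randint(values.shape[-1], sample_shape)
samples = values[..., indcs]  # batch_shape x q x m x sample_shape

# Move the sample dimensions to the front: sample_shape x batch_shape x q x m.
samples = samples.permute(
    *range(values.ndim - 1, samples.ndim),
    *range(values.ndim - 1),
)
print(samples.shape)  # torch.Size([4, 2, 3, 1])
```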

It is stated in the docstring of StochasticSampler that it should not be used in combination with optimize_acqf. But StochasticSampler is assigned to the DeterministicPosterior. Does it mean that one cannot use a ModelList consisting of a DeterministicModel and GPs in combination with optimize_acqf?

So DeterministicPosterior isn't really doing anything; it's mostly just a crutch to make sure that deterministic "models" can be used in the same way as other probabilistic models. So using StochasticSampler here is just a convenience. I guess the proper thing would be to have a DeterministicSampler that doesn't do anything either, but I guess we wanted to avoid an inflation of dummy classes here? cc @saitcakmak.

The point about not using a stochastic sampler with optimize_acqf is that, by default, we use optimization algorithms that assume a deterministic function and break in bad ways if that's not the case. But in this special case of a DeterministicPosterior there is actually no randomness, and the StochasticSampler used does indeed produce deterministic values, so all is good.

For this ensemble posterior I do think it makes sense to use StochasticSampler. You'll probably want to add the EnsemblePosterior to this check here though: https://github.com/pytorch/botorch/blob/main/botorch/sampling/stochastic_samplers.py#L80

Now, if you do want to use deterministic optimization algorithms, the idea would essentially be to fix the models that are used to generate the "samples". This would mean that the "base samples" are essentially the indices you'd use to index into the values; these would stay fixed on the sampler, but a new model.posterior(X) call would generate new values at a new X (so if the indices are fixed across evaluations and the model is differentiable, then the samples will be differentiable w.r.t. X).

Does this make sense?
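To illustrate the fixed-index idea (purely hypothetical names; the "ensemble" here is just two differentiable functions of X):

```python
import torch

# Hypothetical two-member "ensemble": each member is differentiable in X.
def ensemble_values(X):
    # Returns q x m x s values with s=2 ensemble members.
    return torch.stack([X.sin(), X.cos()], dim=-1)

# The "base samples" are ensemble indices, fixed once on the sampler ...
fixed_indices = torch.tensor([0, 1, 0])  # sample_shape = 3

X = torch.randn(4, 1, requires_grad=True)  # q=4, m=1
values = ensemble_values(X)                # fresh values at each new X
samples = values[..., fixed_indices]       # deterministic given the indices

# ... so the samples are differentiable w.r.t. X.
samples.sum().backward()
print(X.grad.shape)  # torch.Size([4, 1])
```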

@jduerholt
Contributor Author

Hi @Balandat,

thanks for your detailed answer. This is just my first attempt to integrate an EnsemblePosterior into botorch, so I am very happy about your comments and open to changes to the concept.

To your high level question:

I saw @wjmaddox's code, but I wrote mine with the intention that we compute the output of all models first and then sample from these values as you described, as, for example, a Random Forest does. I had not thought about what you described regarding lazily defined models and/or scalability issues.

So what is your recommendation here? Which solution would you prefer? What I like about my proposal is its similarity to the other models and posteriors in botorch (I also added an abstract EnsembleModel class): a model has a posterior method which calls the forward method in which inference is performed.

Thanks for your explanation of the StochasticSampler and the DeterministicModel. I was just wondering about the docstring there. Maybe we should just add a sentence there noting the DeterministicModel as an exception.

My intention for the EnsemblePosterior is to be able to use it with deterministic optimization algorithms. So which sampler should one register for it so that the base_samples are indices that stay fixed across consecutive calls? In case one wants to use the EnsemblePosterior with a stochastic optimizer, one could then register another sampler.

If we followed my initial idea, the abstract EnsembleModel class would be very similar to the DeterministicModel class; in principle, the only difference is the posterior class. So one should think about introducing a new base class, from which both DeterministicModel and EnsembleModel inherit, that implements an abstract method _posterior returning the respective posterior object.
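A sketch of what that hierarchy could look like (hypothetical class names, not botorch code; the "posteriors" are stand-in tuples):

```python
from abc import ABC, abstractmethod

class BaseModelSketch(ABC):
    """Shared base: posterior() calls forward() and wraps the result."""

    @abstractmethod
    def forward(self, X): ...

    @abstractmethod
    def _posterior(self, predictions):
        """Wrap raw predictions in the posterior class for this model type."""

    def posterior(self, X):
        return self._posterior(self.forward(X))

class DeterministicSketch(BaseModelSketch):
    def forward(self, X):
        return 2 * X

    def _posterior(self, predictions):
        return ("DeterministicPosterior", predictions)

class EnsembleSketch(BaseModelSketch):
    def forward(self, X):
        return [2 * X, 3 * X]  # one prediction per ensemble member

    def _posterior(self, predictions):
        return ("EnsemblePosterior", predictions)

print(DeterministicSketch().posterior(1.0))  # ('DeterministicPosterior', 2.0)
```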

Best,

Johannes

@Balandat
Contributor

So what is your recommendation here? Which solution would you prefer? What I like about my proposal is the similarity to the other models and posteriors in botorch (I also added an abstract EnsembleModel class.). A model has a posterior method which calls the forward method in which inference is performed.

Yes we definitely want to retain this API pattern. But one could do this in the two different ways I mentioned above.

Your version makes sense if there is a full ensemble of models of reasonably small ensemble size, so that computing the values across all models is not a big deal and we can always do it. Then we could have the sampler just pick out a random set of ensemble items. Of course, if they're all computed already and most of the work has been done, one might ask why we would sample at all rather than just use all of the computed outputs.

The other idea would be to have the ensemble model be a model whose posterior call just produces an object that keeps a reference back to the model, similar to Wesley's implementation. Then sampling would mean selecting a subset of models from the ensemble to compute the output when sampling from the posterior. That is, models would be evaluated in a lazy fashion as requested by the sampler. This would have the benefit of not wasting computation on evaluating all models in the ensemble when their outputs are never used.
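In sketch form, the lazy variant might look like this (hypothetical class and method names; the ensemble members here are plain callables):

```python
import torch

class LazyEnsemblePosterior:
    """Keeps a reference to the ensemble; members are evaluated on demand."""

    def __init__(self, models, X):
        self.models = models  # list of callables, one per ensemble member
        self.X = X            # inputs are stored, not evaluated eagerly

    def rsample_from_indices(self, indices):
        # Only the requested subset of ensemble members is evaluated.
        return torch.stack([self.models[i](self.X) for i in indices.tolist()])

models = [lambda X: X * 2.0, lambda X: X + 1.0, lambda X: -X]
posterior = LazyEnsemblePosterior(models, torch.ones(3, 1))
samples = posterior.rsample_from_indices(torch.tensor([0, 2]))
print(samples.shape)  # torch.Size([2, 3, 1])
```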

Thanks for your explanation for the StochasticSampler and the DeterministicModel. I was just wondering about the doctstring there. Maybe we should add there just a sentence regarding the DeterministicModel as exception.

I wonder if we should just be very clear here by implementing a dummy sampler that just evaluates DeterministicModels rather than overloading the meaning of StochasticSampler. Curious about @saitcakmak's thoughts on this.

My intention for the EnsemblePosterior is to be able to use it with deterministic optimization algorithms. So which sampler should one register for it to have the base_samples as indices that are fixed for consecutive calls? In case one wants to use the EnsemblePosterior with stochastic optimizer, one could then register another sampler.

For deterministic optimization we'd probably want to implement something like an EnsembleSampler or IndexSampler. This would store the base samples as indices into the list of available samples and then index into the posterior whenever called on a Posterior object.
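A minimal sketch of such an IndexSampler (illustrative only; the sampler that eventually landed in botorch may differ):

```python
import torch

class IndexSampler:
    """Draws ensemble indices once and reuses them across posterior calls,
    so the resulting samples are deterministic given the sampler state."""

    def __init__(self, sample_shape):
        self.sample_shape = sample_shape
        self.base_samples = None  # index "base samples", lazily constructed

    def __call__(self, values):
        # values: batch_shape x q x m x s, with s ensemble members.
        if self.base_samples is None:
            self.base_samples = torch.randint(values.shape[-1], self.sample_shape)
        samples = values[..., self.base_samples]
        # Move sample dims to the front: sample_shape x batch_shape x q x m.
        return samples.permute(
            *range(values.ndim - 1, samples.ndim), *range(values.ndim - 1)
        )

sampler = IndexSampler(torch.Size([8]))
first = sampler(torch.randn(2, 3, 1, 5))
second = sampler(torch.randn(2, 3, 1, 5))  # new values, same fixed indices
print(first.shape)  # torch.Size([8, 2, 3, 1])
```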

@jduerholt
Contributor Author

So one TODO would definitely be to implement this IndexSampler.

Concerning the two possible design patterns you outlined above, I am open to both, but the more I think about Wesley's implementation, the more I see its benefits. So I will change the PR accordingly if we decide to follow this design pattern.

Fine for you?

@Balandat
Contributor

I don't think it necessarily has to be either/or. One could have two different types of posterior: one that just contains a set of samples, and one that keeps the reference to the model and performs lazy evaluation. So we could start with whichever is easier / meets the more immediate need.

@jduerholt
Contributor Author

Ok, I will then just proceed with what I have currently and focus on implementing the IndexSampler and rsample correctly. Both are also needed for the other version. I will also change the default shape of values from batch_shape x q x m x sample_shape to sample_shape x batch_shape x q x m, so that it is consistent with the rest of the code, as for example here:

samples: A `sample_shape x batch_shape x q x m`-dim Tensors of

It could be that there will be more questions over the course of the week ;-)

@jduerholt
Contributor Author

Here comes already the first question:

In the docstring of the abstract rsample method, sample_shape is defined as a torch.Size object which specifies the number of samples and the number of batches to sample. Here it would have to mean something different, right?

The IndexSampler would not pass a torch.Size object to rsample but rather a torch.Tensor of dtype=int64 holding the indices to sample for each batch. Is this new meaning of sample_shape in rsample OK?

Or should we implement rsample as you described above, in the usual way, and implement a method rsample_by_indices on the EnsemblePosterior, which is then called by the IndexSampler instead of the conventional rsample?

Best,

Johannes

@Balandat
Contributor

The IndexSampler would not pass a torch.Size object to rsample but a torch.Tensor of dtype=int64, holding the indices to sample for each batch. Is this new meaning of sample_shape in rsample ok?

We can consider the index samples produced by the sampler as "base samples". Check out how the Normal samplers do this: their base samples are i.i.d. Normal, and they call an rsample_from_base_samples method defined on the posterior here: https://github.com/pytorch/botorch/blob/main/botorch/sampling/normal.py#L43-L49. You'd implement a similar method on the EnsemblePosterior. In EnsemblePosterior.rsample you can then just generate those samples (if not using a Sampler) and call rsample_from_base_samples, similar to what GPyTorchPosterior does here: https://github.com/pytorch/botorch/blob/main/botorch/posteriors/gpytorch.py#L180-L182
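In sketch form, that split could look like this (a simplified, hypothetical stand-in, keeping values as batch_shape x q x m x s as in the PR):

```python
import torch

class EnsemblePosteriorSketch:
    """Simplified stand-in for EnsemblePosterior; values: batch x q x m x s."""

    def __init__(self, values):
        self.values = values

    def rsample_from_base_samples(self, sample_shape, base_samples):
        # base_samples are ensemble indices (typically drawn by a sampler).
        samples = self.values[..., base_samples.reshape(sample_shape)]
        # Move sample dims to the front: sample_shape x batch x q x m.
        return samples.permute(
            *range(self.values.ndim - 1, samples.ndim),
            *range(self.values.ndim - 1),
        )

    def rsample(self, sample_shape=torch.Size()):
        # Without a sampler, just draw fresh index base samples.
        base_samples = torch.randint(self.values.shape[-1], sample_shape)
        return self.rsample_from_base_samples(sample_shape, base_samples)

post = EnsemblePosteriorSketch(torch.randn(2, 3, 1, 5))
print(post.rsample(torch.Size([4])).shape)  # torch.Size([4, 2, 3, 1])
```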

@saitcakmak
Contributor

Sorry about the confusion around StochasticSampler and DeterministicModel. I was just avoiding creating a dummy class identical to StochasticSampler. I'll add one to avoid any confusion going forward. Otherwise, @Balandat's explanation above is on point.

Regarding an IndexSampler, I agree that it'd be best to just use the indices as the base_samples and keep the API consistent by defining an rsample_from_base_samples that uses these index base_samples. rsample can then simply sample sample_shape indices and pass them to rsample_from_base_samples as the base_samples.

The motivation behind rsample and rsample_from_base_samples is to keep the rsample API consistent with the rsample of torch.Distribution, which produces differentiable but not necessarily deterministic samples; and use rsample_from_base_samples to ensure that the samples are deterministic and can be optimized with a gradient based deterministic optimizer, such as the L-BFGS-B used in optimize_acqf.

@jduerholt
Contributor Author

jduerholt commented Jan 28, 2023

Hi @saitcakmak and @Balandat,

I just added rsample_from_base_samples (including an update of rsample) and the IndexSampler. Perhaps you could have a look and tell me whether this looks the way you described above.

Note that I kept the shape of values as batch_shape x q x m x posterior_size. For this reason, the sampling is conducted over the last dimension, followed by a reshape.

Best,

Johannes

P.S.: I know it is buggy; I just want to make sure that it is going in the correct direction ;)

@jduerholt jduerholt marked this pull request as draft January 28, 2023 21:53
Contributor

@Balandat Balandat left a comment


Yup, this lgtm! One thing we could consider is to have the ensemble dimension between the batch shape and the q dimension; that feels a bit more natural to me, as trailing dimensions are usually outcome dimensions throughout the code base. But really, I'm not sure it makes a difference.

Comment on lines 78 to 85
# Note: we use a Tensor instance check so that `observation_noise = True`
# just gets ignored. This avoids having to do a bunch of case distinctions
# when using a ModelList.
if isinstance(kwargs.get("observation_noise"), Tensor):
    # TODO: Consider returning an MVN here instead
    raise UnsupportedError(
        "Deterministic models do not support observation noise."
    )
Contributor


One deeper question here is whether EnsembleModel should be limited to deterministic models or whether (possibly down the road) we want it to be more general and also support other models (e.g. GPs). We currently have a SaasFullyBayesianSingleTaskGP that is essentially a batched model whose posterior call returns a FullyBayesianPosterior.

It seems that a natural hierarchy would be to have generic Ensemble* versions from which we derive both the deterministic and GP ensemble models.

Not suggesting this is something we should do as part of this PR, but something to think about. cc @saitcakmak, @dme65, @sdaulton

Contributor Author


I do not feel competent enough to answer this question. I commented below on the similarity between the current version of EnsembleModel and DeterministicModel.

Contributor


Technically, everything is an ensemble of a single model, unless it is an ensemble of multiple models. I don't see any immediate benefit from such a refactor, so I'd say let's revisit this if we end up with more ensemble models in the future.

jduerholt and others added 3 commits January 30, 2023 15:16
Co-authored-by: Max Balandat <Balandat@users.noreply.github.com>
Co-authored-by: Max Balandat <Balandat@users.noreply.github.com>
Co-authored-by: Max Balandat <Balandat@users.noreply.github.com>
@jduerholt
Contributor Author

A few more minor points, o/w this looks great. I'd like to get some more input from other folks who have been working on the transforms more recently than I have, to make sure these changes don't cause any unforeseen issues.

I changed it according to your points. The one thing I could not change accordingly is explained above. I am looking forward to the comments of the other folks and hope to get it merged soon ;)

Contributor

@saitcakmak saitcakmak left a comment


Thanks for implementing this. A bunch of nits in line, lgtm otherwise.


jduerholt and others added 12 commits February 8, 2023 08:55
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
Co-authored-by: Sait Cakmak <saitcakmak@outlook.com>
@jduerholt
Contributor Author

@saitcakmak: thanks for your review. I changed the PR according to your suggestions. I only left you a question regarding the DeterministicSampler.

@facebook-github-bot
Contributor

@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@saitcakmak
Contributor

Hmm, codecov shows missing coverage for a line that's just a comment (https://app.codecov.io/gh/pytorch/botorch/pull/1636#597cdf37e0736ca95369f0ff0f74eff4-R244). I'll re-run the tests to see if it disappears.

@saitcakmak
Contributor

I think something is wrong with codecov on that one. It's a file not touched by the PR, the line is a comment, and this diff doesn't remove any tests. I wouldn't worry about it.

@facebook-github-bot
Contributor

@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@Balandat merged this pull request in a8efd76.

esantorella added a commit to esantorella/botorch that referenced this pull request Jul 2, 2024
…ticSampler (pytorch#2391)

Summary:
## Motivation

Delete references to deprecated DeterministicPosterior and DeterministicSampler, replacing `DeterministicPosterior` with `EnsemblePosterior` where appropriate. These were deprecated before 0.9.0, so now that we are past 0.11.0, they can be reaped.

Pull Request resolved: pytorch#2391

Test Plan:
Replaced `DeterministicPosterior` with `EnsemblePosterior` in tests

## Related PRs

pytorch#1636

Differential Revision: D59057165

Reviewed By: Balandat

Pulled By: Balandat
facebook-github-bot pushed a commit that referenced this pull request Jul 2, 2024
…ticSampler (#2391)

Summary:
## Motivation

Delete references to deprecated DeterministicPosterior and DeterministicSampler, replacing `DeterministicPosterior` with `EnsemblePosterior` where appropriate. These were deprecated before 0.9.0, so now that we are past 0.11.0, they can be reaped.

Pull Request resolved: #2391

Test Plan:
Replaced `DeterministicPosterior` with `EnsemblePosterior` in tests

## Related PRs

#1636

Reviewed By: Balandat

Differential Revision: D59057165

Pulled By: esantorella

fbshipit-source-id: 70a8405d0e75d414a685192808e8c0b18a6aca92
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. Merged
4 participants