Add entropy search acquisition functions #1458
Conversation
Thanks for putting this up. I will review shortly. Note that BoTorch requires 100% test coverage, so you will need to write unit tests before this PR can be merged in.
Codecov Report
@@            Coverage Diff            @@
##              main     #1458     +/-   ##
==========================================
  Coverage   100.00%   100.00%
==========================================
  Files          149       153       +4
  Lines        12936     13567     +631
==========================================
+ Hits         12936     13567     +631
Regarding testing, my comment was not intended as criticism. I am super excited about your PR for these AFs. The code looks quite clean. I just wanted to flag the need for unit tests since that will likely be a fair bit of work. There are many examples of testing other AFs, e.g. https://github.com/pytorch/botorch/blob/main/test/acquisition/multi_objective/test_monte_carlo.py. Let me know if you have any questions on testing. Thanks again for the PR!
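Purely to illustrate the shape of such a test (the acquisition class, its constructor arguments, and the expected output shape are placeholders here, not this PR's actual API), a unit test might look roughly like:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.utils.testing import BotorchTestCase


class TestEntropySearchAcquisition(BotorchTestCase):
    def test_acquisition_shapes_and_dtypes(self):
        # Exercise both dtypes to help catch dtype/device issues early.
        for dtype in (torch.float, torch.double):
            train_X = torch.rand(8, 2, dtype=dtype)
            train_Y = torch.rand(8, 1, dtype=dtype)
            model = SingleTaskGP(train_X, train_Y)
            # acqf = qLowerBoundJointEntropySearch(model=model, ...)  # PR-specific args
            # X = torch.rand(4, 1, 2, dtype=dtype)
            # self.assertEqual(acqf(X).shape, torch.Size([4]))
            self.assertEqual(model.num_outputs, 1)
```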
Thanks for putting this up. This is a partial initial pass (I didn't make it all the way through the PR).
One high-level comment is that we should make sure that this is dtype/device agnostic, so that we can run it on a GPU. The main blocker at the moment is initializing tensors (e.g. torch.ones()) without specifying dtype and device; typically these can be fetched from some other tensor (e.g. X), as in the sketch below.
Another comment is that much of the code is shared between JES and MES (for example in __init__). It would be good to consolidate this into one common base class.
Also, it would be nice to highlight in the docstrings which functions are differentiable and which are not, for ease of use.
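A minimal sketch of the dtype/device pattern being suggested (the helper and tensor names below are illustrative, not taken from the PR):

```python
import torch


def _example_helper(X: torch.Tensor) -> torch.Tensor:
    # Fetch dtype and device from an existing tensor so the code stays
    # dtype/device agnostic (works on CPU and GPU, float32 and float64).
    tkwargs = {"dtype": X.dtype, "device": X.device}
    ones = torch.ones(X.shape[-1], **tkwargs)  # instead of a bare torch.ones(d)
    return X * ones
```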
    hypercell_bounds: Tensor,
    maximize: bool = True,
    X_pending: Optional[Tensor] = None,
    estimation_type: str = "Lower bound",
A python enum might be nice for listing the options and removing the risk of string typos.
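For illustration, such an enum could look something like this (the names here are hypothetical, not the PR's actual API):

```python
from enum import Enum


class EstimationType(str, Enum):
    # Inheriting from str keeps backward compatibility with plain-string comparisons.
    NOISELESS = "Noiseless"
    NOISELESS_LOWER_BOUND = "Noiseless lower bound"
    LOWER_BOUND = "Lower bound"
    MONTE_CARLO = "Monte Carlo"


# Usage sketch: estimation_type: EstimationType = EstimationType.LOWER_BOUND
```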
    maximize: bool = True,
    X_pending: Optional[Tensor] = None,
    estimation_type: str = "Lower bound",
    only_diagonal: Optional[bool] = False,
Suggested change:
-    only_diagonal: Optional[bool] = False,
+    only_diagonal: bool = False,
    num_samples: Optional[int] = 64,
    num_constraints: Optional[int] = 0,
Suggested change:
-    num_samples: Optional[int] = 64,
-    num_constraints: Optional[int] = 0,
+    num_samples: int = 64,
+    num_constraints: int = 0,
            dominated space union the infeasible space.
        maximize: If True, consider the problem a maximization problem.
        X_pending: A `m x d`-dim Tensor of `m` design points that have been
            submitted for function evaluation but have not yet been evaluated.
Suggested change:
-            submitted for function evaluation but have not yet been evaluated.
+            submitted for function evaluation, but have not yet been evaluated.
        estimation_type: A string to determine which entropy estimate is
            computed: "Noiseless", "Noiseless lower bound", "Lower bound" or
            "Monte Carlo".
        only_diagonal: If true we only compute the diagonal elements of the
Suggested change:
-        only_diagonal: If true we only compute the diagonal elements of the
+        only_diagonal: If true, we only compute the diagonal elements of the
    verbose: Optional[bool] = False,
    **kwargs: Any,
) -> None:
    r"""Predictive entropy search acquisition function.
This code is pretty dense. Thanks for documenting it so well. Easy to see why there are so few implementations of this...
    Returns
        A `x_shape`-dim Tensor.
    """
    normal = Normal(torch.zeros_like(x), torch.ones_like(x))
dtype/device
def _initialize_predictive_matrices(
    X: Tensor,
extra indent
    observation_noise: Optional[bool] = True,
    jitter: Optional[float] = 1e-4,
    natural: Optional[bool] = True,
remove optional
        observation_noise: If true the posterior is computed with observation noise.
        jitter: The jitter added to the covariance matrix.
        natural: If true we compute the natural statistics as well.
    Return:
Probably need to follow this format for the autodocs:
botorch/botorch/optim/optimize.py
Lines 110 to 118 in 970664c
    Returns:
        A two-element tuple containing
        - a `(num_restarts) x q x d`-dim tensor of generated candidates.
        - a tensor of associated acquisition values. If `sequential=False`,
          this is a `(num_restarts)`-dim tensor of joint acquisition values
          (with explicit restart dimension if `return_best_only=False`). If
          `sequential=True`, this is a `q`-dim tensor of expected acquisition
          values conditional on having observed candidates `0,1,...,i-1`.
@benmltu, thanks so much for the PR—the code here is very high quality, as are these ES implementations. This will be super valuable to the community. I am also looking forward to seeing if there are ways to further accelerate JES through GPU-based computation once some of these type issues have been resolved.
I figured I'd just chime in here. I'm also extremely grateful for your work here, thanks a lot. I've gone through the code, tried to run it, and implemented my own stuff on top of it. As far as plugging in different versions of JES goes, our differences are very minor and can easily be accounted for within the current structure (and probably don't even require two different classes). Otherwise, I only have a couple of comments:
Once again, thanks a lot!
Thanks for all the comments, I've updated the code to address many of the issues. Main changes:
Excellent, thanks so much for the contribution and the many updates in response to the review. This is really great to see.
One high-level comment is that, in general, we're trying to keep the dependencies of botorch as lightweight as possible, so it would be great if we could make pymoo an optional dependency. One way to do that would be to do something like
try:
    import pymoo
    PYMOO_INSTALLED = True
except ImportError:
    PYMOO_INSTALLED = False
at the beginning of the sample_pareto.py module, move the imports from pymoo into the functions inside the module, and if PYMOO_INSTALLED is False just raise an informative exception prompting the user to install pymoo if they would like to use the newly added entropy search acquisition functions.
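For illustration, the in-function check could look roughly like this (the function name and the error message are placeholders, not the PR's actual code):

```python
try:
    import pymoo  # noqa: F401
    PYMOO_INSTALLED = True
except ImportError:
    PYMOO_INSTALLED = False


def sample_pareto_optimal_points(model, bounds):
    # Hypothetical function: fail fast if the optional dependency is missing.
    if not PYMOO_INSTALLED:
        raise ImportError(
            "pymoo is required to sample Pareto optimal points for the entropy "
            "search acquisition functions; please run `pip install pymoo`."
        )
    # Deferred import keeps pymoo out of botorch's hard dependencies.
    from pymoo.optimize import minimize  # noqa: F401
    ...
```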
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@benmltu FYI, I merged the "main" branch into this one, so please do a "git pull" before doing any local development. I also imported it into an internal Meta repo to see if internal tools catch any issues.
Thanks for the updates and adding tests!
@@ -161,3 +167,121 @@ def prune_inferior_points_multi_objective(
    effective_n_w = obj_vals.shape[-2] // X.shape[-2]
    idcs = (idcs / effective_n_w).long().unique()
    return X[idcs]


def compute_sample_box_decomposition(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
    # pad the decomposition with empty cells so that all
    # decompositions have the same number of cells
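A rough sketch of that simplification, assuming the standard DominatedPartitioning / BoxDecompositionList usage (the function name, the choice of partitioning class, and the output shape are assumptions here, not the PR's exact code):

```python
import torch
from botorch.utils.multi_objective.box_decompositions.box_decomposition_list import (
    BoxDecompositionList,
)
from botorch.utils.multi_objective.box_decompositions.dominated import (
    DominatedPartitioning,
)


def sample_box_decomposition_sketch(pareto_fronts: torch.Tensor, ref_point: torch.Tensor):
    # pareto_fronts: `num_samples x P x M`, one sampled Pareto front per GP sample.
    decompositions = [
        DominatedPartitioning(ref_point=ref_point, Y=front) for front in pareto_fronts
    ]
    # The list wrapper pads the individual decompositions with empty cells so they
    # all have the same number of boxes, which is what the manual padding in
    # `compute_sample_box_decomposition` reimplements. See the class docs for the
    # exact shape of the returned cell bounds.
    return BoxDecompositionList(*decompositions).get_hypercell_bounds()
```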
botorch/utils/sample_pareto.py
Outdated
    num_generations: int = 100,
    pop_size: int = 100,
    num_offsprings: int = 10,
Hmm, are these good defaults? 10 offspring seems quite small. Why not use the pymoo defaults?
botorch/utils/sample_pareto.py
Outdated
) -> Tuple[Tensor, Tensor]:
    r"""Computes the Pareto optimal set and front from samples of the GP.

    (i) Samples are generated using random Fourier features.
This need not be RFFs. It could also use decoupled sampling. Let's add a TODO to generalize the GP sampling bit
botorch/utils/sample_pareto.py
Outdated
    M = model.num_outputs
    d = bounds.shape[-1]

    if M == 1:
It seems that this function should simply raise an exception if M == 1.
botorch/utils/sample_pareto.py
Outdated
"Only found " | ||
+ str(num_pareto_generated) | ||
+ " Pareto efficient points instead of " | ||
+ str(num_pareto_points) | ||
+ "." |
let's use f-strings
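For example, the message above could be built as follows (with dummy values standing in for the real variables from the surrounding code):

```python
num_pareto_generated, num_pareto_points = 3, 10  # dummy values for illustration
message = (
    f"Only found {num_pareto_generated} Pareto efficient points "
    f"instead of {num_pareto_points}."
)
print(message)
```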
botorch/utils/sample_pareto.py
Outdated
    up = hypercell_bounds[1].unsqueeze(0)

    # Compute the sample hypervolume improvement.
    hvi = (
It seems like this would be handy to have as an instance method of BoxDecompositions (e.g. a method to compute the incremental HVI of a new point, or more generally of `(batch_shape x) 1 x M` new points).
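A possible shape for such a method, sketched against the existing DominatedPartitioning API (the helper name and the bookkeeping below are assumptions, not part of this PR):

```python
import torch
from botorch.utils.multi_objective.box_decompositions.dominated import (
    DominatedPartitioning,
)


def incremental_hvi(partitioning: DominatedPartitioning, new_Y: torch.Tensor) -> torch.Tensor:
    # new_Y: a `1 x M` tensor of objectives for the candidate point.
    hv_before = partitioning.compute_hypervolume()
    # Rebuild the decomposition with the candidate included rather than
    # mutating the original partitioning via `update`.
    augmented = DominatedPartitioning(
        ref_point=partitioning.ref_point,
        Y=torch.cat([partitioning.pareto_Y, new_Y], dim=0),
    )
    return augmented.compute_hypervolume() - hv_before
```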
botorch/utils/sample_pareto.py
Outdated
# LICENSE file in the root directory of this source tree.

r"""
Some methods for sampling the Pareto optimal points. This relies on the pymoo
It seems like this should live in botorch/utils/multi_objective/sample_pareto.py
        optimal_outputs: A `num_samples x 1`-dim Tensor containing the optimal
            set of objectives of dimension `1`.
        maximize: If true, we consider a maximization problem.
        hypercell_bounds: A `num_samples x 2 x J x 1`-dim Tensor containing the
Hmm, calling these hypercell bounds is odd, since they are not hyperrectangles, but rather lines... I guess it makes more sense in the constrained case.
from torch import Tensor


class qLowerBoundJointEntropySearch(qLowerBoundMultiObjectiveJointEntropySearch):
It seems kind of odd to have the single-objective version inherit from the multi-objective version (just an observation, not a criticism).
I agree, I wasn't too keen on this but it was a bit involved doing it the other way round, especially for PES
# "Fantasy observations can only be added after making predictions with a | ||
# model so that all test independent caches exist." | ||
|
||
if condition_on_samples: |
Is this specific to qLowerBoundMultiObjectiveJointEntropySearch? If so, let's move it into qLowerBoundMultiObjectiveJointEntropySearch.
I've added a small update now which removes the pymoo optimizer and replaces it with a basic random search optimizer.
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
How does this affect the performance of the algorithm (both in terms of anytime optimization performance and runtime)?
    fs = old_factor.shape

    df = damping_factor
    for i in range(len(fs[len(bs) :])):
Suggested change:
-    for i in range(len(fs[len(bs) :])):
+    for _ in range(len(fs[len(bs) :])):
Addressing a complaint from an internal linter
    ] = random_search_optimizer,
    num_rff_features: int = 512,
    maximize: bool = True,
    optimizer_kwargs: dict = None,
Suggested change:
-    optimizer_kwargs: dict = None,
+    optimizer_kwargs: Optional[Dict[str, Any]] = None,
addressing a complaint from a Meta internal linter
    posterior_statistics = dict()
    # Compute the prior entropy term depending on `X`.
    initial_posterior_plus_noise = self.initial_model.posterior(
        X, observation_noise=True
    )

    # Additional constant term.
    add_term = (
        0.5
        * self.model.num_outputs
        * (1 + torch.log(2 * pi * torch.ones(1, **tkwargs)))
    )
    # The variance initially has shape `batch_shape x (q*M) x (q*M)`
    # prior_entropy has shape `batch_shape`.
    initial_entropy = add_term + 0.5 * torch.logdet(
        initial_posterior_plus_noise.mvn.covariance_matrix
    )

    posterior_statistics["initial_entropy"] = initial_entropy
Suggested change:
-    posterior_statistics = dict()
     # Compute the prior entropy term depending on `X`.
     initial_posterior_plus_noise = self.initial_model.posterior(
         X, observation_noise=True
     )
     # Additional constant term.
     add_term = (
         0.5
         * self.model.num_outputs
         * (1 + torch.log(2 * pi * torch.ones(1, **tkwargs)))
     )
     # The variance initially has shape `batch_shape x (q*M) x (q*M)`
     # prior_entropy has shape `batch_shape`.
     initial_entropy = add_term + 0.5 * torch.logdet(
         initial_posterior_plus_noise.mvn.covariance_matrix
     )
-    posterior_statistics["initial_entropy"] = initial_entropy
+    posterior_statistics = {"initial_entropy": initial_entropy}
A Meta-internal linter complains about the dict() call, and also let's define posterior_statistics closer to where it's used.
tkwargs = {"dtype": X.dtype, "device": X.device} | ||
CLAMP_LB = torch.finfo(tkwargs["dtype"]).eps | ||
|
||
posterior_statistics = dict() |
Suggested change:
-    posterior_statistics = dict()
see below
    initial_entropy = add_term + 0.5 * torch.logdet(
        posterior_plus_noise.mvn.covariance_matrix
    )
    posterior_statistics["initial_entropy"] = initial_entropy
posterior_statistics["initial_entropy"] = initial_entropy | |
posterior_statistics = {"initial_entropy": initial_entropy} |
A Meta-internal linter complains about the dict() call, and also let's define posterior_statistics closer to where it's used.
"For single-objective optimization `num_points` should be 1." | ||
) | ||
if optimizer_kwargs is None: | ||
optimizer_kwargs = dict() |
Suggested change:
-        optimizer_kwargs = dict()
+        optimizer_kwargs = {}
A Meta-internal linter prefers this (C408 from flake8-comprehensions, if anyone is curious https://pypi.org/project/flake8-comprehensions/)
Thanks a lot for adding these benchmarks! I think it's fine to go ahead with the random search solution for now (but add a TODO in code to improve this) so we can get this merged in soon. Then we can figure out the pymoo dependency (or other approaches) in a follow-up. There is also this tutorial error in the CI:
You can address this by adding the tutorial to this file: https://github.com/pytorch/botorch/blob/main/website/tutorials.json
Looks like the recent large-scale refactors on our end caused some merge conflicts - once those are resolved, this should be able to go in!
I ran the tutorial and noticed the batch AF values are often negative (printed below with no modification to the tutorial). What causes this? Is this because the approximation of the batch information gain is a loose lower bound? The 4 AF values printed below for MES and JES are the joint AF values of candidates 1, ..., j for j = 1, ..., q = 4 (obtained via sequential greedy optimization).
That's right, the upper bound H[p(A_1,...,A_q)] <= H[p(A_1)] + ... + H[p(A_q)] can be loose because it essentially ignores the correlation between the A_i's. As q increases, the upper bound becomes larger, so the overall lower bound acq can become negative.
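A sketch of that argument in math, with notation chosen here for illustration (y_i denotes the observation at the i-th batch point and A* the sampled optimum being conditioned on; the symbols in the code may differ):

```latex
\begin{align*}
\mathrm{IG}(x_{1:q})
  &= H\big[p(y_1,\dots,y_q)\big] - H\big[p(y_1,\dots,y_q \mid A^{*})\big] \;\ge\; 0,\\
\alpha_{\mathrm{LB}}(x_{1:q})
  &= H\big[p(y_1,\dots,y_q)\big] - \sum_{i=1}^{q} H\big[p(y_i \mid A^{*})\big]
  \;\le\; \mathrm{IG}(x_{1:q}),
\end{align*}
```

since subadditivity of entropy gives sum_i H[p(y_i | A*)] >= H[p(y_1,...,y_q | A*)]. The sum ignores the correlations between the y_i's, so it grows with q, and the lower bound can dip below zero even though the true information gain is non-negative.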
Interesting. It's kind of funky to 1) have the approximate joint utility of the optimal q-batch of points get worse (and, it appears, monotonically decrease) as q increases (as new points are sequentially added to the batch), and 2) have the optimal approximate information gain be negative. Neither of these would be the case if the prior joint entropy used the same approximation. Is that something you considered? It might be worth noting in the docstring that negative (and decreasing) values can occur due to the loose upper bound in the batch approximation.
Thanks again for adding these! @esantorella, let's re-import and make sure there are no remaining issues
Yeah, this was something that we considered early on; the acquisition function reduces down to something like:
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This has been landed! Thanks so much @benmltu!
This is amazing! Thank you so much for your excellent contribution!
Hi,
I have provided some implementations of entropy search acquisition functions used in a recent paper (https://arxiv.org/abs/2210.02905). This PR includes the acquisition function and the necessary utilities. I have included a notebook that describes how to use these methods.
I was not sure what the best place to add these acquisition functions was, so I put them all in the multi-objective folder. Nevertheless, they should work for single-objective optimization as well.
Thanks,
Ben