
Bugfix for Proximal acquisition function wrapper for negative base acquisition functions #1447

Closed
wants to merge 16 commits

Conversation

roussel-ryan
Contributor

Motivation

This PR fixes a major issue when using ProximalAcquisitionFunction with base acquisition functions that are not strictly positive. It does so by applying a Softplus transformation to the base acquisition function values (using an optional beta = 1.0 argument) before multiplying by the proximal weighting.
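An illustrative sketch of the proposed weighting (assuming a callable PyTorch-style base acquisition function `base_acqf`, candidate points `X`, the last training input `last_X`, and a `proximal_weights` lengthscale vector; this is not the exact BoTorch implementation):

import torch
from torch.nn.functional import softplus

def proximal_softplus_value(base_acqf, X, last_X, proximal_weights, beta=1.0):
    # Squared-exponential proximal weight centered at the last training point,
    # with per-dimension lengthscales given by proximal_weights.
    diff = X - last_X
    prox_weight = torch.exp(-0.5 * (diff ** 2 / proximal_weights).sum(dim=-1))
    # Softplus maps (possibly negative) base acquisition values into (0, inf),
    # so multiplying by the proximal weight preserves the intended biasing.
    return softplus(base_acqf(X), beta=beta) * prox_weight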

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Tests have been updated with correct (softplus transformed) values.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Oct 10, 2022
@codecov

codecov bot commented Oct 10, 2022

Codecov Report

Merging #1447 (5a6ceaf) into main (0277720) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 5a6ceaf differs from the pull request's most recent head 213be32. Consider uploading reports for commit 213be32 to get more accurate results.

@@            Coverage Diff            @@
##              main     #1447   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          124       124           
  Lines        11548     11554    +6     
=========================================
+ Hits         11548     11554    +6     
Impacted Files Coverage Δ
botorch/acquisition/proximal.py 100.00% <100.00%> (ø)


Contributor

@saitcakmak left a comment

Thanks, lgtm! The tutorial failure is unrelated (#1446), so please ignore it.

@facebook-github-bot
Contributor

@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@SebastianAment
Contributor

SebastianAment commented Oct 10, 2022

This is a higher level design question, rather than a comment on the particular fix.

@roussel-ryan, is there a reason against simply adding a quadratic to an arbitrary acquisition function? With this fix, ProximalAcquisitionFunction will compute

log(1+exp(acquisition(x))) * exp(-Q(x)),

where Q(x) = (x - x_{n-1}).t() * proximal_weights^{-1} * (x - x_{n-1}). Similar biasing could be achieved with the simpler expression

acquisition(x) - Q(x).

In addition, the latter would not change the magnitude of the acquisition function relative to the proximal term if the acquisition is negative. If the acquisition takes a large negative value, the softplus might also lead to numerical instabilities.
Would love to hear your thoughts!
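To make the two options concrete, an illustrative sketch (helper names are hypothetical, not from the PR):

import torch
from torch.nn.functional import softplus

def quadratic_penalty(X, last_X, proximal_weights):
    # Q(x) = (x - x_{n-1})^T diag(proximal_weights)^{-1} (x - x_{n-1})
    diff = X - last_X
    return (diff ** 2 / proximal_weights).sum(dim=-1)

def multiplicative_bias(acq_values, Q):
    # What the fix computes (shown here with the default beta): softplus, then damp.
    return softplus(acq_values) * torch.exp(-Q)

def additive_bias(acq_values, Q):
    # The alternative raised above: subtract the quadratic directly.
    return acq_values - Q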

@Balandat
Contributor

Thanks for fixing this. One thing that we should keep in mind here is that if the acquisition function values are small (as is often the case later during the optimization for things like Expected Improvement) this could substantially impact the behavior for functions that are indeed guaranteed to be positive. So in that case it seems that the choice of a large beta would be necessary (and in general this highlights the importance of the choice of beta). Would it make sense to allow passing in beta as optional and fall back to the current behavior?
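For a rough sense of the scale issue, an illustrative check with torch.nn.functional.softplus (not part of the PR itself): small positive acquisition values are heavily distorted unless beta is large.

import torch
from torch.nn.functional import softplus

ei = torch.tensor([1e-3, 1e-2, 1e-1])  # small, strictly positive acquisition values
print(softplus(ei, beta=1.0))    # ~[0.694, 0.698, 0.744]: ordering kept, scale badly distorted
print(softplus(ei, beta=100.0))  # ~[0.0074, 0.0131, 0.1000]: much closer to the raw values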

@roussel-ryan
Contributor Author

@SebastianAment In response to large negative values for the base acquisition function, I'm hoping that using a Standardize outcome transform will prevent extreme values. It's not a perfect solution, but I think the proposed solution is sufficient for most cases. As for adding the biasing term instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function values are large (or vice versa). Unless I'm misunderstanding what you wrote?

@roussel-ryan
Contributor Author

@Balandat I could potentially remove the transform when all of the acquisition function values are found to be positive. Alternatively, I could set a large beta as the default or raise an error when the acquisition function values are negative. We need to prevent the wrapper from ever being used with a negative acquisition function, because the issue can be quite easy to overlook.

@SebastianAment
Contributor

As for adding the biasing term instead of multiplying: you mentioned that the relative scale of each term would remain the same, but I think that prevents the biasing from working when the acquisition function values are large (or vice versa)

That makes sense. The multiplicative version lets us choose weights proportional to the scale of the acquisition functions, which we might not know a priori but could update during a run. It's also equivalent to the additive version applied to the log of the acquisition function, and the additive version might be more easily applied to negative acquisition values.
Thanks for the quick response, enjoyed reading your paper!
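To spell out the equivalence: since log is strictly monotone, the multiplicative form ranks candidates exactly like an additive penalty applied to the log-transformed (softplus'ed) acquisition value. An illustrative sketch (not from the thread):

import torch
from torch.nn.functional import softplus

acq = torch.randn(5)  # arbitrary, possibly negative, acquisition values
Q = torch.rand(5)     # quadratic proximal penalties
multiplicative = softplus(acq) * torch.exp(-Q)
additive_on_log = torch.log(softplus(acq)) - Q
# Identical candidate ranking, since log is strictly monotone.
assert torch.equal(multiplicative.argsort(), additive_on_log.argsort())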

@Balandat
Contributor

I am not sure how widely used this is. The main thing I'm worried about is that the default behavior for non-negative but small-in-magnitude acquisition functions may change quite a bit. How about we (i) make beta optional and, if it is None, not apply the softplus, and (ii) raise an error if any of the acquisition values are negative, pointing the user to set a beta value?
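An illustrative sketch of what (i) and (ii) could look like inside the wrapper (hypothetical helper name and error message, not the merged diff):

import torch
from torch.nn.functional import softplus

def transform_acq_values(acq_values: torch.Tensor, beta=None) -> torch.Tensor:
    if beta is None:
        # Preserve the current behavior, but refuse silently-wrong inputs.
        if (acq_values < 0).any():
            raise RuntimeError(
                "Cannot use proximal biasing with negative acquisition values; "
                "specify `beta` to apply a softplus transform."  # hypothetical wording
            )
        return acq_values
    # Opt-in softplus keeps the weighted product well defined for negative values.
    return softplus(acq_values, beta=beta)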

@roussel-ryan
Contributor Author

@Balandat I made the change you suggested.

Contributor

@Balandat left a comment

Thanks! A couple of nits, o/w this lgtm.

@@ -49,6 +53,7 @@ def __init__(
acq_function: AcquisitionFunction,
proximal_weights: Tensor,
transformed_weighting: bool = True,
beta: float = None,
Contributor

Suggested change
beta: float = None,
beta: Optional[float] = None,


ei_prox_beta = EI_prox_beta(test_X)
self.assertTrue(torch.allclose(ei_prox_beta, ei * test_prox_weight))
self.assertTrue(ei_prox_beta.shape == torch.Size([1]))
Contributor

nit

Suggested change
self.assertTrue(ei_prox_beta.shape == torch.Size([1]))
self.assertEqual(ei_prox_beta.shape, torch.Size([1]))

Comment on lines 114 to 116
test_prox_weight = torch.exp(
mv_normal.log_prob(proximal_test_X)
) / torch.exp(mv_normal.log_prob(last_X))
Contributor

Suggested change
test_prox_weight = torch.exp(
mv_normal.log_prob(proximal_test_X)
) / torch.exp(mv_normal.log_prob(last_X))
test_prox_weight = torch.exp(
mv_normal.log_prob(proximal_test_X) -
mv_normal.log_prob(last_X)
)
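The suggested change is the usual log-space trick: exponentiating the difference of log-probabilities avoids the underflow that can hit each exp(log_prob) term individually. An illustrative sketch with a standard 2-d MultivariateNormal (not the test's actual fixtures):

import torch
from torch.distributions import MultivariateNormal

mv_normal = MultivariateNormal(torch.zeros(2), torch.eye(2))
x1 = torch.full((2,), 14.0)  # log_prob ~ -198
x2 = torch.full((2,), 12.0)  # log_prob ~ -146
# Both exp(log_prob) underflow to 0 in float32, so the naive ratio is 0/0 = nan ...
naive = torch.exp(mv_normal.log_prob(x1)) / torch.exp(mv_normal.log_prob(x2))
# ... while a single exp of the log-prob difference stays finite (~exp(-52)).
stable = torch.exp(mv_normal.log_prob(x1) - mv_normal.log_prob(x2))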

Comment on lines 30 to 31
acquisition function. Then the acquisition function is
weighted via a squared exponential centered at the last training point,
Contributor

Suggested change
acquisition function. Then the acquisition function is
weighted via a squared exponential centered at the last training point,
acquisition function. The acquisition function is
weighted via a squared exponential centered at the last training point,

weighted via a squared exponential centered at the last training point,
with varying lengthscales corresponding to `proximal_weights`. Can only be used
with acquisition functions based on single batch models. Acquisition functions
must be positive or `beta` is specified to apply a SoftPlus transform before
Contributor

Suggested change
must be positive or `beta` is specified to apply a SoftPlus transform before
must be positive or `beta` must be specified to apply a SoftPlus transform before

Comment on lines 176 to 177
with self.assertRaises(RuntimeError):
bad_neg_prox(test_X)
Contributor

Can you use assertRaisesRegex in order to check for the proper error message here?
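For reference, such a check could look like the following; the regex pattern is a placeholder, not the actual error message from the PR:

with self.assertRaisesRegex(RuntimeError, "negative"):  # placeholder pattern
    bad_neg_prox(test_X)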

@roussel-ryan
Contributor Author

Updated w/suggestions. Thanks for the comments!

Contributor

@Balandat left a comment

No problem, thanks for the contribution!

@saitcakmak I think this is ready for a re-import and merge!

@facebook-github-bot
Contributor

@saitcakmak has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
