
Fix memory leak in inducing point allocators #1890

Closed
esantorella wants to merge 1 commit into main

Conversation

esantorella (Member) commented Jun 16, 2023:

Fixes #1788.

Motivation

`allocate_inducing_points` leaks memory when passed a `kernel_matrix` with `requires_grad=True`. The memory leak happens due to a specific pattern of in-place torch operations in `_pivoted_cholesky_init`; see this comment for more explanation. There is no need for `allocate_inducing_points` to support a `kernel_matrix` with `requires_grad=True`, because the output of `allocate_inducing_points` is not differentiable anyway (thanks to in-place operations).

- [x] Make `_pivoted_cholesky_init` raise an `UnsupportedError` when passed a `kernel_matrix` with `requires_grad=True`. That is mildly BC-breaking, but I think that is okay since the alternative is a memory leak.
- [x] Evaluate kernels with `torch.no_grad()` where they are only used to be passed to `_pivoted_cholesky_init` (see the sketch below).
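
A rough sketch of the two changes (illustrative only, not the exact diff; the parameter list of `_pivoted_cholesky_init` is reproduced approximately):

```python
import torch
from botorch.exceptions.errors import UnsupportedError
from gpytorch.kernels import MaternKernel


def _pivoted_cholesky_init(
    train_inputs, kernel_matrix, max_length, quality_scores, epsilon=1e-6
):
    # New guard: the in-place updates further down interact badly with autograd,
    # so a kernel matrix that is attached to the graph is rejected up front.
    if kernel_matrix.requires_grad:
        raise UnsupportedError(
            "_pivoted_cholesky_init does not support a kernel_matrix that "
            "requires gradients."
        )
    ...  # existing pivoted-Cholesky selection logic, unchanged


# At the call sites, the kernel is now evaluated under torch.no_grad(),
# since its only consumer is _pivoted_cholesky_init:
train_X = torch.rand(20, 2)
covar_module = MaternKernel()
with torch.no_grad():
    kernel_matrix = covar_module(train_X).evaluate_kernel()
# kernel_matrix does not require grad, so the guard above is never triggered.
```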

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

- [x] Unit test for memory leak
- [x] Unit test for `UnsupportedError`

@esantorella self-assigned this Jun 16, 2023
@facebook-github-bot added the CLA Signed label Jun 16, 2023

facebook-github-bot (Contributor):

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Balandat (Contributor) left a comment:

thanks for the memory leak fix!

@@ -192,7 +192,7 @@ def __init__(
         Args:
             train_X: Training inputs (due to the ability of the SVGP to sub-sample
                 this does not have to be all of the training inputs).
-            train_Y: Training targets (optional).
+            train_Y: Not used.
Contributor:

interesting - why did this have the arg in the first place? Any reason (other than linting) not to get rid of this?

esantorella (Member, Author):

Perhaps it had this so that _SingleTaskVariationalGP would have a similar signature to SingleTaskVariationalGP? In any case, I don't see a good reason to keep it, especially since they differ in other ways at this point.

Contributor:

I think we should keep it to support inducing point allocators that take into account the target values. To this end, we'd need to add outputs (in addition to inputs) to the signature of inducing_point_allocator.allocate_inducing_points. This would be ignored for the default variance reduction strategy, but would be necessary for e.g. @henrymoss's "Inducing Point Allocation for Sparse Gaussian Processes in High-Throughput Bayesian Optimisation". This would also require supporting outputs in the quality functions, but not a lot else as far as I can tell.
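
For illustration, the extended signature might look roughly like this; the sketch below is only a possible shape for the suggestion, and the existing parameter names are reproduced approximately:

```python
from typing import Optional

import torch
from torch import Tensor
from gpytorch.kernels import Kernel


class InducingPointAllocator:
    def allocate_inducing_points(
        self,
        inputs: Tensor,
        covar_module: Kernel,
        num_inducing: int,
        input_batch_shape: torch.Size,
        outputs: Optional[Tensor] = None,  # hypothetical new argument
    ) -> Tensor:
        # `outputs` would be ignored by the default variance-reduction
        # strategy but passed through to output-aware quality functions.
        ...
```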

saitcakmak (Contributor) left a comment:

Thanks! This was a tricky one to debug. More evidence that in-place operations and autograd do not mix well.

train_train_kernel = (
    MaternKernel().to(self.device)(inputs).evaluate_kernel()
)
quality_scores = torch.ones([10, 1], device=self.device)
with self.assertRaises(ValueError):
Contributor:

It'd be good to explicitly check for the specific error using assertRaisesRegex.
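
For example, something along these lines, building on the quoted snippet; the regex and the argument names below are placeholders rather than the actual values in the test:

```python
# Placeholder regex; substitute the message actually raised here.
with self.assertRaisesRegex(ValueError, "quality scores"):
    _pivoted_cholesky_init(
        train_inputs=inputs,
        kernel_matrix=train_train_kernel,
        max_length=5,
        quality_scores=quality_scores,
    )
```
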

esantorella (Member, Author):

Definitely agree. I ignored it since this test was pre-existing (I just rearranged it). Changing a bunch of these to assertRaisesRegex could be a good starter task.


codecov bot commented Jun 16, 2023

Codecov Report

Merging #1890 (8f39b6e) into main (00fa4a8) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 8f39b6e differs from pull request most recent head 0ac6b14. Consider uploading reports for the commit 0ac6b14 to get more accurate results

@@            Coverage Diff            @@
##              main     #1890   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          173       173           
  Lines        15137     15147   +10     
=========================================
+ Hits         15137     15147   +10     
Impacted Files                                        Coverage Δ
botorch/models/approximate_gp.py                      100.00% <ø> (ø)
botorch/models/utils/inducing_point_allocators.py     100.00% <100.00%> (ø)


facebook-github-bot (Contributor):

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
Fixes pytorch#1788 .
## Motivation

`allocate_inducing_points` leaks memory when passed a `kernel_matrix` with `requires_grad=True`. The memory leak happens due to a specific pattern of in-place torch operations in `_pivoted_cholesky_init`;  see [this comment](pytorch#1788 (comment)) for more explanation. There is no need for `allocate_inducing_points` to support a `kernel_matrix` with `requires_grad=True`, because the output of `allocate_inducing_points` is not differentiable anyway (thanks to in-place operations).

- [x] make `_pivoted_cholesky_init` raise an `UnsupportedError` when passed a `kernel_matrix` with `requires_grad=True`. That is mildly BC-breaking, but I think that is okay since the alternative is a memory leak.
- [x] Evaluate kernels with `torch.no_grad()` where they are only used to be passed to `_pivoted_cholesky_init`

### Have you read the [Contributing Guidelines on pull requests](https://github.com/pytorch/botorch/blob/main/CONTRIBUTING.md#pull-requests)?

Yes

Pull Request resolved: pytorch#1890

Test Plan:
- [x] Unit test for memory leak
- [x] Unit test for UnsupportedError

Reviewed By: saitcakmak, Balandat

Differential Revision: D46803080

Pulled By: esantorella

fbshipit-source-id: 1fb9c6500d4246a3740a9fce4bda290043f8ac3b
facebook-github-bot (Contributor):

This pull request was exported from Phabricator. Differential Revision: D46803080

Balandat (Contributor):

cc @henrymoss

facebook-github-bot (Contributor):

@esantorella merged this pull request in 80ec6f4.

Labels: CLA Signed, Merged

Linked issue: [Bug] Memory not freed after a variational GP model is discarded (#1788)