Fix memory leak in inducing point allocators #1890
Conversation
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
thanks for the memory leak fix!
@@ -192,7 +192,7 @@ def __init__(
        Args:
            train_X: Training inputs (due to the ability of the SVGP to sub-sample
                this does not have to be all of the training inputs).
-           train_Y: Training targets (optional).
+           train_Y: Not used.
interesting - why did this have the arg in the first place? Any reason (other than linting) not to get rid of this?
Perhaps it had this so that `_SingleTaskVariationalGP` would have a similar signature to `SingleTaskVariationalGP`? In any case, I don't see a good reason to keep it, especially since they differ in other ways at this point.
I think we should keep it to support inducing point allocators that take into account the target values. To this end, we'd need to add `outputs` (in addition to `inputs`) to the signature of `inducing_point_allocator.allocate_inducing_points`. This would be ignored for the default variance reduction strategy, but would be necessary for e.g. @henrymoss's Inducing Point Allocation for Sparse Gaussian Processes in High-Throughput Bayesian Optimisation. This would also require supporting `outputs` in the quality functions, but not a lot else as far as I can tell.
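As a rough sketch of that signature change (the surrounding parameter names here are assumptions for illustration, not necessarily the exact current BoTorch signature):

from typing import Optional

import torch
from torch import Tensor
from torch.nn import Module


def allocate_inducing_points(
    self,
    inputs: Tensor,
    covar_module: Module,
    num_inducing: int,
    input_batch_shape: torch.Size,
    outputs: Optional[Tensor] = None,  # hypothetical new argument
) -> Tensor:
    # The default variance-reduction strategy would ignore `outputs`;
    # output-aware allocators would forward it to their quality functions.
    ...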
Thanks! This was a tricky one to debug. More evidence that in-place operations and autograd do not mix well.
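A toy illustration of the failure mode (not the actual `_pivoted_cholesky_init` code): in-place updates on a tensor that participates in an autograd graph keep extending that graph, and since backward() is never called, the graph and the buffers it saves are never freed.

import torch

x = torch.randn(500, 500, requires_grad=True)
kernel_matrix = (x @ x.t()).clone()  # participates in the autograd graph

for _ in range(100):
    # Each in-place update records another node in the autograd graph,
    # so memory grows with every iteration instead of being reclaimed.
    kernel_matrix.diagonal().add_(1.0)

# Evaluating under torch.no_grad() builds no graph and avoids the leak:
with torch.no_grad():
    kernel_matrix_safe = x @ x.t()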
train_train_kernel = (
    MaternKernel().to(self.device)(inputs).evaluate_kernel()
)
quality_scores = torch.ones([10, 1], device=self.device)
with self.assertRaises(ValueError):
It'd be good to explicitly check for the specific error using `assertRaisesRegex`.
Definitely agree. I ignored it since this test was pre-existing (I just rearranged it). Changing a bunch of these to `assertRaisesRegex` could be a good starter task.
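For reference, the suggested pattern might look something like this (a sketch only: the regex is illustrative and the `_pivoted_cholesky_init` argument names are assumptions based on the surrounding diff; the real test should match the actual error message):

with self.assertRaisesRegex(ValueError, "quality scores"):
    _pivoted_cholesky_init(
        train_inputs=inputs,
        kernel_matrix=train_train_kernel,
        max_length=5,
        quality_scores=quality_scores,
    )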
Codecov Report

@@            Coverage Diff            @@
##               main     #1890   +/- ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          173       173
  Lines        15137     15147   +10
=========================================
+ Hits         15137     15147   +10
Commit summary: Fixes pytorch#1788; see the PR description below. Reviewed By: saitcakmak, Balandat. Differential Revision: D46803080. Pulled By: esantorella. fbshipit-source-id: 1fb9c6500d4246a3740a9fce4bda290043f8ac3b
Force-pushed from c97d727 to 0ac6b14.
This pull request was exported from Phabricator. Differential Revision: D46803080
cc @henrymoss
@esantorella merged this pull request in 80ec6f4.
Fixes #1788.

Motivation

`allocate_inducing_points` leaks memory when passed a `kernel_matrix` with `requires_grad=True`. The memory leak happens due to a specific pattern of in-place torch operations in `_pivoted_cholesky_init`; see this comment for more explanation. There is no need for `allocate_inducing_points` to support a `kernel_matrix` with `requires_grad=True`, because the output of `allocate_inducing_points` is not differentiable anyway (thanks to in-place operations).

[x] Make `_pivoted_cholesky_init` raise an `UnsupportedError` when passed a `kernel_matrix` with `requires_grad=True`. That is mildly BC-breaking, but I think that is okay since the alternative is a memory leak.
[x] Evaluate kernels with `torch.no_grad()` where they are only used to be passed to `_pivoted_cholesky_init` (see the sketch after this list).
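A simplified sketch of those two changes (illustrative, not the exact code from this PR; the error message text is made up):

import torch
from botorch.exceptions.errors import UnsupportedError
from gpytorch.kernels import MaternKernel


def _pivoted_cholesky_init(train_inputs, kernel_matrix, max_length, quality_scores):
    # Fail fast instead of silently leaking memory through the
    # in-place operations used below.
    if kernel_matrix.requires_grad:
        raise UnsupportedError(
            "`_pivoted_cholesky_init` does not support a kernel_matrix that "
            "requires grad."
        )
    ...  # in-place pivoted Cholesky selection, unchanged


# Callers evaluate the kernel without gradient tracking, since the
# allocation output is not differentiable anyway:
train_X = torch.rand(10, 2)
with torch.no_grad():
    kernel_matrix = MaternKernel()(train_X).evaluate_kernel()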
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
[x] Unit test for memory leak
[x] Unit test for UnsupportedError