[Bug] AffineInputTransform causes retain_graph errors when previous transforms have parameters #1635
Comments
Thanks for this most excellent bug report & proposed fix! We can certainly patch this in. @saitcakmak do you think it makes sense to do this now or should this just go into your planned transform refactor / upstream?
Thanks for flagging this! These coefficients should never have gradient on them, right? This is an affine transform, so the gradient will also be subject to the same transform when these do not have any gradient attached. If they do, that'll only lead to issues, as discovered here. Adding […]
Summary: See pytorch#1635. These should never retain grad.
Differential Revision: D42700421
fbshipit-source-id: e3057ac5e4ff50700840497b52f1ff807544d4ca
Here's an example that illustrates why I think gradient information should flow through the Normalize coefficients / offsets. Suppose you have two hyperparameters, […]

Thus the actual effect of increasing w1 is to increase some train_X feature values and decrease other feature values. However, if you haven't allowed gradients to flow through the Normalize coefficients / offset, you've removed this information from the gradient, and the optimization will be aware of Effect 1 but not Effect 2 (a toy sketch of this appears after the pseudocode below).

I agree that it's strange for the cached values to ever contain grad info. My personal preference is to refactor to use the following pattern:

```python
# Pseudocode
if self.training:
    offset, coefficient = compute_these()
    self.offset = offset.detach()
    self.coefficient = coefficient.detach()
else:
    offset = self.offset
    coefficient = self.coefficient
# now use offset and coefficient (not self.offset and self.coefficient)
```
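A toy sketch of this point, using an illustrative `w1`-parameterized transform (not botorch code): detaching the normalization constants drops Effect 2 from the gradient, so the two gradients computed below differ.

```python
import torch

def normalized_sum(detach_constants: bool) -> torch.Tensor:
    raw_X = torch.linspace(0.1, 2.0, 5)
    train_X = raw_X ** w1  # a preceding transform parameterized by w1 (illustrative)
    offset = train_X.min()
    coeff = train_X.max() - train_X.min()
    if detach_constants:
        # Mimics caching/detaching the Normalize coefficients.
        offset, coeff = offset.detach(), coeff.detach()
    return ((train_X - offset) / coeff).sum()

w1 = torch.tensor(1.5, requires_grad=True)
grad_full = torch.autograd.grad(normalized_sum(False), w1)[0]     # Effect 1 + Effect 2
grad_partial = torch.autograd.grad(normalized_sum(True), w1)[0]   # Effect 1 only
print(grad_full, grad_partial)  # the two gradients differ, reflecting the missing Effect 2
```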
Thanks @mrcslws for reporting this and @saitcakmak for putting up the PR! I agree that this makes sense for […]. So I suggest we call the […]
@SebastianAment does @mrcslws's suggestion work for your case as well? It'd let you have trainable […]
That should work! We could also make the buffers private (e.g. `_coefficient`, `_offset`) and expose them through properties:

```python
@property
def coefficient(self) -> Tensor:
    coeff = self._coefficient
    return coeff if self.learn_coefficients and self.training else coeff.detach()

@property
def offset(self) -> Tensor:
    offset = self._offset
    return offset if self.learn_coefficients and self.training else offset.detach()
```

When someone needs gradient information of the coefficients, this structure would allow them to access it independent of the […]
This seems like a good solution to me! @saitcakmak if this makes sense to you as well, could you incorporate that into #1642?
Any thoughts on how we can test that necessary and correct gradient info is propagated and retained? And how we can test that it's not persisted where it's not needed? (The […])
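One possible shape for such a test (a sketch against the toy module above, not botorch's actual test suite; a real test would presumably target `Normalize` / `InputStandardize` directly):

```python
import torch

def test_gradients_propagate_in_train_but_are_not_persisted() -> None:
    upstream = torch.tensor(2.0, requires_grad=True)
    X = (torch.rand(10, 3) + 0.1) ** upstream
    tf = ToyAffineTransform(d=3, learn_coefficients=True)

    # Necessary gradient info is propagated: a training-mode backward reaches
    # the upstream parameter through the freshly computed coefficients.
    tf.train()
    tf(X).sum().backward()
    assert upstream.grad is not None

    # It is not persisted where it's not needed: eval-mode accessors are detached...
    tf.eval()
    assert not tf.coefficient.requires_grad and not tf.offset.requires_grad

    # ...so a later backward in eval mode cannot hit the (already freed) training graph.
    test_X = torch.rand(5, 3, requires_grad=True)
    tf(test_X).sum().backward()
    assert test_X.grad is not None
```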
… (pytorch#1642) Summary: Pull Request resolved: pytorch#1642. See pytorch#1635. These should only retain grad if learning the coefficient and while in train mode.
Differential Revision: D42700421
fbshipit-source-id: 907f72a6f585f55f1496e23f58bc463eac063860
Closing this since #1642 landed.
🐛 Bug
If you use an AffineInputTransform (like `Normalize` or `InputStandardize`) after an input transform with learned parameters, bad things happen. The AffineInputTransform will unintentionally store values that have autograd compute graphs, because this class was not written to expect `X` to have gradient information during training. This has two effects:

1. When calling `loss.backward()` to optimize X, there are retain_graph errors.
2. `optimize_acqf` fails with the same retain_graph errors.

This bug will occur any time `X` has gradient information during training, so there may be other user scenarios where this happens.

To reproduce
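A sketch of the kind of setup that can hit this (not the original report's script; the specific transforms, data, and manual training step here are illustrative, assuming a learned `Warp` chained before `Normalize`):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.models.transforms.input import ChainedInputTransform, Normalize, Warp
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sum(dim=-1, keepdim=True)

# Warp has learnable concentration parameters, so the warped X passed on to
# Normalize carries an autograd graph during training.
input_transform = ChainedInputTransform(
    warp=Warp(indices=[0, 1]),
    normalize=Normalize(d=2),
)
model = SingleTaskGP(train_X, train_Y, input_transform=input_transform)
mll = ExactMarginalLogLikelihood(model.likelihood, model)

# One manual training step: Normalize caches coefficients computed from the
# warped inputs, and this backward() then frees the graph they reference.
model.train()
loss = -mll(model(train_X), model.train_targets)
loss.backward()

# In eval mode the cached (graph-carrying) coefficients are reused, so on affected
# versions a backward through a posterior evaluation raises the retain_graph RuntimeError.
model.eval()
test_X = torch.rand(5, 2, dtype=torch.double, requires_grad=True)
model.posterior(test_X).mean.sum().backward()
```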
Error message:
Expected Behavior
botorch is correct to propagate gradient information into `coefficient` and `offset` for use during training, but the saved version of these values (for use during inference) should not contain gradient information. The code should be refactored so that it continues to use `offset` and `coefficient` with gradient information, but it should always call `detach()` before storing these on `self`.

System information
botorch: 0.7.3.dev22+g208470e7.d20221102
gpytorch: 1.9.1.dev32+g23b068b5
torch: 1.13.1
OS: macOS 13.1 (M1)