Parameter `grad`s don't get initialized with BatchL2Grad and BatchNorm #239
Comments
Hi @thomasahle, I'm not sure I fully understand your question. Individual gradients don't exist in a network with a BN layer in train mode, because the individual losses depend on all samples in the mini-batch. The purpose of the warning you're seeing is exactly to point out this caveat. Are you sure the individual gradient l2 norms are the quantity you want?

Best,
Felix
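To make the coupling concrete, here is a small standalone sketch (the model, data, and layer sizes are made up for illustration): with BatchNorm in train mode, the gradient of one sample's loss with respect to the BatchNorm parameters changes when a *different* sample in the batch changes, so a per-sample gradient is not well-defined.

```python
import torch

torch.manual_seed(0)

# Tiny network with a BatchNorm layer in train mode (illustrative sizes).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.BatchNorm1d(4),
    torch.nn.Linear(4, 1),
)
X = torch.randn(8, 3)


def grad_of_sample_0(batch):
    """Gradient of sample 0's loss w.r.t. the BatchNorm weight."""
    model.zero_grad()
    per_sample_loss = (model(batch) ** 2).squeeze()
    per_sample_loss[0].backward()
    return model[1].weight.grad.clone()


g_original = grad_of_sample_0(X)

# Perturb a *different* sample and recompute sample 0's gradient.
X_perturbed = X.clone()
X_perturbed[5] += 1.0
g_perturbed = grad_of_sample_0(X_perturbed)

# Prints False: sample 0's "individual gradient" depends on the other samples
# through the batch statistics used by BatchNorm.
print(torch.allclose(g_original, g_perturbed))
```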
Hi Felix, I'm indeed interested in individual gradient l2 norms. But it would be nice if the normal non-batched `grad` was still computed for the parameters that BatchL2Grad doesn't support. Right now the non-batched `grad` stays `None` for those parameters.
Hi Thomas,
That seems odd to me because BackPACK does not intervene in PyTorch's gradient computation. Are you sure that these parameters have a `None` `grad` after calling `backward`?
For the loss of a neural network with batch normalization, individual gradients, and hence their l2 norm, don't exist. BackPACK only detects this when it encounters a batch norm module. So the result in `batch_l2` would not be a meaningful quantity for those parameters anyway.
Yes, it is only the parameters that BatchL2Grad does not support that don't get a `grad`.

I was thinking this might be a matter of how the exception is handled? That when the exception is raised inside BackPACK's backward hook, the rest of the backward pass is aborted, so the remaining parameters never receive their `grad`.
Here is example code of what I mean:
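A minimal sketch of such a setup (the model, layer sizes, loss, and data below are illustrative assumptions; the `try`/`except` is one way to "ignore" BackPACK's complaint about the BatchNorm layer so the script keeps running):

```python
import torch
from backpack import backpack, extend
from backpack.extensions import BatchL2Grad

torch.manual_seed(0)

# Illustrative model: one Linear layer below and one above a BatchNorm layer.
model = extend(
    torch.nn.Sequential(
        torch.nn.Linear(10, 5),
        torch.nn.BatchNorm1d(5),
        torch.nn.Linear(5, 2),
    )
)
lossfunc = extend(torch.nn.CrossEntropyLoss())

X = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

loss = lossfunc(model(X), y)

try:
    with backpack(BatchL2Grad()):
        loss.backward()
except Exception as err:
    # BackPACK objects to the BatchNorm layer in train mode; continue anyway.
    print("Ignored:", err)

# Check which parameters ended up with a grad and a batch_l2.
for name, param in model.named_parameters():
    print(
        f"{name:12s}",
        "grad:", param.grad is not None,
        "batch_l2:", getattr(param, "batch_l2", None) is not None,
    )
```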
In the output, the linear layer gets both a `grad` and a `batch_l2`, but the batch-norm parameters get neither.
Hi Thomas, I think the problem is that the error message is not strong enough. It should be more explicit about the consequences of ignoring it.
If you specifically want to look at the quantity that would be obtained by applying the same code used to compute individual gradients to a batch-norm network, you can install from source.
I guess you are right about that.
BatchL2Grad, perhaps naturally, raises an error when it sees a BatchNorm, since batch normalization mixes gradients in a way that makes the individual contribution hard to discern.
The error says I can ignore it, if I know what I'm doing. I can't say I completely do, but if I ignore it, I do indeed get both `grad`s and `batch_l2`s on the top levels of my model, which aren't using batch norm. I'm happy with that.

My problem is that the lower-level parameters - which do use batch norm - don't just have a `None` `batch_l2`, but also a `None` `grad`. So my model doesn't train at all.

This seems wrong, since `grad` is indeed computable, as witnessed by PyTorch being able to do so fine without backpack.

Is there a way I can get `batch_l2`s on as many of my parameters as possible, but `grad`s on everything?

I can do this now by first calling `backward()` without backpack, and then calling it again inside `with backpack(BatchL2Grad()):`, but that seems wasteful.
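For reference, a sketch of that two-pass workaround, reusing the illustrative `model`, `lossfunc`, `X`, and `y` from the earlier snippet (restoring the saved gradients at the end is an assumption about how the two passes would be combined, not something stated in the thread):

```python
# Pass 1: plain backward pass, populates .grad on every parameter.
model.zero_grad()
loss = lossfunc(model(X), y)
loss.backward()
saved_grads = {name: p.grad.clone() for name, p in model.named_parameters()}

# Pass 2: backward pass under BackPACK, populates .batch_l2 where supported.
model.zero_grad()
loss = lossfunc(model(X), y)
try:
    with backpack(BatchL2Grad()):
        loss.backward()
except Exception as err:
    print("Ignored:", err)

# Restore the complete gradients from pass 1 before the optimizer step.
for name, p in model.named_parameters():
    p.grad = saved_grads[name]
```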