Gradient from taichi ndarray does not match torch for euclidean distance (latest nightly) #8101
Comments
@oliver-batchelor The reason taichi returns a doubled result comes from
After running But when you return The root cause here is that torch expects the backward function to have no side effects on gradients (it should only return them), whereas running taichi kernels does have side effects. So there are two ways to work around it: remove the taichi side effect, or just return a zero grad to torch.
or
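To make the double counting concrete, here is a toy sketch in plain Python, with no torch or taichi involved. `Leaf`, `engine_accumulate` and `kernel_grad` are hypothetical stand-ins (not part of either library) for a tensor with a `.grad` slot, the autograd engine adding a returned gradient into that slot, and a kernel that writes its gradient in place. The real libraries are more involved, so this only illustrates the mechanism and the two workarounds described above under those assumptions.

```python
# Toy model of gradient accumulation. NOT the torch or taichi API:
# every name here is hypothetical and exists only for illustration.

class Leaf:
    """Stand-in for a leaf tensor that owns a .grad slot."""
    def __init__(self, value):
        self.value = value
        self.grad = None


def engine_accumulate(leaf, returned_grad):
    """Mimics an autograd engine adding the returned gradient into leaf.grad."""
    if returned_grad is None:
        return
    leaf.grad = returned_grad if leaf.grad is None else leaf.grad + returned_grad


def kernel_grad(leaf, upstream):
    """Stand-in for a kernel backward pass that writes into leaf.grad in place.

    The forward function is assumed to be y = x**2, so dy/dx = 2*x.
    """
    leaf.grad = (leaf.grad or 0.0) + upstream * 2.0 * leaf.value


def backward_buggy(leaf, upstream):
    kernel_grad(leaf, upstream)  # side effect: leaf.grad now holds the gradient
    return leaf.grad             # the same value is also handed to the engine


def backward_no_side_effect(leaf, upstream):
    saved = leaf.grad            # remember whatever was there before
    leaf.grad = 0.0              # use the slot as scratch space
    kernel_grad(leaf, upstream)
    result = leaf.grad
    leaf.grad = saved            # undo the side effect, only return the value
    return result


def backward_return_zero(leaf, upstream):
    kernel_grad(leaf, upstream)  # keep the side effect
    return 0.0                   # give the engine nothing more to add


def run(backward, x=3.0, upstream=1.0):
    leaf = Leaf(x)
    engine_accumulate(leaf, backward(leaf, upstream))
    return leaf.grad


if __name__ == "__main__":
    for fn in (backward_buggy, backward_no_side_effect, backward_return_zero):
        print(fn.__name__, run(fn))
```

In this toy, the correct gradient at `x = 3.0` is `6.0`. The buggy variant ends up with twice that because the gradient is counted once by the in-place write and once by the engine.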
Ah ha, thanks! It did seem a little weird to me returning the I must admit I just followed the recipe from here (which I assume also has doubled gradients) - but this wouldn't necessarily cause big problems, just a doubled learning rate... depending on what you used it for. https://github.com/taichi-dev/taichi-nerfs/blob/main/modules/hash_encoder.py
We might want to fix the taichi-nerfs one to ensure hyperparameter compatibility, hmmm. But I will mark this one as resolved.
From further testing both these methods seem flawed - I'll create some test cases but in essence:
Seems it's not safe to mess with the .grad, nor is it safe to return zeros!!
On second glance I think method 1 does the right thing, because normally the direct predecessor node won't have a grad anyway unless retain_grad is set to True, and that matches the pytorch behaviour here. The only difference in this case is that it will have zeros as grad rather than "None", which is what it is under pytorch autograd.
Is it possible to create another attribute on the tensor object like taichi_grad or some such that means we don't have to mess with the torch |
Hi,
Below is an example that tries to use a taichi kernel on torch tensors with autograd, on the most recent nightly (taichi-nightly-1.7.0.post20230529), with the latest torch from pip (2.0.1) and python 3.10.
The outputs (distances) are the same in both cases.
Gradients from torch (x.grad):
Gradients from taichi (x.grad):
A couple of other problems:
grad_output.contiguous()
- the result just seems (more) different

I am only moderately confident I'm using this the right way, so please excuse me if I'm abusing the system somehow!
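When two backends disagree on a gradient like this, one independent check is to compare each against a finite-difference estimate. The sketch below does that in plain Python for a single pair of points. It is not the kernel from this issue: `distance`, `analytic_grad`, `numeric_grad` and `matches` are hypothetical helper names, and the sample points are made up. It only shows that the gradient of the euclidean distance with respect to `x` is `(x - y) / ||x - y||`, and that a doubled gradient fails the check.

```python
import math


def distance(x, y):
    """Euclidean distance between two points given as lists of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def analytic_grad(x, y):
    """Gradient of distance(x, y) with respect to x: (x - y) / ||x - y||."""
    d = distance(x, y)
    return [(a - b) / d for a, b in zip(x, y)]


def numeric_grad(x, y, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((distance(hi, y) - distance(lo, y)) / (2 * eps))
    return grads


def matches(candidate, reference, tol=1e-5):
    """True if every component agrees within an absolute tolerance."""
    return all(abs(c - r) <= tol for c, r in zip(candidate, reference))


if __name__ == "__main__":
    x, y = [3.0, 0.0], [0.0, 4.0]  # made-up sample points, distance is 5
    reference = numeric_grad(x, y)
    correct = analytic_grad(x, y)
    doubled = [2.0 * g for g in correct]
    print("analytic matches finite differences:", matches(correct, reference))
    print("doubled matches finite differences:", matches(doubled, reference))
```

The same comparison can be applied to the `x.grad` values reported by each backend, provided the loss used for the finite-difference estimate is the same scalar that `backward()` was called on.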