Forward and backward propagation in NTT stage #4
Comments
Thanks for your interest and this keen observation. As mentioned in the paper, I intended to backprop through the mask. However, as mentioned in the other issue you linked, the gradients might not be sparse. I am curious about this, but I haven't checked it myself (perhaps it only happens in global pruning? I am not sure). May I ask you to verify this phenomenon once more? It would be great if you could train a layerwise sparse net using NTT and check whether the layerwise gradients are indeed sparse, for example with a check like the one sketched below. If they are not sparse, this may hint at a glitch in my implementation. But I would assume that masking or not masking the gradient will not significantly change the final NTT loss.
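In case it helps, here is a minimal sketch in JAX (the framework the repository uses) of such a check. The function name `gradient_sparsity_report` and its argument list are hypothetical, not part of the repository; only the idea of comparing the zero fraction of each layer's gradient against the zero fraction of its mask is intended.

```python
import jax
import jax.numpy as jnp

def gradient_sparsity_report(loss_fn, student_params, masks, *loss_args):
    """Print, per layer, the fraction of zero entries in the gradient and in the mask.

    `loss_fn`, the pytree layout of `student_params`/`masks`, and the extra
    `loss_args` are placeholders; they stand in for whatever the NTT code
    actually passes (teacher params, inputs, density level, ...).
    """
    grads = jax.grad(loss_fn)(student_params, masks, *loss_args)
    # Assumes `grads` and `masks` share the same pytree structure, layer by layer.
    flat_grads = jax.tree_util.tree_leaves(grads)
    flat_masks = jax.tree_util.tree_leaves(masks)
    for i, (g, m) in enumerate(zip(flat_grads, flat_masks)):
        grad_zero = float(jnp.mean(g == 0.0))
        mask_zero = float(jnp.mean(m == 0.0))
        print(f"layer {i}: zero-gradient fraction = {grad_zero:.3f}, "
              f"zero-mask fraction = {mask_zero:.3f}")
```

If the zero-gradient fraction tracks the zero-mask fraction in every layer, the layerwise gradients are indeed sparse; if it is much smaller, the gradients are dense even though the masks are not.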
I have looked at the experimental results before, and the model is indeed sparse. Is it possible that the gradients are computed for all parameters, but the masked-out parameters are not actually updated?
Sure! The line

masked_g = grad(self.nt_transfer_loss)(student_net_params, masks, teacher_net_params, x, nn_density_level)

computes the gradient of the NTT loss. Some more detailed comments: as we see in the code, I still believe that the gradient is indeed backpropped through the mask, because the gradient is only taken w.r.t. the first argument, the parameters of the student network. Let me know if this helps and if there are further questions!
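To illustrate this last point, here is a minimal, self-contained JAX sketch. The toy `nt_transfer_loss`, the linear student/teacher, and all shapes below are assumptions for illustration, not the repository's actual code; the point is only that `jax.grad` differentiates w.r.t. the first positional argument, and when the mask multiplies the parameters inside the loss, the masked-out entries receive exactly zero gradient.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the NTT loss: it applies the binary mask to the
# student parameters *inside* the loss, so differentiating w.r.t. the first
# argument backprops through the mask.
def nt_transfer_loss(student_params, mask, teacher_params, x):
    masked_params = student_params * mask   # mask applied inside the loss
    student_out = x @ masked_params         # toy linear "student"
    teacher_out = x @ teacher_params        # toy linear "teacher"
    return jnp.mean((student_out - teacher_out) ** 2)

student_params = jax.random.normal(jax.random.PRNGKey(0), (4, 3))
teacher_params = jax.random.normal(jax.random.PRNGKey(1), (4, 3))
mask = (jax.random.uniform(jax.random.PRNGKey(2), (4, 3)) > 0.5).astype(jnp.float32)
x = jax.random.normal(jax.random.PRNGKey(3), (8, 4))

# jax.grad differentiates w.r.t. the first positional argument only,
# i.e. the student parameters; the mask, teacher params, and data are constants.
g = jax.grad(nt_transfer_loss)(student_params, mask, teacher_params, x)

# Because the mask multiplies the parameters inside the loss, the gradient of
# every masked-out entry is exactly zero in this toy example.
print(jnp.allclose(g * (1.0 - mask), 0.0))  # True
```

Presumably the actual loss in ntt.py applies the masks to the student parameters in a similar way before evaluating the networks, so differentiating only the first argument already accounts for the mask.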
Hello!
Impressive work!
I'd like to know the meaning of the parameters in the function nt_transfer_step (ntt.py):
masked_g = grad(self.nt_transfer_loss)(student_net_params, masks, teacher_net_params, x, nn_density_level)
I noticed that in your reply to another issue (Pruned weights in NTK #3), the gradients of all parameters are updated; however, in the paper:
"Here, the backpropagated gradients flow through the fixed mask m and thus the masked-out parameters are not updated"
This confuses me.
Looking forward to your reply!
Best wishes!