Question #15
Hello guys,

Very nice piece of work.

I was wondering why you didn't use an einsum implementation of the bilinear attention in order to speed up training. This equation is a perfect fit for it. You should see a significant gain, and it would be nice, for once, to have highly optimized code available on GitHub.

Best,
T.C
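For reference, here is a minimal sketch of what a bilinear attention map written with `torch.einsum` might look like. The tensor names (`X`, `Y`, `U`, `V`, `p`) and all sizes are illustrative assumptions, not the repository's actual variables:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only
B, n_x, n_y, d_x, d_y, K = 8, 36, 14, 2048, 1024, 512

X = torch.randn(B, n_x, d_x)   # e.g. visual features (assumed shapes)
Y = torch.randn(B, n_y, d_y)   # e.g. question features (assumed shapes)
U = torch.randn(d_x, K)
V = torch.randn(d_y, K)
p = torch.randn(K)

# logits[b, i, j] = sum_k p[k] * (X[b, i] @ U)[k] * (Y[b, j] @ V)[k]
logits = torch.einsum('bik,bjk,k->bij', X @ U, Y @ V, p)

# normalize over a single dimension of the map (here dim=1), not both
att = F.softmax(logits, dim=1)
print(att.shape)  # (B, n_x, n_y)
```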
Softmax is applied to both dimensions in attention.py, not only the columns (dim = 1).
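To illustrate the point, a small sketch (assumed shapes, not the file's actual code) of the difference between normalizing the attention map jointly over both dimensions and normalizing over a single dimension (dim = 1):

```python
import torch
import torch.nn.functional as F

B, n_v, n_q = 2, 4, 3
logits = torch.randn(B, n_v, n_q)

# joint normalization: the whole map sums to 1 for each example
att_joint = F.softmax(logits.view(B, -1), dim=-1).view(B, n_v, n_q)

# per-column normalization: each column sums to 1 (softmax over dim=1)
att_cols = F.softmax(logits, dim=1)

print(att_joint.sum(dim=(1, 2)))  # ~1 per example
print(att_cols.sum(dim=1))        # ~1 per (example, column)
```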
Ok, I will change that :) I have implemented the module there. Could you please have a quick look and tell me if you think it is correct? Best,
@tchaton in our implementation, the computational speed is not much different from before. I suspect that other factors, such as missing nonlinear activations, regularization, and other implementation details, may be involved. However, we found that memory consumption is hugely reduced (around 30%) due to einsum's efficiency in dealing with computational temporaries. Please refer to our implementation in #23. Thank you for the heads-up.
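As a rough illustration of where such a memory saving can come from (assumed shapes, not the benchmark from #23): an explicit formulation materializes a large 4-D temporary before reducing over the shared dimension, whereas a fused einsum contracts over it directly:

```python
import torch

# Hypothetical sizes and already-projected features, for illustration only
B, n_x, n_y, K = 8, 36, 14, 512
Xp = torch.randn(B, n_x, K)
Yp = torch.randn(B, n_y, K)
p = torch.randn(K)

# explicit version: builds a (B, n_x, n_y, K) temporary before the reduction
tmp = Xp.unsqueeze(2) * Yp.unsqueeze(1)   # (B, n_x, n_y, K)
logits_explicit = (tmp * p).sum(dim=-1)   # (B, n_x, n_y)

# fused version: same result without keeping the 4-D temporary in user code
logits_einsum = torch.einsum('bik,bjk,k->bij', Xp, Yp, p)

print(torch.allclose(logits_explicit, logits_einsum, atol=1e-4))
```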