Hi,
I found that training a Transformer with Adam is about three times slower than with Adafactor.
Here is the command I am using for Adam:
Here is the command I am using for Adafactor:
Training for 100 steps takes 240 seconds with Adam, but only 80 seconds with Adafactor.
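For reference, here is a minimal, self-contained sketch of the kind of per-step timing comparison described above. It is not the original training command: it assumes a small PyTorch Transformer and the Adafactor implementation from Hugging Face transformers, and the model size, batch shape, step count, and hyperparameters are all illustrative assumptions.

```python
# Rough timing sketch: wall-clock time for N optimizer steps under
# Adam vs. Adafactor. All sizes below are illustrative, not the
# original setup from this issue.
import time

import torch
from torch import nn
from transformers.optimization import Adafactor

# Small encoder-only Transformer as a stand-in model.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
)
# Dummy input of shape (seq_len, batch, d_model).
data = torch.randn(32, 64, 512)


def time_steps(optimizer, steps=100):
    """Time `steps` full forward/backward/update iterations."""
    start = time.time()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = model(data).pow(2).mean()  # dummy loss
        loss.backward()
        optimizer.step()
    return time.time() - start


adam = torch.optim.Adam(model.parameters(), lr=1e-4)
print("Adam:      %.1f s / 100 steps" % time_steps(adam))

# relative_step/warmup_init make Adafactor manage its own learning rate.
adafactor = Adafactor(
    model.parameters(), lr=None, relative_step=True, warmup_init=True
)
print("Adafactor: %.1f s / 100 steps" % time_steps(adafactor))
```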
Could anyone take a look?
Thanks very much!