Automatic mixed precision #2359
Conversation
I'm still working through the changes to model.cpp and the amp.{cpp,cu} files. I suppose I should also read the .py file too. But just a few small things.
Still looking good after the rebase/recent commits. I touched the memory_profile callback and added an example of how to use the SwitchDispatcher (a rough sketch of the pattern is below). I found it a bit clunkier than I remembered to use, but I think I have some ideas to improve it in future H2 versions (in particular, I want to make it possible to just use a lambda as the functor).

Also, please clang-format the whole PR.
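For reference, a type-switch dispatcher of this kind typically looks something like the sketch below. This is a hypothetical illustration, not the actual H2 SwitchDispatcher API; the DataType enum, PrintSizeFunctor, and dispatch names are invented for the example.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical runtime datatype tag; not H2's actual enum.
enum class DataType { FLOAT, DOUBLE };

// The functor currently has to be a named struct with a templated call
// operator; the comment above suggests allowing a plain lambda instead.
struct PrintSizeFunctor {
  template <typename T>
  void operator()() const {
    std::printf("element size: %zu bytes\n", sizeof(T));
  }
};

// Map the runtime datatype to a compile-time type and invoke the functor.
template <typename F>
void dispatch(DataType dt, F&& f) {
  switch (dt) {
  case DataType::FLOAT:  f.template operator()<float>();  break;
  case DataType::DOUBLE: f.template operator()<double>(); break;
  default: throw std::runtime_error("unsupported datatype");
  }
}

int main() {
  dispatch(DataType::FLOAT, PrintSizeFunctor{});   // element size: 4 bytes
  dispatch(DataType::DOUBLE, PrintSizeFunctor{});  // element size: 8 bytes
}
```

One reason this pattern forces a struct is that the dispatcher invokes the functor with an explicit template argument (f.template operator()<T>()), which a plain generic lambda cannot accept directly; passing a type-tag object to a generic lambda is one way to make lambdas usable.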
dc5c648 to b5b91c8
This allows specifying float as a datatype even when it would not be the default.
- Permit using tensor cores by default, if no data conversion is needed.
- Change "no tensor ops" mode to prevent TF32 conversion.
- Add mode to allow conversion.
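For context, these three modes line up naturally with cuDNN's math-type settings. The sketch below is illustrative rather than the PR's actual code (the TensorOpMode enum and helper name are invented) and assumes cuDNN 8, which provides CUDNN_FMA_MATH.

```cpp
#include <cudnn.h>

// Hypothetical mode enum mirroring the three behaviors described above.
enum class TensorOpMode { Default, NoTensorOps, AllowConversion };

// Choose a cuDNN math type for a convolution descriptor based on the mode.
inline cudnnStatus_t set_conv_math_mode(cudnnConvolutionDescriptor_t conv_desc,
                                        TensorOpMode mode) {
  // Default: tensor cores are permitted, but no datatype down-conversion.
  cudnnMathType_t math_type = CUDNN_TENSOR_OP_MATH;
  switch (mode) {
  case TensorOpMode::NoTensorOps:
    math_type = CUDNN_FMA_MATH;  // FMA kernels only; also prevents TF32
    break;
  case TensorOpMode::AllowConversion:
    math_type = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION;  // permit down-conversion
    break;
  case TensorOpMode::Default:
    break;
  }
  return cudnnSetConvolutionMathType(conv_desc, math_type);
}
```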
This is a hack that we should revisit.
This only matters on cuDNN, where we want "pseudo half".
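In cuDNN terms, "pseudo half" means the tensors are stored in FP16 while the convolution accumulates in FP32 via the descriptor's compute type. A minimal illustration (the function and its parameters are invented for this sketch; error checking omitted):

```cpp
#include <cudnn.h>

// Configure an FP16 tensor with an FP32 compute type, i.e. "pseudo half".
void configure_pseudo_half(cudnnTensorDescriptor_t x_desc,
                           cudnnConvolutionDescriptor_t conv_desc,
                           int n, int c, int h, int w) {
  // Storage type of the activations is half...
  cudnnSetTensor4dDescriptor(x_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                             n, c, h, w);
  // ...but the convolution's compute (accumulation) type is float.
  cudnnSetConvolution2dDescriptor(conv_desc,
                                  /*pad_h=*/1, /*pad_w=*/1,
                                  /*stride_h=*/1, /*stride_w=*/1,
                                  /*dilation_h=*/1, /*dilation_w=*/1,
                                  CUDNN_CROSS_CORRELATION,
                                  /*computeType=*/CUDNN_DATA_FLOAT);
}
```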
Co-authored-by: Tom Benson <benson31@llnl.gov>
This includes a fix from @tbennun.
DistData also includes the datatype, and so spuriously indicated things were sharded when mixing datatypes.
Co-authored-by: Tom Benson <benson31@llnl.gov>
This is an initial pass at AMP. It appears correct in the cases I've tested.
It has some particularly pointy bits in batchnorm and in gradient unscaling.
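For context on the gradient-unscaling part: AMP typically divides each gradient by the loss scale and records whether any value became non-finite, so the optimizer step can be skipped and the scale reduced. The kernel below is a minimal sketch, not the PR's amp.cu implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Unscale gradients in place and flag any inf/NaN so the step can be skipped.
__global__ void unscale_and_check(float* __restrict__ grad,
                                  size_t count,
                                  float inv_scale,
                                  int* __restrict__ found_nonfinite) {
  size_t i = blockIdx.x * (size_t) blockDim.x + threadIdx.x;
  if (i >= count) return;
  float g = grad[i] * inv_scale;      // divide by the loss scale
  if (!isfinite(g)) {
    atomicExch(found_nonfinite, 1);   // any non-finite value poisons the step
  }
  grad[i] = g;
}
```

If found_nonfinite is set after the kernel, the usual policy is to skip the optimizer update for that step and shrink the loss scale; otherwise the scale is periodically grown.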