Automatic mixed precision #2359

Merged: 52 commits, Nov 9, 2023
@ndryden (Collaborator) commented Oct 19, 2023

This is an initial pass at AMP. It appears correct in the cases I've tested.

It has some particularly pointy bits in batchnorm and in gradient unscaling.
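
For readers unfamiliar with the term: in mixed-precision training, "gradient unscaling" generally means dividing the gradients by the loss scale that was applied before the backward pass, then checking for overflow before the optimizer step. The following is a minimal Python sketch of that generic technique, not the code in this PR. The function names, the growth interval, and the growth and backoff factors are all assumptions chosen for illustration.

```python
import math


def unscale_and_check(grads, loss_scale):
    """Divide each gradient by the loss scale and report whether all are finite.

    Hypothetical helper: `grads` is a flat list of floats standing in for
    gradient tensors.
    """
    inv_scale = 1.0 / loss_scale
    unscaled = [g * inv_scale for g in grads]
    all_finite = all(math.isfinite(g) for g in unscaled)
    return unscaled, all_finite


def update_loss_scale(loss_scale, all_finite, good_steps,
                      growth_interval=2000, growth=2.0, backoff=0.5):
    """Dynamic loss scaling policy (assumed defaults, not taken from the PR).

    On overflow, shrink the scale and reset the counter; the caller should
    also skip the optimizer step. After `growth_interval` consecutive clean
    steps, grow the scale.
    """
    if not all_finite:
        return loss_scale * backoff, 0
    good_steps += 1
    if good_steps >= growth_interval:
        return loss_scale * growth, 0
    return loss_scale, good_steps
```

A training loop under these assumptions would call `unscale_and_check` after the backward pass, apply the optimizer step only when the result is finite, and then call `update_loss_scale` to pick the scale for the next step.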

@benson31 (Collaborator) left a comment

I'm still working through the changes to model.cpp and the amp.{cpp,cu} files. I suppose I should read the .py file too. But just a few small things for now.

applications/vision/resnet.py (resolved)
applications/vision/lenet.py (outdated, resolved)
include/lbann/layers/transform/evaluation.hpp (outdated, resolved)
src/layers/regularizers/batch_normalization_builder.cpp (outdated, resolved)
include/lbann/utils/print_helpers.hpp (outdated, resolved)
src/utils/cudnn.cpp (outdated, resolved)
src/utils/amp.cu (outdated, resolved)
@benson31 (Collaborator) left a comment

Still looking good after the rebase and recent commits. I touched the memory_profile callback and added an example of how to use the SwitchDispatcher. I found it a bit clunkier to use than I remembered, but I have some ideas to improve it in future H2 versions (in particular, I want to make it possible to just use a lambda as the functor).

Also, please clang-format the whole PR.

@ndryden ndryden marked this pull request as ready for review October 30, 2023 18:34
@ndryden ndryden force-pushed the amp branch 2 times, most recently from dc5c648 to b5b91c8 November 1, 2023 17:04
src/models/model.cpp (outdated, resolved)
ndryden and others added 26 commits November 9, 2023 11:26
Co-authored-by: Tom Benson <benson31@llnl.gov>
Co-authored-by: Tom Benson <benson31@llnl.gov>
DistData also includes the datatype, and so spuriously indicated
things were sharded when mixing datatypes.
Co-authored-by: Tom Benson <benson31@llnl.gov>
Co-authored-by: Tom Benson <benson31@llnl.gov>