Automatic mixed precision #2359
Conversation
I'm still working through the changes to model.cpp and the amp.{cpp,cu} files. I suppose I should also read the .py file too. But just a few small things.
Still looking good after the rebase/recent commits. I touched the memory_profile callback and added an example of how to use the SwitchDispatcher (a rough sketch of the pattern is below). I found it a bit clunkier than I remembered to use, but I think I have some ideas to improve it in future H2 versions (in particular, I want to make it possible to just use a lambda as the functor).

Also, please clang-format the whole PR.
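For reference, a type-switch dispatcher of this kind typically looks something like the sketch below. This is a hypothetical illustration, not the actual H2 SwitchDispatcher API; the DataType enum, PrintSizeFunctor, and dispatch names are invented for the example.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical runtime datatype tag; not H2's actual enum.
enum class DataType { FLOAT, DOUBLE };

// The functor currently has to be a named struct with a templated call
// operator; the comment above suggests allowing a plain lambda instead.
struct PrintSizeFunctor {
  template <typename T>
  void operator()() const {
    std::printf("element size: %zu bytes\n", sizeof(T));
  }
};

// Map the runtime datatype to a compile-time type and invoke the functor.
template <typename F>
void dispatch(DataType dt, F&& f) {
  switch (dt) {
  case DataType::FLOAT:  f.template operator()<float>();  break;
  case DataType::DOUBLE: f.template operator()<double>(); break;
  default: throw std::runtime_error("unsupported datatype");
  }
}

int main() {
  dispatch(DataType::FLOAT, PrintSizeFunctor{});   // element size: 4 bytes
  dispatch(DataType::DOUBLE, PrintSizeFunctor{});  // element size: 8 bytes
}
```

One reason this pattern forces a struct is that the dispatcher invokes the functor with an explicit template argument (f.template operator()<T>()), which a plain generic lambda cannot accept directly; passing a type-tag object to a generic lambda is one way to make lambdas usable.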
dc5c648 to b5b91c8
This allows specifying float as a datatype even when it would not be the default.
- Permit using tensor cores by default, if no data conversion is needed.
- Change "no tensor ops" mode to prevent TF32 conversion.
- Add mode to allow conversion.
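For context, these three modes line up naturally with cuDNN's math-type settings. The sketch below is illustrative rather than the PR's actual code (the TensorOpMode enum and helper name are invented) and assumes cuDNN 8, which provides CUDNN_FMA_MATH.

```cpp
#include <cudnn.h>

// Hypothetical mode enum mirroring the three behaviors described above.
enum class TensorOpMode { Default, NoTensorOps, AllowConversion };

// Choose a cuDNN math type for a convolution descriptor based on the mode.
inline cudnnStatus_t set_conv_math_mode(cudnnConvolutionDescriptor_t conv_desc,
                                        TensorOpMode mode) {
  // Default: tensor cores are permitted, but no datatype down-conversion.
  cudnnMathType_t math_type = CUDNN_TENSOR_OP_MATH;
  switch (mode) {
  case TensorOpMode::NoTensorOps:
    math_type = CUDNN_FMA_MATH;  // FMA kernels only; also prevents TF32
    break;
  case TensorOpMode::AllowConversion:
    math_type = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION;  // permit down-conversion
    break;
  case TensorOpMode::Default:
    break;
  }
  return cudnnSetConvolutionMathType(conv_desc, math_type);
}
```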
This is a hack that we should revisit.
This only matters on cuDNN, where we want "pseudo half".
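In cuDNN terms, "pseudo half" means the tensors are stored in FP16 while the convolution accumulates in FP32 via the descriptor's compute type. A minimal illustration (the function and its parameters are invented for this sketch; error checking omitted):

```cpp
#include <cudnn.h>

// Configure an FP16 tensor with an FP32 compute type, i.e. "pseudo half".
void configure_pseudo_half(cudnnTensorDescriptor_t x_desc,
                           cudnnConvolutionDescriptor_t conv_desc,
                           int n, int c, int h, int w) {
  // Storage type of the activations is half...
  cudnnSetTensor4dDescriptor(x_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                             n, c, h, w);
  // ...but the convolution's compute (accumulation) type is float.
  cudnnSetConvolution2dDescriptor(conv_desc,
                                  /*pad_h=*/1, /*pad_w=*/1,
                                  /*stride_h=*/1, /*stride_w=*/1,
                                  /*dilation_h=*/1, /*dilation_w=*/1,
                                  CUDNN_CROSS_CORRELATION,
                                  /*computeType=*/CUDNN_DATA_FLOAT);
}
```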
Co-authored-by: Tom Benson <benson31@llnl.gov>
This includes a fix from @tbennun.
DistData also includes the datatype, and so spuriously indicated things were sharded when mixing datatypes.
Co-authored-by: Tom Benson <benson31@llnl.gov>
This is an initial pass at AMP. It appears correct in the cases I've tested.
It has some particularly pointy bits in batchnorm and in gradient unscaling.
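For context on the gradient-unscaling part: AMP typically divides each gradient by the loss scale and records whether any value became non-finite, so the optimizer step can be skipped and the scale reduced. The kernel below is a minimal sketch, not the PR's amp.cu implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Unscale gradients in place and flag any inf/NaN so the step can be skipped.
__global__ void unscale_and_check(float* __restrict__ grad,
                                  size_t count,
                                  float inv_scale,
                                  int* __restrict__ found_nonfinite) {
  size_t i = blockIdx.x * (size_t) blockDim.x + threadIdx.x;
  if (i >= count) return;
  float g = grad[i] * inv_scale;      // divide by the loss scale
  if (!isfinite(g)) {
    atomicExch(found_nonfinite, 1);   // any non-finite value poisons the step
  }
  grad[i] = g;
}
```

If found_nonfinite is set after the kernel, the usual policy is to skip the optimizer update for that step and shrink the loss scale; otherwise the scale is periodically grown.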