[AMP]Master grad in static graph #53362

Merged 29 commits into PaddlePaddle:develop on May 18, 2023

Conversation

@shaojiewang (Contributor) commented on Apr 26, 2023

PR types

New features

PR changes

Others

Description

Pcard-70458
Enable master grad in static graph mode.

The background and functionality are the same as #52235; this PR implements the feature for the static graph.

Functionality and effect

When training in AMP O2 mode, bf16/fp16 gradients can fall below the smallest representable bf16/fp16 value or exceed the bf16/fp16 representable range. In static graph mode, the gradients are therefore cast to fp32 before check_finite_and_unscale, grad clip, regularization, and the optimizer update, to preserve training accuracy.

Usage

  1. master_grad is disabled by default; it takes effect only at O2 level and only when the user enables it explicitly.
  2. The user enables master_grad through the paddle.static.amp.decorate interface by passing master_grad=True. Once enabled, OptimizerWithMixedPrecision.apply_gradients creates master_grad tensors and inserts cast ops before _check_finite_and_unscale to convert the bf16/fp16 grads into fp32 master_grads; check_finite_and_unscale, grad clip, regularization, and the optimizer all then compute with the fp32 master grads. A minimal usage sketch follows below.
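
A minimal usage sketch of the configuration described above. The network is a placeholder and the exact paddle.static.amp.decorate signature may differ slightly between Paddle versions; treat this as an illustration, not the merged code.

```python
import paddle

paddle.enable_static()

main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    x = paddle.static.data(name="x", shape=[None, 16], dtype="float32")
    hidden = paddle.static.nn.fc(x, size=16)
    loss = paddle.mean(hidden)

    optimizer = paddle.optimizer.AdamW(learning_rate=1e-3)
    # AMP O2 with master gradients: grads are cast to fp32 before
    # check_finite_and_unscale, grad clip, regularization and the optimizer.
    optimizer = paddle.static.amp.decorate(
        optimizer,
        level="O2",
        dtype="float16",
        master_grad=True,
    )
    optimizer.minimize(loss)
```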

Impact

When enabled, extra cast ops are inserted into the program, and the gradients argument of check_finite_and_unscale, grad clip, regularization, and the optimizer becomes fp32, so a single step runs somewhat slower.

@paddle-bot (bot) commented on Apr 26, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot (bot) commented on Apr 26, 2023

❌ This PR was not created using the PR template. You can refer to this Demo.
Please use the PR template; it saves our maintainers' time so that more developers can get help.

@paddle-ci-bot (bot) commented on May 4, 2023

Sorry to inform you that the CI runs for 6659cca passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.



@unittest.skipIf(
not core.supports_bfloat16(), "place does not support BF16 evaluation"
@ZzSean (Contributor) commented on May 8, 2023

This seems to check only whether the CPU place supports bf16. For GPU, the check should be core.is_compiled_with_cuda() combined with core.is_bfloat16_supported(core.CUDAPlace(0)).

@shaojiewang (Author) replied

Switched to the GPU-capability check.
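
A sketch of the GPU-aware skip condition being suggested. The test-class name is illustrative and the import path for core may differ from the actual test file.

```python
import unittest

from paddle.framework import core


@unittest.skipIf(
    not core.is_compiled_with_cuda()
    or not core.is_bfloat16_supported(core.CUDAPlace(0)),
    "place does not support BF16 evaluation",
)
class TestMasterGradStaticBF16(unittest.TestCase):
    ...
```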

@shaojiewang changed the title from Master grad in static graph to [AMP]Master grad in static graph on May 8, 2023
@shaojiewang requested a review from ZzSean on May 11, 2023
# master gradients
self._already_create_master_grad = set()
self._master_grads = {}
self._master_grad = False
Reviewer (Contributor) commented

Aren't these lines unnecessary? They seem to already be set up in the base class.

@shaojiewang (Author) replied

AdamW.__init__() does not call super().__init__(); is there a particular reason it was left out?

@@ -277,6 +395,6 @@ def run_program(
feed={feed_vars[0].name: x_np},
fetch_list=fetch_vars,
)
print(f"-- [BF16 {level}] iter={iter_id}, loss={results[0]}")
# print(f"-- [BF16 {level}] iter={iter_id}, loss={results[0]}")
Reviewer (Contributor) commented

Should this comment be re-enabled?

@shaojiewang (Author) replied

This test was changed to compare whether the O1 and O2 loss results are equal, so can this print statement be removed?

@shaojiewang (Author) replied

I see other tests still use it, so it has been re-enabled.

@zhangting2020 previously approved these changes on May 12, 2023
# master gradients
self._already_create_master_grad = set()
self._master_grads = {}
self._master_grad = False
Reviewer (Contributor) commented

Please wrap this in a function, e.g. create_master_grad_states.

@shaojiewang (Author) replied

Done.
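
For context, a minimal sketch of what the requested helper might look like, simply wrapping the state shown in the diff above; the method actually merged in the PR may differ.

```python
def create_master_grad_states(self):
    # State for fp32 master gradients used by AMP O2: a map from each
    # low-precision grad name to its fp32 master grad variable, and a flag
    # recording whether master grad is enabled.
    self._already_create_master_grad = set()
    self._master_grads = {}
    self._master_grad = False
```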

if grad.name in self._master_grads:
var = self._master_grads[grad.name]
else:
var_name = grad.name + "_fp32_master"
Reviewer (Contributor) commented

Should grad's data type be checked here, or an assert added?

@shaojiewang (Author) replied

The grad dtype is already checked at the call site of this function; does it need to be checked again here?

@shaojiewang (Author) replied

Added an assert.
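
A sketch of the assert plus master-grad lookup being discussed, assuming grad is a static-graph Variable and reusing the "_fp32_master" naming from the diff above; the helper merged in the PR may differ in details.

```python
from paddle.framework import core


def _create_master_grad(self, grad):
    # Master grads only make sense for low-precision gradients.
    assert grad.dtype in (
        core.VarDesc.VarType.FP16,
        core.VarDesc.VarType.BF16,
    ), f"expected an fp16/bf16 gradient, got {grad.dtype} for {grad.name}"

    if grad.name in self._master_grads:
        var = self._master_grads[grad.name]
    else:
        var_name = grad.name + "_fp32_master"
        # Create an fp32 variable of the same shape to hold the master gradient.
        var = grad.block.create_var(
            name=var_name,
            shape=grad.shape,
            dtype="float32",
            persistable=grad.persistable,
        )
        self._master_grads[grad.name] = var
    return var
```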

Add ops to cast gradient to master gradient

Args:
param_grads(list(tuple(Tensor, Tensor))):
Reviewer (Contributor) commented

Even though documentation is not auto-generated for this function, the format of the parameter and functionality descriptions does not follow the usual conventions.

@shaojiewang (Author) replied

Updated; please check whether it is correct now.
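
A sketch of a more conventionally formatted docstring for this helper; the wording is illustrative, not the text that was merged.

```python
def _append_cast_to_master_grad_op(self, param_grads):
    """
    Create fp32 master gradients and insert cast ops from the fp16/bf16
    gradients to them.

    Args:
        param_grads (list[tuple(Variable, Variable)]): A list of
            (parameter, gradient) pairs produced by backward().

    Returns:
        list[tuple(Variable, Variable)]: The (parameter, master gradient)
            pairs consumed by check_finite_and_unscale, grad clip,
            regularization and the optimizer.
    """
```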

assert isinstance(target_block, framework.Block)
# create
for p, g in param_grads:
if g.name not in self._already_create_master_grad:
Reviewer (Contributor) commented

This could also be checked with if g.name not in self._master_grads.keys(); there is no need to keep a separate self._already_create_master_grad.

@shaojiewang (Author) replied

Switched to checking with if g.name not in self._master_grads.keys().

@@ -1170,9 +1246,10 @@ def apply_gradients(self, params_grads):

# 'optimizer(grad_clip)' or 'set_gradient_clip'
if self._grad_clip is not None:
# create master gradients
params_grads = self._append_cast_to_master_grad_op(params_grads)
Reviewer (Contributor) commented

If there is no _grad_clip, does master_grad still take effect? Are the params here master_weight, i.e. is the param used in the grad_clip computation the master_weight?

My understanding is that master_grad should not be used only inside grad_clip; everything after backward that needs the gradients should use master_grad.

@shaojiewang (Author) replied

Re "If there is no _grad_clip, does master_grad still take effect?": it would not take effect. The placement here is wrong; the creation should be moved outside the if self._grad_clip is not None check. Will fix.

Re "Are the params here master_weight?": params is not master_weight, and grad_clip does not use the params argument. Should this be changed to pass in (master_weight, master_grad) tuples?
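
A sketch of the corrected placement the author describes, moving the master-grad creation out of the grad-clip branch so every downstream consumer sees fp32 grads. Context lines are simplified from the diff above; the flag name self._master_grad follows the state shown earlier and may not match the merged code exactly.

```python
def apply_gradients(self, params_grads):
    # Create fp32 master gradients first, regardless of whether gradient
    # clipping is configured, so that check_finite_and_unscale, grad clip,
    # regularization and the optimizer all operate on fp32 grads.
    if self._master_grad:
        params_grads = self._append_cast_to_master_grad_op(params_grads)

    # 'optimizer(grad_clip)' or 'set_gradient_clip'
    if self._grad_clip is not None:
        params_grads = self._grad_clip(params_grads)
    ...
```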

@@ -791,6 +798,7 @@ def decorate(
use_dynamic_loss_scaling=None,
use_amp_guard=False,
use_promote=False,
use_master_grad=False,
Reviewer (Contributor) commented

Add it after line 792, name the parameter master_grad=False, and add the corresponding documentation for it.

@shaojiewang (Author) replied

Done.

@@ -42,14 +72,18 @@ def _build_optimizer(
beta2=0.836,
epsilon=1e-4,
weight_decay=0.01,
multi_precision=True,
Reviewer (Contributor) commented

Don't add a multi_precision argument here; decorate already supports setting master_weight, and O2 training sets it to True automatically.

@shaojiewang (Author) replied

Removed.

use_promote=use_promote,
master_weight=True,
init_loss_scaling=1,
Reviewer (Contributor) commented

There is no need to set init_loss_scaling either; bfloat16 training sets it to 1 automatically.

@shaojiewang (Author) replied

Removed.

f"The number of optimizers with multi_precison = True is expected to be {expected_num_mp}, but recieved {actual_num_mp}.",
)

def test_amp_fp16_o1(self):
Reviewer (Contributor) commented

If this unit test is meant to test the master_grad feature, the O1 check seems unnecessary?

@shaojiewang (Author) replied

Yes. Deleted.

amp_dtype,
amp_level,
amp_lists,
True,
Reviewer (Contributor) commented

A unit test with grad_clip set to False is also needed.

@shaojiewang (Author) replied

Added.
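
A sketch of what such a test case might look like. The helper self._run and its parameters are hypothetical stand-ins for the test utilities quoted elsewhere in this PR; the assertion mirrors the O1-vs-O2 loss comparison the author describes.

```python
def test_amp_o2_master_grad_without_grad_clip(self):
    # Master grad should also take effect when no grad clip is configured,
    # since the cast to fp32 happens before (and independently of) clipping.
    losses_o1 = self._run(level="O1", use_grad_clip=False)
    losses_o2 = self._run(level="O2", use_grad_clip=False, use_master_grad=True)
    self.assertEqual(losses_o1, losses_o2)
```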

…s unittest; 3.use a function to create master grad states
@shaojiewang requested a review from Xreki on May 16, 2023
@Xreki (Contributor) left a comment

LGTM. Please strengthen the PR description a bit: what the feature is, how it was implemented, and what effect it achieves.

return losses

dtype = "float16"
max_iters = 25
Reviewer (Contributor) commented

There is probably no need to run this many iterations.

@shaojiewang (Author) replied

This unit test checks two things: (1) the O1 loss equals the O2 loss when master grad is enabled, and (2) the O1 loss differs from the O2 loss when master grad is disabled. Both conditions are first satisfied at step 24, so max_iters is set to 25.

seed = 0
paddle.seed(seed)
np.random.seed(seed)
random.seed(seed)
Reviewer (Contributor) commented

The seeds don't need to be set repeatedly, do they?

@shaojiewang (Author) replied

The intent is to make the two runs of the startup program produce identical results; writing it this way is the simplest.
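
A sketch of the reseeding pattern the author describes, so consecutive startup-program runs initialize parameters identically for the O1/O2 comparison; the helper name reset_seeds is illustrative.

```python
import random

import numpy as np
import paddle


def reset_seeds(seed=0):
    # Reset every RNG source so parameter initialization and any numpy/python
    # randomness match exactly across the two runs being compared.
    paddle.seed(seed)
    np.random.seed(seed)
    random.seed(seed)
```

Calling reset_seeds() immediately before each run of the program gives the O1 and O2 runs identical initial weights.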

@shaojiewang (Author) commented

Re "LGTM. Please strengthen the PR description a bit": the PR description has been updated.

@Xreki merged commit 972581d into PaddlePaddle:develop on May 18, 2023