
support excluded_layers for amp.decorate #52871

Merged: 4 commits merged into PaddlePaddle:develop on Apr 18, 2023

Conversation

Contributor @zhangting2020 commented Apr 13, 2023

PR types

New features

PR changes

APIs

Describe

support excluded_layers for amp.decorate

English docs: http://preview-paddle-pr-52871.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/en/api/paddle/amp/decorate_en.html
Chinese docs: PaddlePaddle/docs#5792

paddle-bot bot commented Apr 13, 2023
Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

    ),
):
    need_keep_fp32 = True
elif (layer._dtype == 'float16') or isinstance(
Contributor:

What does layer._dtype represent? Please add a comment explaining it. Is this interface also used for BF16? If so, are the LayerNorm parameters FP32 or BF16 under BF16?

Contributor Author:

Comments have been added. This interface unifies the parameter-casting process for fp16 and bf16.
Under bf16, the original interface was pure_bf16_initialize, which cast the parameters of every layer; with this PR, only BN keeps fp32 and all other layers are cast.
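A minimal check of that bf16 behavior can be sketched as follows (the two-layer model is hypothetical and assumes a device with bfloat16 support; only the `paddle.amp.decorate` call follows this PR):

```python
import paddle
from paddle import nn

# Hypothetical two-layer model: a conv followed by a BatchNorm.
conv = nn.Conv2D(3, 8, 3)
bn = nn.BatchNorm2D(8)
net = nn.Sequential(conv, bn)

# O2 decoration with bfloat16: conv weights are cast, BatchNorm stays float32.
net = paddle.amp.decorate(models=net, level='O2', dtype='bfloat16')

print(conv.weight.dtype)  # expected: paddle.bfloat16
print(bn.weight.dtype)    # expected: paddle.float32 (BN kept in fp32)
```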

# initialize parameters of the model
for idx in range(len(excluded_layers)):
    for layer in excluded_layers[idx].sublayers(include_self=True):
        layer._cast_to_low_precison_amp = False
Contributor:

Isn't `low_precison` / `amp` redundant here?

Contributor Author:

Done.

    ),
):
    need_keep_fp32 = True
elif not layer._cast_to_low_precison_amp:
Contributor:

Please confirm: can LayerNorm be configured in some way so that its parameters use FP16/BF16? Real workloads may need this, and PyTorch's LayerNorm parameter dtype also defaults to FP16.

Contributor Author:

The current implementation does not change the default fp16 behavior; it keeps the original handling, i.e. BN, LayerNorm, and InstanceNorm all stay fp32.

For now there is no way to allow LayerNorm parameters to use fp16 under fp16.
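For reference, a minimal sketch of that default fp16 O2 behavior (the tiny model is hypothetical; only the `paddle.amp.decorate` call itself is taken from this PR):

```python
import paddle
from paddle import nn

# Hypothetical model mixing a Linear layer with a LayerNorm.
fc = nn.Linear(16, 16)
ln = nn.LayerNorm(16)
net = nn.Sequential(fc, ln)

# Default O2 fp16 decoration: Linear weights are cast, LayerNorm stays float32.
net = paddle.amp.decorate(models=net, level='O2', dtype='float16')

print(fc.weight.dtype)  # expected: paddle.float16
print(ln.weight.dtype)  # expected: paddle.float32 (LayerNorm kept in fp32)
```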

gradients will be FP32 dtype after the backpropagation. Default is False.
master_grad(bool, optional): For level='O2', whether to use float32 weight gradients for calculations such as gradient clipping, weight decay, and weight updates. If master_grad is enabled, the weight
gradients will be float32 dtype after the backpropagation. Default is False.
excluded_layers(Layer|list of Layer, optional): Specifies the layers not to be decorated. The weights of these layers will always keep float32 when level is O2. Default is None, the weights of the whole model will be casted to float16 or bfloat16.
Contributor:

Please confirm: should this be set as a type, e.g. [nn.LayerNorm], as an instance, e.g. [norm], or either?

Contributor Author:

Both forms are supported.
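A minimal sketch of both forms (the model below is hypothetical; only the `excluded_layers` semantics follow this PR's docstring):

```python
import paddle
from paddle import nn

# Hypothetical model: a conv followed by a BatchNorm.
class SimpleNet(nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2D(3, 8, 3)
        self.norm = nn.BatchNorm2D(8)

    def forward(self, x):
        return self.norm(self.conv(x))

model = SimpleNet()

# Form 1: pass concrete instances, e.g. keep model.conv's weights in float32.
model = paddle.amp.decorate(
    models=model, level='O2', dtype='float16', excluded_layers=[model.conv]
)

# Form 2: pass layer types, e.g. keep every nn.Conv2D's weights in float32.
# model = paddle.amp.decorate(
#     models=model, level='O2', dtype='float16', excluded_layers=[nn.Conv2D]
# )
```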

    models=model,
    level='O2',
    dtype='float16',
    excluded_layers=[model.conv],
Contributor:

In this test, please pass multiple instances.

Contributor Author:

Done.

@zhangting2020 force-pushed the amp_decorate branch 3 times, most recently from 257c7b6 to 49d9b84 on April 17, 2023 11:48
Xreki previously approved these changes Apr 18, 2023

Contributor @Xreki left a comment:

LGTM

    layer,
    (
        paddle.incubate.nn.FusedFeedForward,
        paddle.incubate.nn.FusedMultiHeadAttention,
    ),
):
-   layer._amp_decorate(dtype='float16')
+   layer._amp_decorate(dtype=dtype)
Contributor:

It feels like this logic could be further optimized and unified later.

master_grad(bool, optional): For level='O2', whether to use float32 weight gradients for calculations such as gradient clipping, weight decay, and weight updates. If master_grad is enabled, the weight
gradients will be float32 dtype after the backpropagation. Default is False.
excluded_layers(Layer|list of Layer, optional): Specifies the layers not to be decorated. The weights of these layers will always keep float32 when level is O2. `excluded_layers` can be specified as
an Layer instance/type or a list of Layer instances/tpyes. Default is None, the weights of the whole model will be casted to float16 or bfloat16.
Contributor:

instances/tpyes: there is a typo here.
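As an illustration of the master_grad and excluded_layers arguments documented above, a hedged usage sketch (the model and optimizer are hypothetical; argument names follow this PR's docstring):

```python
import paddle
from paddle import nn

# Hypothetical model and optimizer.
model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))
opt = paddle.optimizer.AdamW(learning_rate=1e-3, parameters=model.parameters())

# O2 decoration: keep float32 weight gradients after backward (master_grad),
# and keep every nn.Linear's weights in float32 (excluded_layers given as a type).
model, opt = paddle.amp.decorate(
    models=model,
    optimizers=opt,
    level='O2',
    dtype='float16',
    master_grad=True,
    excluded_layers=[nn.Linear],
)
```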

Contributor @sunzhongkai588 left a comment:

LGTM

Contributor @lanxianghit left a comment:

LGTM for new args

@zhangting2020 merged commit 534efcb into PaddlePaddle:develop on Apr 18, 2023
jjyaoao pushed a commit to jjyaoao/Paddle that referenced this pull request Apr 19, 2023
lijialin03 pushed a commit to lijialin03/Paddle that referenced this pull request Apr 25, 2023
zhangting2020 added a commit to zhangting2020/Paddle that referenced this pull request Apr 28, 2023
Xreki pushed a commit to Xreki/Paddle that referenced this pull request May 2, 2023
zhangting2020 added a commit to zhangting2020/Paddle that referenced this pull request May 22, 2023