support excluded_layers for amp.decorate #52871
Conversation
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Force-pushed from db9d53a to e93bd5d
python/paddle/amp/auto_cast.py
Outdated
    ),
):
    need_keep_fp32 = True
elif (layer._dtype == 'float16') or isinstance(
What does `layer._dtype` represent? Please add a comment explaining it. Is this interface also used for BF16? If so, are the LayerNorm parameters under BF16 kept as FP32 or cast to BF16?
A comment has been added. This interface is meant to unify the parameter-casting process for fp16 and bf16.
For bf16, the original interface was pure_bf16_initialize, which cast the parameters of every layer; in this PR only BatchNorm keeps fp32 and all other layers are cast.
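For illustration, a minimal sketch of the behavior described here (the model below is hypothetical and not taken from this PR's tests; it assumes a device with bfloat16 support):

```python
import paddle
from paddle import nn

# Hypothetical model used only to inspect parameter dtypes after decoration.
class TinyModel(nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2D(3, 8, 3)
        self.bn = nn.BatchNorm2D(8)
        self.linear = nn.Linear(8, 4)

model = TinyModel()
# With level='O2' and dtype='bfloat16', the unified casting path described
# above is expected to keep BatchNorm parameters in float32 and cast the
# parameters of other layers to bfloat16.
model = paddle.amp.decorate(models=model, level='O2', dtype='bfloat16')

print(model.conv.weight.dtype)    # expected: bfloat16
print(model.bn.weight.dtype)      # expected: float32 (BatchNorm kept in fp32)
print(model.linear.weight.dtype)  # expected: bfloat16
```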
python/paddle/amp/auto_cast.py
Outdated
# initialize parameters of the model
for idx in range(len(excluded_layers)):
    for layer in excluded_layers[idx].sublayers(include_self=True):
        layer._cast_to_low_precison_amp = False
Aren't `low_precison` and `amp` redundant with each other in this attribute name?
done
python/paddle/amp/auto_cast.py
Outdated
    ),
):
    need_keep_fp32 = True
elif not layer._cast_to_low_precison_amp:
Please confirm: is there some way to configure LayerNorm to use FP16/BF16 parameters? Real workloads may need this, and in PyTorch the LayerNorm parameter dtype also defaults to FP16.
The current implementation does not change the default fp16 behavior; the original handling still applies, i.e. BatchNorm, LayerNorm, and InstanceNorm all keep fp32 parameters.
For now there is no way to configure LayerNorm parameters to use fp16 under fp16 training.
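As a hedged sketch of this default behavior (not code from the PR; the tiny network is a placeholder):

```python
import paddle
from paddle import nn

# Sketch only: with level='O2' and dtype='float16', the default handling
# described above is expected to keep LayerNorm parameters in float32 while
# other layers (e.g. Linear) are cast to float16.
net = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16))
net = paddle.amp.decorate(models=net, level='O2', dtype='float16')
print(net[0].weight.dtype)  # expected: float16
print(net[1].weight.dtype)  # expected: float32 (LayerNorm kept in fp32)
```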
python/paddle/amp/auto_cast.py
Outdated
    gradients will be FP32 dtype after the backpropagation. Default is False.
master_grad(bool, optional): For level='O2', whether to use float32 weight gradients for calculations such as gradient clipping, weight decay, and weight updates. If master_grad is enabled, the weight
    gradients will be float32 dtype after the backpropagation. Default is False.
excluded_layers(Layer|list of Layer, optional): Specifies the layers not to be decorated. The weights of these layers will always keep float32 when level is O2. Default is None, the weights of the whole model will be casted to float16 or bfloat16.
Please confirm: is this set with layer types such as `[nn.LayerNorm]`, with layer instances such as `[norm]`, or are both supported?
Both forms are now supported.
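For reference, a hedged sketch of the two forms (the model and layer names below are placeholders, not from the PR's test code):

```python
import paddle
from paddle import nn

# Placeholder model for illustration only.
class TinyModel(nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2D(3, 8, 3)
        self.linear = nn.Linear(8, 4)

# Form 1: exclude by layer type; every Conv2D in the model keeps float32 weights.
m1 = paddle.amp.decorate(
    models=TinyModel(), level='O2', dtype='float16', excluded_layers=[nn.Conv2D]
)

# Form 2: exclude by layer instance; only the given sublayer keeps float32 weights.
m2 = TinyModel()
m2 = paddle.amp.decorate(
    models=m2, level='O2', dtype='float16', excluded_layers=[m2.conv]
)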
test/amp/test_amp_decorate.py
Outdated
models=model,
level='O2',
dtype='float16',
excluded_layers=[model.conv],
Please set multiple instances in this test.
done
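A hedged sketch of what a check with multiple excluded instances might look like (not the actual code in test/amp/test_amp_decorate.py; it assumes a device with float16 support):

```python
import paddle
from paddle import nn

# Illustrative model with two convolution sublayers to exclude.
class TinyModel(nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2D(3, 8, 3)
        self.conv2 = nn.Conv2D(8, 8, 3)
        self.linear = nn.Linear(8, 4)

model = TinyModel()
model = paddle.amp.decorate(
    models=model,
    level='O2',
    dtype='float16',
    excluded_layers=[model.conv1, model.conv2],  # multiple instances
)

# Excluded layers are expected to keep float32 weights; the rest are cast.
assert model.conv1.weight.dtype == paddle.float32
assert model.conv2.weight.dtype == paddle.float32
assert model.linear.weight.dtype == paddle.float16
```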
Force-pushed from 257c7b6 to 49d9b84
Force-pushed from 49d9b84 to 20c616b
LGTM
        layer,
        (
            paddle.incubate.nn.FusedFeedForward,
            paddle.incubate.nn.FusedMultiHeadAttention,
        ),
    ):
-       layer._amp_decorate(dtype='float16')
+       layer._amp_decorate(dtype=dtype)
I feel this logic could be further optimized and unified later on.
python/paddle/amp/auto_cast.py
Outdated
master_grad(bool, optional): For level='O2', whether to use float32 weight gradients for calculations such as gradient clipping, weight decay, and weight updates. If master_grad is enabled, the weight
    gradients will be float32 dtype after the backpropagation. Default is False.
excluded_layers(Layer|list of Layer, optional): Specifies the layers not to be decorated. The weights of these layers will always keep float32 when level is O2. `excluded_layers` can be specified as
    an Layer instance/type or a list of Layer instances/tpyes. Default is None, the weights of the whole model will be casted to float16 or bfloat16.
There is a typo in `instances/tpyes`.
LGTM
LGTM for new args
PR types
New features

PR changes
APIs

Describe
support excluded_layers for amp.decorate
English docs: http://preview-paddle-pr-52871.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/en/api/paddle/amp/decorate_en.html
Chinese docs: PaddlePaddle/docs#5792