
Delete extra input (Bias, ResidualData) in OpMaker of conv2d #49121

Merged
merged 14 commits into PaddlePaddle:develop from zyfncg:clear_conv2d_extra
Feb 6, 2023

Conversation

Contributor

@zyfncg zyfncg commented Dec 16, 2022

PR types

Others

PR changes

OPs

Describe

Remove the Extra-type input parameters (Bias, ResidualData) from the OpMaker of the conv2d operator.

Based on previous work (PR47579, PR48848), the dependencies on the extra input parameters (Bias, ResidualData) of the conv2d operator have essentially been removed, so this PR deletes these extra inputs from the OpMaker of conv2d.
The extra attributes of conv2d that are only used in fusion scenarios (such as fuse_activation, fuse_residual_connection, etc.) are no longer used, so they are also deleted in this PR.

@paddle-bot

paddle-bot bot commented Dec 16, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zyfncg zyfncg force-pushed the clear_conv2d_extra branch from cccaafe to dc02b62 Compare December 21, 2022 07:29
chenwhql
chenwhql previously approved these changes Dec 27, 2022
@zyfncg zyfncg requested review from jczaja and Silv3S December 27, 2022 11:35
jiahy0825
jiahy0825 previously approved these changes Dec 27, 2022
YuanRisheng
YuanRisheng previously approved these changes Dec 27, 2022
Silv3S
Silv3S previously approved these changes Dec 28, 2022
Member

@Silv3S Silv3S left a comment
LGTM

@@ -354,7 +354,9 @@ void CPUQuantizeSquashPass::OpDequantSquash(Graph* graph) const {
FindOutputNameByVarName(any_op->Op(), dequant_in->Name());

if (output_name.empty()) return;

if (any_op->Op()->Type() == "conv2d") {
any_op->Op()->SetType("fused_conv2d");
Member

So all int8-oneDNN kernels should be executed as fused kernels by default?

Contributor Author

@zyfncg zyfncg Dec 28, 2022

Yes, we are trying to remove the extra inputs and attributes from the base op, so some extra attributes for the int8-oneDNN kernel were also removed. Currently we have to run them through the fused kernel because there is no better choice.
I think a good way to execute the int8-oneDNN kernel would be to create a dedicated int8-oneDNN kernel, but that is difficult to implement at the current stage; maybe we can come up with a better solution in the future.

Member

Ok, thank you for explaining

@@ -556,6 +553,11 @@
extra :
attrs : [bool use_mkldnn = false]

- op : fused_conv2d
Member

fused_conv2d has its own operator with the extra attributes declared. Is it still necessary to add an op_compat entry?

Contributor Author

Yes, it is necessary.
The extra attributes declared in fused_conv2d.pbtxt are used in passes. The info about the extra attributes is read from op_compat.yaml when the executor runs the kernel, so an op_compat entry for the extra attributes is needed.
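[Editor's note] Such an entry would follow the `extra : attrs` style shown in the op_compat.yaml diff above; the attribute list below is illustrative, not the actual entry from this PR:

```yaml
- op : fused_conv2d
  extra :
    attrs : [str fuse_activation = "", bool fuse_residual_connection = false,
             bool use_mkldnn = false]
```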

@zyfncg
Contributor Author

zyfncg commented Jan 4, 2023

@jczaja Does this PR need to run the tests?

@jczaja
Contributor

jczaja commented Jan 4, 2023

@zyfncg Hi, I just noticed this PR. Yes, I started our tests and will be able to share results within 2 days.

@jczaja
Contributor

jczaja commented Jan 5, 2023

@zyfncg We tested this PR and noticed a performance problem with the ERNIE 3.0 bf16 model on the most recent platform

Results:

QPS (this PR): 122.81
QPS(develop: 23c1ac2): 154.50

Commandline

FLAGS_use_mkldnn=true python /root/models/PaddleNLP/model_zoo/ernie-3.0/infer.py --task_name tnews --model_path /data/PaddlePaddle/pp_models/ernie3.0/float32 --perf --device cpu --num_threads 1 --enable_bf16

@zyfncg
Contributor Author

zyfncg commented Jan 6, 2023


@jczaja This command doesn't work on my local machine. Where can I get the model file for ernie3.0/float32?

@jczaja
Contributor

jczaja commented Jan 9, 2023


OK. @yaomichael will help provide the model for you.

@zyfncg
Contributor Author

zyfncg commented Jan 17, 2023


@yaomichael Hi, could you send the model file to my email: zhangyunfei07@baidu.com?

@zyfncg
Contributor Author

zyfncg commented Jan 31, 2023


@jczaja Does this performance problem only occur on a specific CPU platform? The performance test results are the same between this PR and develop on my local machine. Do I need a CPU platform that supports bf16 to debug this?

@jczaja
Contributor

jczaja commented Jan 31, 2023


@zyfncg This performance regression was found on a recently released Intel processor, Sapphire Rapids (SPR), as it has hardware support for bf16 instructions. I do not have results from other processors, so I have started this test on the other processors you have access to, to see if the problem occurs there as well (bf16 instructions are emulated there). I will update you once I get some results.

@zyfncg
Contributor Author

zyfncg commented Jan 31, 2023


Thanks!

@jczaja jczaja closed this Feb 2, 2023
@jczaja jczaja reopened this Feb 2, 2023
@jczaja
Contributor

jczaja commented Feb 3, 2023


@zyfncg We ran more tests and found that the ERNIE 3.0 test itself has an issue (a problem on our side). Your PR is fine according to our other tests. Please proceed with review and merge.

@zyfncg
Contributor Author

zyfncg commented Feb 3, 2023


@jczaja Thank you very much! This PR will be merged after resolving conflicts.

@zyfncg zyfncg dismissed stale reviews from Silv3S, YuanRisheng, jiahy0825, and chenwhql via bdb9a0e February 5, 2023 15:44
Contributor

@jiahy0825 jiahy0825 left a comment

LGTM

@zyfncg zyfncg merged commit 2deada9 into PaddlePaddle:develop Feb 6, 2023
@zyfncg zyfncg deleted the clear_conv2d_extra branch February 6, 2023 09:02