[oneDNN] Optimize fused elementwise kernel #59663
Conversation
Your PR was submitted successfully. Thank you for contributing to the open source project!
Hi @yuanlehome, would you mind helping check this PR? The CI coverage check does not seem to be working: the coverage result didn't show up, and after I restarted the check the result was still missing. Thanks~
Okay, please wait for the latest CI results.
Hi @yuanlehome, it seems there is still no result...
@xinyu-intel, @vivienfanghuagood, @yuanlehome, hi, would you mind helping review this PR? Thanks~
const auto src_y_memory = handler.AcquireSecondSrcMemory(non_const_y);
const auto src_x_memory =
    handler.swin_case ? (x.numel() == y.numel()
                             ? handler.AcquireExtendSrcMemory(non_const_x, 0)
What is ExtendSrc for?
This aligns with former PR #59421. Since we need to manually broadcast src1/src2, I hereby name such operations as "extend".
PR types
Bug fixes
PR changes
Others
Description
This PR addresses the int8 case in #59252, which occurs when
config.enable_mkldnn_int8()
is activated. For float and int8, Paddle goes through different passes and hence different kernels, so in the int8 case the selected kernel cannot benefit from the optimizations in the former PR #59421.