[oneDNN] Optimize fused elementwise kernel #59663
Conversation
Your PR was submitted successfully. Thank you for contributing to the open source project!
Hi @yuanlehome, would you mind helping check this PR? The CI coverage check does not seem to be working: the coverage result didn't show up, and after I restarted the check the result was still missing. Thanks~
Okay, please wait for the latest CI results.
Hi @yuanlehome, it seems there is still no result...
@xinyu-intel, @vivienfanghuagood, @yuanlehome, hi, would you mind helping review this PR? Thanks~
const auto src_y_memory = handler.AcquireSecondSrcMemory(non_const_y);
const auto src_x_memory =
    handler.swin_case ? (x.numel() == y.numel()
                             ? handler.AcquireExtendSrcMemory(non_const_x, 0)
What is ExtendSrc for?
This aligns with former PR #59421. Since we need to manually broadcast src1/src2, I hereby name such operations as "extend".
PR types
Bug fixes
PR changes
Others
Description
This PR addresses the int8 case in #59252, which occurs when
config.enable_mkldnn_int8()
is activated. For float and int8, Paddle goes through different passes and hence different kernels, so in the int8 case the selected kernel cannot benefit from the optimizations in the former PR #59421.