[PaddlePaddle Hackathon 4 No.36] Optimize the computational performance of the tile op on GPU for Paddle #52482
Conversation
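Your PR was submitted successfully. Thank you for your contribution to the open-source project!
We are sorry, but after repeated discussion your PR has not yet met the merge criteria. Please read the PaddlePaddle native operator development specification. You are welcome to submit a new PR; we are closing this one for now. Thank you for your contribution.
CI has passed. Could you please review this? @JamesLim-sy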
LGTM
PR types
Performance optimization
PR changes
OPs
Describe
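Currently, the Tile operator in Paddle uses the same computation logic on GPU and CPU. No dedicated CUDA code has been written for it, so there is room for optimization.
Design document: https://github.com/PaddlePaddle/community/blob/master/rfcs/OPs-Perf/20230319_tile_op_optimization.md
Development environment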
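Optimization method
Combine phi::funcs::BroadcastKernel with kps::IdentityFunctor<T>() to accelerate the copy operations performed during tile execution.
Performance of Paddle after the optimization compared with Paddle before the optimization: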
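Performance of Paddle after the optimization compared with PyTorch:
Across the four different cases, the optimized implementation shows varying degrees of performance improvement.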
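The reason a broadcast kernel paired with an identity functor can implement tile is that tile never transforms values: it only copies each input element to several output positions. The following is a minimal CPU sketch of that idea, not the code in this PR. `TileAsBroadcast` and the local `IdentityFunctor` are hypothetical stand-ins written for illustration; they do not reproduce the signatures of phi::funcs::BroadcastKernel or kps::IdentityFunctor<T>(), and the sketch assumes row-major storage and a `reps` vector with the same rank as the input shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an identity functor: returns its argument unchanged.
template <typename T>
struct IdentityFunctor {
  T operator()(const T& v) const { return v; }
};

// Hypothetical helper: tiles a row-major tensor `x` of shape `dims` by `reps`.
// Every output coordinate is mapped back to an input coordinate with a modulo,
// which is the same index mapping as broadcasting the input viewed as
// [1, d0, 1, d1, ...] to an output viewed as [r0, d0, r1, d1, ...].
template <typename T, typename Functor>
std::vector<T> TileAsBroadcast(const std::vector<T>& x,
                               const std::vector<std::size_t>& dims,
                               const std::vector<std::size_t>& reps,
                               Functor f) {
  assert(dims.size() == reps.size());  // assumption: equal rank
  const std::size_t rank = dims.size();

  std::vector<std::size_t> out_dims(rank);
  std::size_t in_numel = 1;
  std::size_t out_numel = 1;
  for (std::size_t i = 0; i < rank; ++i) {
    out_dims[i] = dims[i] * reps[i];
    in_numel *= dims[i];
    out_numel *= out_dims[i];
  }
  assert(x.size() == in_numel);

  std::vector<T> out(out_numel);
  for (std::size_t o = 0; o < out_numel; ++o) {
    std::size_t rem = o;        // remaining linear output index
    std::size_t in_idx = 0;     // linear input index being built
    std::size_t in_stride = 1;  // row-major stride of the current input dim
    for (std::size_t d = rank; d-- > 0;) {
      const std::size_t coord = rem % out_dims[d];
      rem /= out_dims[d];
      in_idx += (coord % dims[d]) * in_stride;
      in_stride *= dims[d];
    }
    out[o] = f(x[in_idx]);  // pure copy when f is the identity
  }
  return out;
}
```

For example, calling `TileAsBroadcast` on the 2x3 input `{1, 2, 3, 4, 5, 6}` with `reps = {2, 2}` and `IdentityFunctor<int>()` yields a 4x6 output whose first row is `1 2 3 1 2 3`. On a GPU the outer loop over `o` is what a broadcast kernel would distribute across threads, which is why reusing an existing, already tuned broadcast path is attractive for a copy-only op like tile.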
Thanks to @AndPuQing and @Asthestarsfalll for their help while I was debugging.