[PaddlePaddle Hackathon 4 No.36] Optimize the GPU performance of the tile op for Paddle #52482

zeroRains · 2023-04-03T13:24:23Z

PR types

Performance optimization

PR changes

OPs

Describe

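Currently, the Tile operator in Paddle uses the same computation logic on GPU and CPU. No dedicated CUDA code has been written for it, so there is room for optimization.
Design document: https://github.com/PaddlePaddle/community/blob/master/rfcs/OPs-Perf/20230319_tile_op_optimization.md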

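Development environment
1. Device: Tesla V100
2. Environment: CUDA 11.2, cuDNN 8
Optimization method
- Combine phi::funcs::BroadcastKernel with kps::IdentityFunctor<T>() to speed up the copy operations performed during tile execution.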
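
To illustrate why tile fits a broadcast-plus-identity formulation, here is a minimal pure-Python sketch. It is not the CUDA kernel from this PR: the function name `tile_via_broadcast` and the interleaved-shape trick are assumptions made for illustration. The input is viewed as `[1, d0, 1, d1, ...]`, the output as `[r0, d0, r1, d1, ...]`, and every output element is an unmodified copy of one input element, which is the role an identity functor plays.

```python
from itertools import product


def tile_via_broadcast(data, in_shape, repeat_times):
    """Tile a flat row-major list by broadcasting (illustrative sketch only)."""
    # Left-pad the shorter list with 1s so shape and repeat_times have equal rank.
    rank = max(len(in_shape), len(repeat_times))
    in_shape = [1] * (rank - len(in_shape)) + list(in_shape)
    repeat_times = [1] * (rank - len(repeat_times)) + list(repeat_times)

    # Interleave dims: input viewed as [1, d0, 1, d1, ...],
    # output viewed as [r0, d0, r1, d1, ...].
    bcast_in, bcast_out = [], []
    for d, r in zip(in_shape, repeat_times):
        bcast_in += [1, d]
        bcast_out += [r, d]

    # Row-major strides of the broadcast view of the input.
    strides = [1] * len(bcast_in)
    for i in range(len(bcast_in) - 2, -1, -1):
        strides[i] = strides[i + 1] * bcast_in[i + 1]

    out = []
    for idx in product(*(range(n) for n in bcast_out)):
        # Broadcasting rule: a size-1 input dim always reads index 0.
        src = sum((0 if d == 1 else i) * s
                  for i, d, s in zip(idx, bcast_in, strides))
        out.append(data[src])  # identity: copy the element unchanged

    out_shape = [d * r for d, r in zip(in_shape, repeat_times)]
    return out, out_shape


if __name__ == "__main__":
    flat, shape = tile_via_broadcast([1, 2, 3, 4, 5, 6], [2, 3], [2, 2])
    print(shape)
    print(flat)
```

Because each output element depends on exactly one input element, a GPU version can assign one thread per output element, which is presumably what makes a generic broadcast kernel a good fit here.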
Performance of the optimized Paddle compared with Paddle before the optimization:

Case No.	device	repeat_times	input_shape	input_type	Paddle Perf (ms)	old Paddle Perf (ms)	diff
1	Tesla V100	[1,10,128,128]	[16L,100L,2L,2L]	float32	5.1831	10.1888	96.58% faster
2	Tesla V100	[1,10,128,128]	[16L,100L,2L,2L]	float16	3.5461	16.7348	372% faster
3	Tesla V100	[4,1,807]	[32L, 807L, 1L]	float32	0.3885	0.7381	89.99% faster
4	Tesla V100	[4,1,807]	[32L, 807L, 1L]	float16	0.2465	0.9850	300% faster

Performance of the optimized Paddle compared with PyTorch:

Case No.	device	repeat_times	input_shape	input_type	Paddle Perf (ms)	PyTorch Perf (ms)	diff
1	Tesla V100	[1,10,128,128]	[16L,100L,2L,2L]	float32	5.1831	8.0796	55.88% faster
2	Tesla V100	[1,10,128,128]	[16L,100L,2L,2L]	float16	3.5461	7.7898	120% faster
3	Tesla V100	[4,1,807]	[32L, 807L, 1L]	float32	0.3885	0.5342	37.50% faster
4	Tesla V100	[4,1,807]	[32L, 807L, 1L]	float16	0.2465	0.3768	52.86% faster

Across the four cases, performance after the optimization improves to varying degrees.
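
The diff columns in the tables above appear consistent with the formula (baseline time / optimized time - 1) × 100. The short sketch below recomputes them from the reported timings. The helper name `percent_faster` is hypothetical and the formula is inferred from the numbers, not stated in the PR.

```python
def percent_faster(baseline_ms, optimized_ms):
    """Return how much faster the optimized run is, in percent (inferred formula)."""
    return (baseline_ms / optimized_ms - 1.0) * 100.0


# (label, baseline ms, optimized Paddle ms), timings copied from the tables.
reported = [
    ("case 1 vs old Paddle", 10.1888, 5.1831),
    ("case 2 vs old Paddle", 16.7348, 3.5461),
    ("case 3 vs old Paddle", 0.7381, 0.3885),
    ("case 4 vs old Paddle", 0.9850, 0.2465),
    ("case 1 vs PyTorch", 8.0796, 5.1831),
    ("case 2 vs PyTorch", 7.7898, 3.5461),
    ("case 3 vs PyTorch", 0.5342, 0.3885),
    ("case 4 vs PyTorch", 0.3768, 0.2465),
]

if __name__ == "__main__":
    for label, baseline, optimized in reported:
        print(f"{label}: {percent_faster(baseline, optimized):.2f}% faster")
```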
Thanks to @AndPuQing and @Asthestarsfalll for their help while I was debugging.

paddle-bot · 2023-04-03T13:24:29Z

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot · 2023-04-03T13:24:54Z

Sorry to inform you that, after our discussion, your PR does not yet meet the merging standard (Reference: Paddle Custom Operator Design Doc). We are closing this PR for now; you may submit a new one. Thank you for your contribution.

zeroRains · 2023-04-04T06:21:00Z

CI has passed. Could you please review this, @JamesLim-sy?

JamesLim-sy

LGTM

zeroRains added 9 commits March 15, 2023 07:44

fix divide zero bug for softmax_with_cross_entropy

8091352

change the single test way

a6adf59

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

ef94d12

… develop

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

6292f3a

… develop

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

6de26e1

… develop

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5e4671f

… develop

can run but slow. the most important is that I do not know why it slow

30d5113

remove some useless commet

d218532

change the copyright to correct

b98c756

paddle-bot bot added contributor External developers status: proposed labels Apr 3, 2023

zeroRains closed this Apr 3, 2023

paddle-bot bot added status: not progressed and removed status: proposed labels Apr 3, 2023

remove some useless change

bb63e56

zeroRains reopened this Apr 3, 2023

zeroRains mentioned this pull request Apr 3, 2023

[PaddlePaddle Hackathon 4] Task overview #51281

Closed

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

28b06d8

… tile

luotao1 assigned luotao1, Ligoml and JamesLim-sy Apr 4, 2023

if repeat_times == 1, we will not use BroadcastKernel

ce46e94

JamesLim-sy approved these changes Apr 10, 2023

JamesLim-sy merged commit 61fe219 into PaddlePaddle:develop Apr 10, 2023

zeroRains deleted the tile branch April 10, 2023 03:27
