[cutlass] Sparse conv3d backward fusion #52361

umiswing · 2023-03-30T11:33:38Z

PR types

Performance optimization

PR changes

Others

Describe

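1. Added gather_gemm_scatter fusion for the d_kernel and d_x computations in sparse/gpu/conv_grad_kernel.cu.
2. Added code to the generation script that generates the kernels needed by the sparse conv3d backward fusion.
3. Added the backward fusion interface to auto tune.
4. In the backward pass, the operator fusion provided by cutlass has to write the GEMM results into a buffer. A buffer of sizeof(float) * max_in_channels * max_out_channels * max_splitk_slices = 4 * 256 * 256 * 256 bytes = 67 MB is allocated once. If training later needs a larger buffer, the buffer is reallocated at the larger size. The buffer is released after training ends.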
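
The buffer policy in item 4 (allocate once, grow only when a larger request arrives, release at the end) can be sketched on the host as below. This is only an illustration under stated assumptions: `GrowOnlyWorkspace` and `WorkspaceBytes` are hypothetical names that do not come from this PR, and `std::malloc` stands in for whatever device allocator the real kernel uses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: bytes needed to hold one float GEMM result per
// split-K slice, following the formula in the PR description.
constexpr std::size_t WorkspaceBytes(std::size_t in_channels,
                                     std::size_t out_channels,
                                     std::size_t splitk_slices) {
  return sizeof(float) * in_channels * out_channels * splitk_slices;
}

// Hypothetical grow-only workspace. std::malloc is an assumption made so the
// sketch runs on the host; a GPU implementation would use a device allocator.
class GrowOnlyWorkspace {
 public:
  GrowOnlyWorkspace() = default;
  GrowOnlyWorkspace(const GrowOnlyWorkspace&) = delete;
  GrowOnlyWorkspace& operator=(const GrowOnlyWorkspace&) = delete;
  // Released once, when the owner (for example the training run) ends.
  ~GrowOnlyWorkspace() { std::free(data_); }

  // Returns a buffer of at least `bytes`. Reallocates only when the request
  // exceeds the current capacity; previous contents are not preserved.
  void* Reserve(std::size_t bytes) {
    if (bytes > capacity_) {
      void* fresh = std::malloc(bytes);
      if (fresh == nullptr) throw std::bad_alloc();
      std::free(data_);
      data_ = fresh;
      capacity_ = bytes;
    }
    return data_;
  }

  std::size_t capacity() const { return capacity_; }

 private:
  void* data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

With the maximums quoted above, `WorkspaceBytes(256, 256, 256)` evaluates to 67,108,864 bytes (assuming a 4-byte `float`), which matches the 67 MB figure. Smaller requests reuse the existing allocation, so the allocator is called only when a new maximum shape appears.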

Compared with the develop branch, adding backward fusion improves training performance by 5% on 4 A100 GPUs.

| | PaddlePaddle | PyTorch | PyTorch / PaddlePaddle |
|---|---|---|---|
| A100-40G | 21h -> 20h | 22h | 1.1 |
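
For readers unfamiliar with the gather_gemm_scatter pattern named in item 1, the following CPU reference sketches what such a fused backward step computes for one kernel offset. It is not the CUTLASS kernel from this PR. The function names, the row-major layouts, and the `in_idx`/`out_idx` pair lists are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumed layouts (row-major): x and d_x are [n_in, c_in], d_out is
// [n_out, c_out], weight and d_kernel are [c_in, c_out]. Pair p says that
// input row in_idx[p] contributed to output row out_idx[p].

// d_x[in_idx[p], :] += d_out[out_idx[p], :] * weight^T
void GatherGemmScatterDx(const std::vector<float>& d_out,
                         const std::vector<float>& weight,
                         const std::vector<int>& in_idx,
                         const std::vector<int>& out_idx,
                         std::size_t c_in, std::size_t c_out,
                         std::vector<float>* d_x) {
  for (std::size_t p = 0; p < in_idx.size(); ++p) {
    // Gather: read the output-gradient row directly, with no staging copy.
    const float* g = &d_out[static_cast<std::size_t>(out_idx[p]) * c_out];
    // Scatter: accumulate straight into the destination input-gradient row.
    float* dst = &(*d_x)[static_cast<std::size_t>(in_idx[p]) * c_in];
    for (std::size_t ci = 0; ci < c_in; ++ci) {
      float acc = 0.0f;
      for (std::size_t co = 0; co < c_out; ++co) {
        acc += g[co] * weight[ci * c_out + co];
      }
      dst[ci] += acc;
    }
  }
}

// d_kernel += x[in_idx, :]^T * d_out[out_idx, :]
void GatherGemmDKernel(const std::vector<float>& x,
                       const std::vector<float>& d_out,
                       const std::vector<int>& in_idx,
                       const std::vector<int>& out_idx,
                       std::size_t c_in, std::size_t c_out,
                       std::vector<float>* d_kernel) {
  for (std::size_t p = 0; p < in_idx.size(); ++p) {
    const float* xi = &x[static_cast<std::size_t>(in_idx[p]) * c_in];
    const float* g = &d_out[static_cast<std::size_t>(out_idx[p]) * c_out];
    for (std::size_t ci = 0; ci < c_in; ++ci) {
      for (std::size_t co = 0; co < c_out; ++co) {
        (*d_kernel)[ci * c_out + co] += xi[ci] * g[co];
      }
    }
  }
}
```

The point of fusing is that the gathered rows are read and the results are written through the index lists inside the GEMM itself, rather than being copied into dense intermediate tensors first. In the d_kernel case the reduction runs over the pair list, which is the dimension a split-K scheme would partition.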


paddle-bot · 2023-03-30T11:33:43Z

Your PR has been submitted. Thanks for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot · 2023-03-30T11:33:45Z

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

JamesLim-sy

LGTM

zyfncg

LGTM for static-check-ci

umiswing added 28 commits February 21, 2023 11:44

commit for saving, not work now :(

d037457

finally it pass compilation...

63decbd

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5a1f9a3

… auto_select

change GetKey() to GenKey()

865f12a

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

5084bc6

… auto_select

works for fp16 and fp32 on sm 80.

a21618f

clean the code.

630319d

remove scripts for sm 70

31984a0

remove some comment

cd414d9

remove some unused header.

4f24f11

restructure code.

62c8120

restructure more codes.

384c34e

remove some unused codes.

1b8072b

commit for saving.

4c7c25f

modify interface for backward.

5eb554c

run successfully, result need to be checked.

447d4db

add split k, but still slow

7aebcdb

Fix a bug in conv_grad_kernel.cu.

afc967d

Change splitK slices select rules, faster on some shapes, need to be tuned

fix compile

6d2bc70

Merge branch 'fix_make' into spconv_back_fuse

a6d7c95

fix shape to key mapping error in conv_grad shape.

d587a76

try to add a reduce kernel, not work yet...

5ac88b7

Can pass compilication adding reduce, not work yet.

a96b299

using device::reduce, still not work yet. :(

197c59a

Reduction run without illegal memory access, but still not correct.

bc20db8

Compute correctlly, but super slow.

2b861fc

Works and fast now.

1431d1b

Merge branch 'spconv_back_fuse' into back_fusion

7f26ec7

codestyle fix.

f2c83fe

umiswing changed the title ~~Back fusion~~ [cutlass] Sparse conv3d backward fusion Mar 30, 2023

umiswing added 3 commits March 30, 2023 12:22

revert some changes.

f8ed53b

Add a status checks

e602bd7

Remove backward fusion in fp16 since it's slow.

f215798

JamesLim-sy requested review from Xreki and zkh2016 April 10, 2023 06:14

zkh2016 approved these changes Apr 10, 2023


JamesLim-sy approved these changes Apr 11, 2023


zyfncg approved these changes Apr 13, 2023


zkh2016 merged commit 0b98d1a into PaddlePaddle:develop Apr 13, 2023
