[cutlass] Sparse conv3d backward fusion #52361
Change the split-K slice selection rules; this is faster on some shapes but still needs tuning
Your PR was submitted successfully. Thank you for contributing to this open-source project!
❌ The PR is not created using PR's template. You can refer to this Demo.
LGTM
LGTM for static-check-ci
PR types
Performance optimization
PR changes
Others
Describe
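1. Added gather_gemm_scatter fusion to sparse/gpu/conv_grad_kernel.cu for the computation of d_kernel and d_x.
2. Added code to the generation script that generates the kernels required by the sparse conv3d backward fusion.
3. Added a backward fusion interface to auto tune.
4. In the backward pass, the operator fusion provided by cutlass has to write the GEMM results into a buffer. A buffer of size sizeof(float) * max_in_channels * max_out_channels * max_splitk_slices = 4 * 256 * 256 * 256 bytes = 67MB (67,108,864 bytes, i.e. 64 MiB) is allocated once. If training needs a larger buffer, the buffer is updated to the larger size. The buffer is released after training ends. Compared with the develop branch, adding backward fusion improves training performance on 4 A100 GPUs by 5%.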
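The workspace sizing in item 4 can be illustrated with a small host-side sketch. This is not the code from this PR: `WorkspaceBytes` and `GrowOnlyWorkspace` are hypothetical names, and a `std::vector` stands in for the device allocation so that the example runs without a GPU. It only shows the arithmetic and the "allocate once, grow only when a larger size is requested" policy described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): bytes needed to hold one float GEMM
// result tile per split-K slice, following the formula in the description.
inline std::size_t WorkspaceBytes(std::size_t max_in_channels,
                                  std::size_t max_out_channels,
                                  std::size_t max_splitk_slices) {
  assert(max_in_channels > 0 && max_out_channels > 0 && max_splitk_slices > 0);
  return sizeof(float) * max_in_channels * max_out_channels * max_splitk_slices;
}

// Hypothetical grow-only workspace. Host memory stands in for a device
// buffer here; a real implementation would use the framework's allocator.
class GrowOnlyWorkspace {
 public:
  explicit GrowOnlyWorkspace(std::size_t initial_bytes)
      : storage_(initial_bytes) {}

  // Returns a pointer to at least `bytes` bytes. The buffer is enlarged only
  // when the request exceeds the current capacity; it never shrinks.
  void* Get(std::size_t bytes) {
    if (bytes > storage_.size()) {
      storage_.resize(bytes);
    }
    return storage_.data();
  }

  std::size_t capacity() const { return storage_.size(); }

 private:
  // Freed when the object is destroyed (for example, at the end of training).
  std::vector<unsigned char> storage_;
};
```

With the default limits from the description, `WorkspaceBytes(256, 256, 256)` evaluates to 67,108,864 bytes on platforms where `sizeof(float)` is 4. A request that fits the current capacity reuses the existing buffer, so the common case does no allocation.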