disable strided split #56882

Merged

Conversation

wanghuancoder
Contributor

PR types

Others

PR changes

Others

Description

Pcard-71699
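After the framework added stride support, split performance regressed.
When matmul is called after split, and split produces 3 parts along an axis other than 0, the forward pass has to call contiguous 3 times and the backward pass calls it 3 more times. This slows the model down.
Calling contiguous manually from Python reduces the number of contiguous calls to 3. However, 3 contiguous kernels are still slower than 1 split kernel.
On V100, 3 contiguous kernels are 7% slower than 1 split kernel. On A100 they are 87% slower.
#56866 tried to speed up contiguous, but it hit CUDA error 700, so it was probably not written correctly. Even if it were correct, the new implementation is slower than the existing one.
For now, split does not go through the stride path.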
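
To illustrate why the axis matters, here is a minimal pure-Python sketch of the stride arithmetic, not Paddle's implementation. The helpers `contiguous_strides`, `is_contiguous`, `strided_split`, and `copies_needed` are hypothetical names introduced only for this example. It assumes a row-major, densely packed input and a stride-based split that returns views sharing the input's strides.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) of a densely packed tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)


def is_contiguous(shape, strides):
    """True if the strides match a dense row-major layout for this shape."""
    expected = contiguous_strides(shape)
    # The stride of a size-1 dimension never affects addressing, so skip it.
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


def strided_split(shape, num, axis):
    """Describe the views a stride-based split would return.

    Each view is (shape, strides, offset). No data is copied: every view
    reuses the strides of the input and only shifts its starting offset.
    """
    assert shape[axis] % num == 0, "this sketch only handles equal parts"
    base = contiguous_strides(shape)
    part = shape[axis] // num
    view_shape = tuple(part if i == axis else d for i, d in enumerate(shape))
    return [(view_shape, base, k * part * base[axis]) for k in range(num)]


def copies_needed(shape, num, axis):
    """Count the views that need a contiguous copy before a dense kernel."""
    return sum(
        not is_contiguous(v_shape, v_strides)
        for v_shape, v_strides, _ in strided_split(shape, num, axis)
    )


if __name__ == "__main__":
    print(copies_needed((4, 6), 3, axis=1))  # split into 3 along axis 1
    print(copies_needed((4, 6), 2, axis=0))  # split into 2 along axis 0
```

In this model, splitting a (4, 6) tensor into 3 parts along axis 1 yields three (4, 2) views that keep strides (6, 1), so each one needs its own copy before a kernel that expects dense input. Splitting along axis 0 yields views that are already dense.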

@paddle-bot

paddle-bot bot commented Sep 1, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Xreki
Xreki previously approved these changes Sep 1, 2023
Contributor

@Xreki Xreki left a comment

LGTM

@wanghuancoder wanghuancoder merged commit eddf6d0 into PaddlePaddle:develop Sep 4, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
2 participants