using PyTorch WARP_SHFL_DOWN macro for half support #2843
May I have an ETA for the CI to complete? I appreciate it!
We need this for ROCm.
@grimoire can you take a look please?
LGTM @zhouzaida
Thanks @grimoire! Anyone else, please review.
@zhouzaida can you take a look please?
Hi @zstreet87, sorry for my late reply.
Motivation
In preparation for ROCm 5.6, this change is required for successful compilation. Calling AMD's intrinsic directly resulted in an "ambiguous-type" error, apparently caused by a HIP header migration.
This PR will need to be tested against ROCm 5.6 once it is released and the CI is using it.
Modification
Use PyTorch's WARP_SHFL_DOWN macro to handle the shfl_down intrinsic, for both functionality and portability.
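For readers unfamiliar with the intrinsic, here is a minimal host-side sketch of what a shuffle-down does and how a warp-level sum reduction is built on it. It is an illustrative model only, not the mmcv kernel or PyTorch's implementation: `kWarpSize`, `Warp`, `shfl_down_model`, and `warp_reduce_sum_model` are hypothetical names, a 32-lane warp is assumed, and plain `float` stands in for half. In device code the per-lane read would go through the shuffle wrapper rather than an array copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a 32-lane warp (assumption: 32 lanes).
constexpr std::size_t kWarpSize = 32;
using Warp = std::array<float, kWarpSize>;

// Models shuffle-down: lane i reads the value held by lane i + delta.
// Lanes whose source lane would fall outside the warp keep their own value.
Warp shfl_down_model(const Warp& lanes, std::size_t delta) {
    assert(delta < kWarpSize);
    Warp out = lanes;
    for (std::size_t i = 0; i + delta < kWarpSize; ++i) {
        out[i] = lanes[i + delta];
    }
    return out;
}

// Tree reduction: the offset halves each step (16, 8, 4, 2, 1).
// After the last step, lane 0 holds the sum of all lanes; the other
// lanes hold partial results that are not meaningful.
float warp_reduce_sum_model(Warp lanes) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        const Warp shifted = shfl_down_model(lanes, offset);
        for (std::size_t i = 0; i < kWarpSize; ++i) {
            lanes[i] += shifted[i];
        }
    }
    return lanes[0];
}
```

Routing the shuffle through one wrapper keeps the reduction loop identical across backends, so only the wrapper has to know which intrinsic and which overloads a given CUDA or ROCm toolchain provides.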
Checklist
Before PR:
After PR: