-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size #3755
[WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size #3755
Conversation
These changes makes clang generate NVPTX annotations for reqntidx, reqntidy, and reqntidz from the reqd_work_group_size attribute on kernels in a SYCL program when targetting. These are in turn converted to the reqntid PTX directive. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for the delay in review. I missed the email notification.
@@ -0,0 +1,44 @@ | |||
// RUN: %clang_cc1 -fsycl-is-device %s -emit-llvm -triple nvptx64-nvidia-cuda-sycldevice -o - | FileCheck %s | |||
|
|||
template <typename name, typename Func> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please #include "sycl.hpp"
. You can use the headers included in test folder by adding -internal-isystem %S/Inputs
to RUN line.
Also, please add a comment describing the test
Optional<llvm::APSInt> YDimVal = A->getYDimVal(ClangCtx); | ||
Optional<llvm::APSInt> ZDimVal = A->getZDimVal(ClangCtx); | ||
|
||
// For a SYCLDevice ReqdWorkGroupSizeAttr arguments are reversed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am understanding this correctly, in user code the order of arguments will be reversed for SYCL vs OpenCL but in final IR generated, the order will be openCL convention for all? @AaronBallman @smanna12 can you confirm this is correct? Does this match what we do for other targets?
I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632. I believe the tests represent this as well.
I'm also unsure of how default values come into play here. I see 1 being generated in IR below. IIRC we have an existing bug with default values right? How should they be handled especially in the context of this swap.
Good point. I think you are right that the swapping becomes a problem if there are any defaults in there, i.e. the first two concrete IR tests should be
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidy", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidz", i32 1}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidx", i32 32}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidy", i32 64}
// CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidz", i32 1}
Since this code is not generating the defaults, it will be hard to distinguish default-generated 1's from user-specified 1 requirements. Maybe this will have to wait until there is a resolution to the defaults. Do you know if there is an issue open somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is an issue open somewhere?
yes, there is one: #3743
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this match what we do for other targets?
This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632.
Thanks for confirmation.
I have marked this WIP until #3755 (comment) can be resolved. |
These changes makes clang generate NVVM annotations for
reqntidx
,reqntidy
, andreqntidz
from thereqd_work_group_size
attribute on kernels in a SYCL program when targeting. These are in turn converted to thereqntid
PTX directive.This is related to, but not dependent on, #3735.