[WIP][SYCL-PTX] Generate reqntid PTX directive from reqd_work_group_size #3755
Closed
steffenlarsen wants to merge 1 commit into intel:sycl from steffenlarsen:steffen/cuda_reqntid_from_reqd_work_group_size
    @@ -0,0 +1,44 @@
    // RUN: %clang_cc1 -fsycl-is-device %s -emit-llvm -triple nvptx64-nvidia-cuda-sycldevice -o - | FileCheck %s

    template <typename name, typename Func>
    __attribute__((sycl_kernel)) void kernel(const Func &kernelFunc) {
      kernelFunc();
    }

    int main() {
      kernel<class kernel_no_reqd_work_size>([]() {});
      // CHECK: define dso_local void @{{.*}}kernel_no_reqd_work_size()
      // CHECK-NOT: define dso_local void @{{.*}}kernel_no_reqd_work_size() {{.*}} !reqd_work_group_size ![[WGSIZE1D:[0-9]+]]

      kernel<class kernel_reqd_work_size_1d>(
          []() [[intel::reqd_work_group_size(32)]]{});
      // CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_1d() {{.*}} !reqd_work_group_size ![[WGSIZE1D:[0-9]+]]

      kernel<class kernel_reqd_work_size_2d>(
          []() [[intel::reqd_work_group_size(64, 32)]]{});
      // CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_2d() {{.*}} !reqd_work_group_size ![[WGSIZE2D:[0-9]+]]

      kernel<class kernel_reqd_work_size_3d>(
          []() [[intel::reqd_work_group_size(128, 64, 32)]]{});
      // CHECK: define dso_local void @{{.*}}kernel_reqd_work_size_3d() {{.*}} !reqd_work_group_size ![[WGSIZE3D:[0-9]+]]
    }

    // CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidx", i32 !{{[0-9]+}}}
    // CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidy", i32 !{{[0-9]+}}}
    // CHECK-NOT: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_no_reqd_work_size, !"reqntidz", i32 !{{[0-9]+}}}

    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidx", i32 1}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidy", i32 1}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_1d, !"reqntidz", i32 32}

    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidx", i32 1}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidy", i32 32}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_2d, !"reqntidz", i32 64}

    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidx", i32 32}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidy", i32 64}
    // CHECK: !{{[0-9]+}} = !{void ()* @{{.*}}kernel_reqd_work_size_3d, !"reqntidz", i32 128}

    // CHECK: ![[WGSIZE1D]] = !{i32 1, i32 1, i32 32}
    // CHECK: ![[WGSIZE2D]] = !{i32 1, i32 32, i32 64}
    // CHECK: ![[WGSIZE3D]] = !{i32 32, i32 64, i32 128}

Review comment on `template <typename name, typename Func>`: Please ... Also, please add a comment describing the test.
If I understand this correctly, the order of arguments in user code is reversed for SYCL versus OpenCL, but the generated IR uses the OpenCL convention in both cases. @AaronBallman @smanna12, can you confirm this is correct? Does this match what we do for other targets?
I'm also unsure how default values come into play here. I see 1 being generated in the IR below. IIRC we have an existing bug with default values, right? How should they be handled, especially in the context of this swap?
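The dimension handling that the test's CHECK lines imply can be sketched as follows. This is a minimal model, assuming the attribute arguments are padded with trailing 1s and then reversed; `toReqntidXYZ` is a hypothetical helper written for illustration, not code from this PR or from Clang.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper (not part of the PR): models the mapping implied by the
// CHECK lines in the test. `written` holds the 1 to 3 attribute arguments in
// the order the user wrote them.
inline std::array<unsigned, 3>
toReqntidXYZ(std::initializer_list<unsigned> written) {
  assert(written.size() >= 1 && written.size() <= 3);

  // Step 1: pad the missing trailing dimensions with the default value 1.
  std::array<unsigned, 3> padded{1, 1, 1};
  std::size_t i = 0;
  for (unsigned v : written)
    padded[i++] = v;

  // Step 2: reverse the order, so the last written dimension becomes x.
  return {padded[2], padded[1], padded[0]};
}
```

Under this model, `toReqntidXYZ({128, 64, 32})` yields x = 32, y = 64, z = 128, and `toReqntidXYZ({32})` yields x = 1, y = 1, z = 32, matching the `reqntidx`/`reqntidy`/`reqntidz` values the test currently expects. The 1D case shows why the defaults matter: the single user-specified size lands in z rather than x.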
This was heavily inspired by https://github.com/intel/llvm/blob/sycl/clang/lib/CodeGen/CodeGenFunction.cpp#L632. I believe the tests represent this as well.
Good point. I think you are right that the swapping becomes a problem if there are any defaults in there, i.e. the first two concrete IR tests should be
Since this code does not generate the defaults itself, it will be hard to distinguish default-generated 1s from user-specified 1s. Maybe this will have to wait until there is a resolution to the defaults. Do you know if there is an issue open somewhere?
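The ambiguity described here can be illustrated with a small sketch. It assumes defaults are filled in as trailing 1s before this stage sees the sizes; the `WorkGroupSize` struct and `padWithDefaults` function are hypothetical names used only for illustration, not anything in Clang.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical model (not from the PR): the padded sizes plus a count of how
// many dimensions the user actually wrote.
struct WorkGroupSize {
  std::array<unsigned, 3> dims{1, 1, 1}; // sizes in written order, padded with 1
  std::size_t specified = 0;             // number of user-written arguments
};

// Pads 1 to 3 user-written sizes with the default value 1.
inline WorkGroupSize padWithDefaults(std::initializer_list<unsigned> written) {
  assert(written.size() >= 1 && written.size() <= 3);
  WorkGroupSize result;
  for (unsigned v : written)
    result.dims[result.specified++] = v;
  return result;
}
```

With this model, `padWithDefaults({32}).dims` and `padWithDefaults({32, 1, 1}).dims` are identical, so a later stage that only sees the padded triple cannot tell a default 1 from a requested 1. Only the extra `specified` count (1 versus 3) keeps the two cases apart, which is the information the reversal would need.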
Yes, there is one: #3743
Thanks for the confirmation.