[SYCL] Add kernel properties for three function attributes #14448

frasercrmck · 2024-07-04T14:58:30Z

This patch adds three kernel properties, max_work_group_size, min_work_groups_per_multiprocessor, and max_work_groups_per_cluster for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp, respectively. The semantics of the properties are the same as for their respective function attributes. Note the change of name from cu to multiprocessor and mp to cluster. This attempts to fix some of the ambiguous naming of the attributes by more closely matching the original CUDA __launch_bounds__ names.

Each of these attributes was originally earmarked as a SYCL equivalent to CUDA's __launch_bounds__ qualifier, hence the focus on lowering for NVPTX. It has since been identified that we will need an additional property for CUDA's maxThreadsPerBlock which isn't always expressible with the three-argument max_work_group_size. That new property will come in a follow-up patch.

This patch also wires up the max_work_group_size property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases. It lowers the other two properties despite them not currently having a SPIR-V equivalent.

This patch adds kernel properties for three existing SYCL function attributes: 'max_work_group_size', 'min_work_groups_per_cu', and 'max_work_groups_per_mp'. The semantics of the properties are the same as for their respective function attributes. Each of these attributes was originally earmarked as a SYCL equivalent to CUDA's '__launch_bounds__' qualifier, hence the focus on lowering for NVPTX. It has since been identified that we will need an additional property for CUDA's 'maxThreadsPerBlock' which isn't always expressable with 'max_work_group_size'. That property will come in a follow-up patch. This patch also wires up the 'max_work_group_size' property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases. It lowers the other two properties despite them not currently having a SPIR-V equivalent.

frasercrmck · 2024-07-04T15:13:29Z

clang/lib/CodeGen/Targets/NVPTX.cpp

+    } else if (auto Attr = F->getFnAttribute("sycl-max-work-group-size");
+               Attr.isValid()) {
+      // Split values in the comma-separated list integers.
+      SmallVector<StringRef, 3> ValStrs;


There's a lot of code overlap between NVPTX lowering of kernel properties and CompileTimePropertiesPass lowering of kernel properties. We might want to think of a shared utility to work with kernel properties.

Hiding the fact that they're all represented as strings in LLVM IR as an "implementation detail" and providing a std::optional<T> KernelPropertyAttr::getAsInteger or getAsIntegerList might be a good step.

…rnel-prop

steffenlarsen

I like the addition, but we should have an experimental extension document for it as well. See sycl/doc/extensions/template.asciidoc or maybe it could be made as an expansion of sycl_ext_oneapi_kernel_arg_properties.

llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp

sycl/include/sycl/ext/oneapi/kernel_properties/properties.hpp

steffenlarsen · 2024-07-04T16:53:47Z

Tag @gmlueck for the comment above about extensions.

frasercrmck · 2024-07-04T16:56:47Z

I like the addition, but we should have an experimental extension document for it as well. See sycl/doc/extensions/template.asciidoc or maybe it could be made as an expansion of sycl_ext_oneapi_kernel_arg_properties.

(sorry for lightning response - I was already writing this to no-one when your feedback came in)

Yeah sorry, I couldn't find any specifications for any of the existing kernel properties. Are you suggesting I add an extension for all properties that already exist, or just these ones?

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

steffenlarsen · 2024-07-05T05:16:59Z

Yeah sorry, I couldn't find any specifications for any of the existing kernel properties.

Are you referring to the Clang attributes with the same names? I am not sure we originally saw them as "SYCL extensions", even though they only work on SYCL kernels currently, from what I know. I suppose they are documented in the Clang attributes documentation though.

For the kernel properties, I believe most are documented either in sycl_ext_oneapi_kernel_arg_properties or in sycl_ext_intel_fpga_kernel_interface_properties.

Are you suggesting I add an extension for all properties that already exist, or just these ones?

Just these new ones. Since they are somewhat similar to the ones in sycl_ext_oneapi_kernel_arg_properties, I wouldn't personally be opposed to adding them in there. Do they have an effect on non-CUDA targets? If not, then we'd want to note that in the extension.

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

I think the ConflictingProperties trait is what you're looking for. You can specialize it for the new properties, then check if the given list has some conflicting property or even compare values if the conflicts depend on the specified arguments.

We also have a check in SemaDeclAttr that might be relevant to expand here. The check issues a warning if the user is using properties on kernels that have a corresponding property. It's not actually checking if there is a conflict, hence it is just a warning. It is not my favorite part of the properties-related code, but we try to make the Clang frontend as oblivious to these properties as possible.

…rnel-prop

frasercrmck · 2024-07-08T10:10:06Z

For the kernel properties, I believe most are documented either in sycl_ext_oneapi_kernel_arg_properties or in sycl_ext_intel_fpga_kernel_interface_properties.

Are you suggesting I add an extension for all properties that already exist, or just these ones?

Just these new ones. Since they are somewhat similar to the ones in sycl_ext_oneapi_kernel_arg_properties, I wouldn't personally be opposed to adding them in there. Do they have an effect on non-CUDA targets? If not, then we'd want to note that in the extension.

I'm not sure these belong in sycl_ext_oneapi_kernel_arg_properties because they're kernel properties, and the naming of that extension implies it concerns only argument properties. However, it looks like sycl_ext_oneapi_kernel_properties is a good place for these properties. Should I add them there?

Sorry, for some reason I couldn't find that sycl_ext_oneapi_kernel_properties extension earlier. You'd think I'd find them with a simple grep, but hey...

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

I think the ConflictingProperties trait is what you're looking for. You can specialize it for the new properties, then check if the given list has some conflicting property or even compare values if the conflicts depend on the specified arguments.

Ah excellent, thanks - I'll check that out. It might not be essential for this PR, and this PR (and its subsequent follow-up) is time sensitive as we want it in before the cut-off.

We also have a check in SemaDeclAttr that might be relevant to expand here. The check issues a warning if the user is using properties on kernels that have a corresponding property. It's not actually checking if there is a conflict, hence it is just a warning. It is not my favorite part of the properties-related code, but we try to make the Clang frontend as oblivious to these properties as possible.

I hadn't spotted that either, thank you. Yeah, I've been made vaguely aware that properties are the way forward, as opposed to attributes. I can see why.

steffenlarsen · 2024-07-08T11:22:07Z

I'm not sure these belong in sycl_ext_oneapi_kernel_arg_properties because they're kernel properties, and the naming of that extension implies it concerns only argument properties. However, it looks like sycl_ext_oneapi_kernel_properties is a good place for these properties. Should I add them there?

Sorry, that was the extension I meant! I may have been a tad too quick when looking for it. I think these fit in there, but I would like to hear @gmlueck's thoughts.

gmlueck · 2024-07-08T14:58:08Z

This patch adds kernel properties for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp. The semantics of the properties are the same as for their respective function attributes.

Are there already C++ attributes with these names that can be used to decorate a kernel? Are they documented someplace?

frasercrmck · 2024-07-08T15:00:29Z

This patch adds kernel properties for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp. The semantics of the properties are the same as for their respective function attributes.

Are there already C++ attributes with these names that can be used to decorate a kernel? Are they documented someplace?

Yep, they were added late last year as function attributes.

max_work_group_size
min_work_groups_per_cu
max_work_groups_per_mp

gmlueck · 2024-07-08T21:13:37Z

What is the behavior if a kernel with these attributes/properties is compiled for a non-CUDA device?

frasercrmck · 2024-07-09T08:57:04Z

What is the behavior if a kernel with these attributes/properties is compiled for a non-CUDA device?

The max_work_group_size attribute is already mapped via SPIR-V to the MaxWorkgroupSizeINTEL Execution Mode so has well-defined semantics. I know of one non-CUDA SPIR-V implementation that made use of that information.

The idea is that the behaviour of min_work_groups_per_cu and max_work_groups_per_mp do nothing on non-CUDA devices. The documentation should be clear about that. In fact I just noticed that these function attributes were mis-spelled and min/max have been accidentally swapped. I'll see if we can fix that for the function attributes, but at the very least we should correct it for the properties, especially if we're going to spec them properly.

My hunch is that max_work_group_size would belong in sycl_ext_oneapi_kernel_properties. The two that are CUDA-specific don't really belong there and might need their own extension.

frasercrmck · 2024-07-09T08:58:53Z

To be specific, I think we can say something to the effect that the two CUDA attributes/properties "have no effect and can be safely ignored when compiled for non-CUDA devices".

frasercrmck · 2024-07-09T12:14:25Z

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Note also that I intend to add another target-agnostic property to help with launch_bounds, which folds the three-dimensional max_work_group_size into a flattened max_total_work_group_size which can be used if you can't guarantee the maximum size of all three dimensions but you still have a total number you know you won't exceed.

I suspect that it might be best if this PR splits into two: one for the two CUDA-specific properties, and another for the two target-agnostic properties. Thoughts?

gmlueck · 2024-07-10T20:36:21Z

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Will you add a specification for these CUDA properties also? If they will have no effect on non-CUDA targets, I think it would be OK to document them in sycl_ext_oneapi_kernel_properties.

elizabethandrews

Please add FE tests

frasercrmck · 2024-07-11T11:29:19Z

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Will you add a specification for these CUDA properties also? If they will have no effect on non-CUDA targets, I think it would be OK to document them in sycl_ext_oneapi_kernel_properties.

Yes I'll add some kind of specification. I'll probably continue it here as a separate PR to #14518 as those have a higher priority.

Please add FE tests

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

elizabethandrews · 2024-07-11T13:36:18Z

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

frasercrmck · 2024-07-11T14:01:30Z

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

elizabethandrews · 2024-07-11T15:35:09Z

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

frasercrmck · 2024-07-15T10:37:52Z

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

Yeah exactly, codegen for the other properties is being testing in the same place.

elizabethandrews · 2024-07-16T15:25:52Z

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

Yeah exactly, codegen for the other properties is being testing in the same place.

My concern is that it is very common to run just check-clang when making pure FE changes and these tests will not be run in that case. My understanding with llvm.org policy is that clang changes need to be accompanied by tests in the clang folder, and we follow that same guideline. So I would personally prefer having these tests in CodeGenSYCL and moving other tests over at some point. @premanandrao thoughts?

bader · 2024-07-16T16:14:59Z

I agree with @elizabethandrews.
Front-end compiler changes are supposed to be tested by clang/test/*.
sycl/test is primary intended to test the code from sycl/include directory.

frasercrmck · 2024-07-17T10:58:17Z

I agree with @elizabethandrews. Front-end compiler changes are supposed to be tested by clang/test/*. sycl/test is primary intended to test the code from sycl/include directory.

I agree with you both in principle but I don't believe this is truly a "clang front-end" change, despite it touching code in clang.

I think what's going on here is that the code that's in clang should really be in LLVM. This is not really a front-end change but is for historical reasons. @steffenlarsen and I have already discussed in #14502 about moving this code at some point.

For the absence of doubt, let me walk through how these kernel properties end up in LLVM IR:

The properties are defined in the SYCL headers. They each specialise ext::oneapi::experimental::detail::PropertyMetaInfo which defines a mapping from the property to an LLVM IR function attribute name.

The SYCL headers for APIs like parallel_for forward each of the properties through the [[__sycl_detail__::add_ir_attributes_function(...)]] attribute which - inside clang - automagically attaches those function attribute to the function in LLVM IR here. Note that the functionality of this attribute is already tested in clang/test/CodeGenSYCL.

Now that the functions have their attributes, they need to be "lowered" for each target. That is, the function attributes are transformed into LLVM IR that the toolchains expect.

For the SPIR-V path, note how this happens in llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp for the SPIR-V path - this isn't using clang at all. If we weren't wanting any CUDA-specific behaviour we wouldn't be having this discussion.

For CUDA, yes, this happens clang/lib/CodeGen/Targets/NVPTX.cpp but it's really an LLVM IR -> LLVM IR transformation (like the CompileTimePropertiesPass does). I'd prefer to move this code into LLVM in a follow-up PR than to test it in clang/test now. To test this in clang/test/CodeGenSYCL would require us to copy or replicate the kernel properties I'm adding into clang/test/CodeGenSYCL/Inputs/sycl.hpp which I don't personally think is worth it. This is not testing code that is intrinsically part of clang, it's ultimately coming from the SYCL headers.

bader · 2024-07-17T15:09:34Z

I agree with @elizabethandrews. Front-end compiler changes are supposed to be tested by clang/test/*. sycl/test is primary intended to test the code from sycl/include directory.

I agree with you both in principle but I don't believe this is truly a "clang front-end" change, despite it touching code in clang.

I think what's going on here is that the code that's in clang should really be in LLVM. This is not really a front-end change but is for historical reasons. @steffenlarsen and I have already discussed in #14502 about moving this code at some point.

For the absence of doubt, let me walk through how these kernel properties end up in LLVM IR:

The properties are defined in the SYCL headers. They each specialise ext::oneapi::experimental::detail::PropertyMetaInfo which defines a mapping from the property to an LLVM IR function attribute name.

The SYCL headers for APIs like parallel_for forward each of the properties through the [[__sycl_detail__::add_ir_attributes_function(...)]] attribute which - inside clang - automagically attaches those function attribute to the function in LLVM IR here. Note that the functionality of this attribute is already tested in clang/test/CodeGenSYCL.

Now that the functions have their attributes, they need to be "lowered" for each target. That is, the function attributes are transformed into LLVM IR that the toolchains expect.

For the SPIR-V path, note how this happens in llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp for the SPIR-V path - this isn't using clang at all. If we weren't wanting any CUDA-specific behaviour we wouldn't be having this discussion.

For CUDA, yes, this happens clang/lib/CodeGen/Targets/NVPTX.cpp but it's really an LLVM IR -> LLVM IR transformation (like the CompileTimePropertiesPass does). I'd prefer to move this code into LLVM in a follow-up PR than to test it in clang/test now. To test this in clang/test/CodeGenSYCL would require us to copy or replicate the kernel properties I'm adding into clang/test/CodeGenSYCL/Inputs/sycl.hpp which I don't personally think is worth it. This is not testing code that is intrinsically part of clang, it's ultimately coming from the SYCL headers.

Why do we push conceptually wrong code, which is supposed to be refactored someday? Why can't we move the code logic from clang/lib/CodeGen/Targets/NVPTX.cpp to LLVM IR pass first and build on top of that?

frasercrmck · 2024-09-25T14:28:57Z

Closed in favour of #14518. If the two other properties are desired in the end, we can open a new PR.

frasercrmck requested a review from steffenlarsen July 4, 2024 14:58

frasercrmck requested review from a team as code owners July 4, 2024 14:58

remove unnecessary comment

a677fae

frasercrmck had a problem deploying to WindowsCILock July 4, 2024 15:00 — with GitHub Actions Error

fix formatting; fix max_work_group_size property value; test it

f9df938

frasercrmck had a problem deploying to WindowsCILock July 4, 2024 15:11 — with GitHub Actions Error

frasercrmck commented Jul 4, 2024

View reviewed changes

make assertion do as it says

781aeee

frasercrmck had a problem deploying to WindowsCILock July 4, 2024 15:19 — with GitHub Actions Error

Merge remote-tracking branch 'origin/sycl' into sycl-launch-bounds-ke…

2df674f

…rnel-prop

frasercrmck had a problem deploying to WindowsCILock July 4, 2024 15:34 — with GitHub Actions Error

remove unused code

f30b594

frasercrmck temporarily deployed to WindowsCILock July 4, 2024 16:36 — with GitHub Actions Inactive

steffenlarsen reviewed Jul 4, 2024

View reviewed changes

llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp Outdated Show resolved Hide resolved

sycl/include/sycl/ext/oneapi/kernel_properties/properties.hpp Outdated Show resolved Hide resolved

frasercrmck temporarily deployed to WindowsCILock July 4, 2024 20:26 — with GitHub Actions Inactive

frasercrmck added 4 commits July 8, 2024 10:39

bounds check

323d1d0

Merge remote-tracking branch 'origin/sycl' into sycl-launch-bounds-ke…

14ffb1a

…rnel-prop

update comment

7496935

feedback: property doesn't need parameter pack

60da7cd

frasercrmck temporarily deployed to WindowsCILock July 8, 2024 10:00 — with GitHub Actions Inactive

frasercrmck temporarily deployed to WindowsCILock July 8, 2024 10:57 — with GitHub Actions Inactive

rename properties

a10c350

frasercrmck temporarily deployed to WindowsCILock July 9, 2024 12:08 — with GitHub Actions Inactive

frasercrmck temporarily deployed to WindowsCILock July 9, 2024 13:49 — with GitHub Actions Inactive

frasercrmck mentioned this pull request Jul 10, 2024

[SYCL] Add max work-group size kernel properties #14518

Merged

elizabethandrews reviewed Jul 10, 2024

View reviewed changes

frasercrmck closed this Sep 25, 2024

frasercrmck deleted the sycl-launch-bounds-kernel-prop branch September 25, 2024 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL] Add kernel properties for three function attributes #14448

[SYCL] Add kernel properties for three function attributes #14448

frasercrmck commented Jul 4, 2024 •

edited

Loading

frasercrmck Jul 4, 2024

frasercrmck Jul 4, 2024

steffenlarsen left a comment

steffenlarsen commented Jul 4, 2024

frasercrmck commented Jul 4, 2024

steffenlarsen commented Jul 5, 2024

frasercrmck commented Jul 8, 2024

steffenlarsen commented Jul 8, 2024

gmlueck commented Jul 8, 2024

frasercrmck commented Jul 8, 2024

gmlueck commented Jul 8, 2024

frasercrmck commented Jul 9, 2024

frasercrmck commented Jul 9, 2024

frasercrmck commented Jul 9, 2024

gmlueck commented Jul 10, 2024

elizabethandrews left a comment

frasercrmck commented Jul 11, 2024

elizabethandrews commented Jul 11, 2024

frasercrmck commented Jul 11, 2024

elizabethandrews commented Jul 11, 2024

frasercrmck commented Jul 15, 2024

elizabethandrews commented Jul 16, 2024

bader commented Jul 16, 2024

frasercrmck commented Jul 17, 2024

bader commented Jul 17, 2024

frasercrmck commented Sep 25, 2024

[SYCL] Add kernel properties for three function attributes #14448

[SYCL] Add kernel properties for three function attributes #14448

Conversation

frasercrmck commented Jul 4, 2024 • edited Loading

frasercrmck Jul 4, 2024

Choose a reason for hiding this comment

frasercrmck Jul 4, 2024

Choose a reason for hiding this comment

steffenlarsen left a comment

Choose a reason for hiding this comment

steffenlarsen commented Jul 4, 2024

frasercrmck commented Jul 4, 2024

steffenlarsen commented Jul 5, 2024

frasercrmck commented Jul 8, 2024

steffenlarsen commented Jul 8, 2024

gmlueck commented Jul 8, 2024

frasercrmck commented Jul 8, 2024

gmlueck commented Jul 8, 2024

frasercrmck commented Jul 9, 2024

frasercrmck commented Jul 9, 2024

frasercrmck commented Jul 9, 2024

gmlueck commented Jul 10, 2024

elizabethandrews left a comment

Choose a reason for hiding this comment

frasercrmck commented Jul 11, 2024

elizabethandrews commented Jul 11, 2024

frasercrmck commented Jul 11, 2024

elizabethandrews commented Jul 11, 2024

frasercrmck commented Jul 15, 2024

elizabethandrews commented Jul 16, 2024

bader commented Jul 16, 2024

frasercrmck commented Jul 17, 2024

bader commented Jul 17, 2024

frasercrmck commented Sep 25, 2024

frasercrmck commented Jul 4, 2024 •

edited

Loading