Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add kernel properties for three function attributes #14448

Closed

Conversation

frasercrmck
Copy link
Contributor

@frasercrmck frasercrmck commented Jul 4, 2024

This patch adds three kernel properties, max_work_group_size, min_work_groups_per_multiprocessor, and max_work_groups_per_cluster for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp, respectively. The semantics of the properties are the same as for their respective function attributes. Note the change of name from cu to multiprocessor and mp to cluster. This attempts to fix some of the ambiguous naming of the attributes by more closely matching the original CUDA __launch_bounds__ names.

Each of these attributes was originally earmarked as a SYCL equivalent to CUDA's __launch_bounds__ qualifier, hence the focus on lowering for NVPTX. It has since been identified that we will need an additional property for CUDA's maxThreadsPerBlock which isn't always expressible with the three-argument max_work_group_size. That new property will come in a follow-up patch.

This patch also wires up the max_work_group_size property to the equivalent SPIR-V execution mode, which should hopefully improve certain use cases. It lowers the other two properties despite them not currently having a SPIR-V equivalent.

This patch adds kernel properties for three existing SYCL function
attributes: 'max_work_group_size', 'min_work_groups_per_cu', and
'max_work_groups_per_mp'. The semantics of the properties are the same as
for their respective function attributes.

Each of these attributes was originally earmarked as a SYCL equivalent
to CUDA's '__launch_bounds__' qualifier, hence the focus on lowering for
NVPTX. It has since been identified that we will need an additional
property for CUDA's 'maxThreadsPerBlock' which isn't always expressable
with 'max_work_group_size'. That property will come in a follow-up
patch.

This patch also wires up the 'max_work_group_size' property to the
equivalent SPIR-V execution mode, which should hopefully improve certain
use cases. It lowers the other two properties despite them not currently
having a SPIR-V equivalent.
@frasercrmck frasercrmck requested review from a team as code owners July 4, 2024 14:58
} else if (auto Attr = F->getFnAttribute("sycl-max-work-group-size");
Attr.isValid()) {
// Split values in the comma-separated list integers.
SmallVector<StringRef, 3> ValStrs;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of code overlap between NVPTX lowering of kernel properties and CompileTimePropertiesPass lowering of kernel properties. We might want to think of a shared utility to work with kernel properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hiding the fact that they're all represented as strings in LLVM IR as an "implementation detail" and providing a std::optional<T> KernelPropertyAttr::getAsInteger or getAsIntegerList might be a good step.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the addition, but we should have an experimental extension document for it as well. See sycl/doc/extensions/template.asciidoc or maybe it could be made as an expansion of sycl_ext_oneapi_kernel_arg_properties.

llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp Outdated Show resolved Hide resolved
@steffenlarsen
Copy link
Contributor

Tag @gmlueck for the comment above about extensions.

@frasercrmck
Copy link
Contributor Author

I like the addition, but we should have an experimental extension document for it as well. See sycl/doc/extensions/template.asciidoc or maybe it could be made as an expansion of sycl_ext_oneapi_kernel_arg_properties.

(sorry for lightning response - I was already writing this to no-one when your feedback came in)

Yeah sorry, I couldn't find any specifications for any of the existing kernel properties. Are you suggesting I add an extension for all properties that already exist, or just these ones?

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

@steffenlarsen
Copy link
Contributor

Yeah sorry, I couldn't find any specifications for any of the existing kernel properties.

Are you referring to the Clang attributes with the same names? I am not sure we originally saw them as "SYCL extensions", even though they only work on SYCL kernels currently, from what I know. I suppose they are documented in the Clang attributes documentation though.

For the kernel properties, I believe most are documented either in sycl_ext_oneapi_kernel_arg_properties or in sycl_ext_intel_fpga_kernel_interface_properties.

Are you suggesting I add an extension for all properties that already exist, or just these ones?

Just these new ones. Since they are somewhat similar to the ones in sycl_ext_oneapi_kernel_arg_properties, I wouldn't personally be opposed to adding them in there. Do they have an effect on non-CUDA targets? If not, then we'd want to note that in the extension.

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

I think the ConflictingProperties trait is what you're looking for. You can specialize it for the new properties, then check if the given list has some conflicting property or even compare values if the conflicts depend on the specified arguments.

We also have a check in SemaDeclAttr that might be relevant to expand here. The check issues a warning if the user is using properties on kernels that have a corresponding property. It's not actually checking if there is a conflict, hence it is just a warning. It is not my favorite part of the properties-related code, but we try to make the Clang frontend as oblivious to these properties as possible.

@frasercrmck
Copy link
Contributor Author

For the kernel properties, I believe most are documented either in sycl_ext_oneapi_kernel_arg_properties or in sycl_ext_intel_fpga_kernel_interface_properties.

Are you suggesting I add an extension for all properties that already exist, or just these ones?

Just these new ones. Since they are somewhat similar to the ones in sycl_ext_oneapi_kernel_arg_properties, I wouldn't personally be opposed to adding them in there. Do they have an effect on non-CUDA targets? If not, then we'd want to note that in the extension.

I'm not sure these belong in sycl_ext_oneapi_kernel_arg_properties because they're kernel properties, and the naming of that extension implies it concerns only argument properties. However, it looks like sycl_ext_oneapi_kernel_properties is a good place for these properties. Should I add them there?

Sorry, for some reason I couldn't find that sycl_ext_oneapi_kernel_properties extension earlier. You'd think I'd find them with a simple grep, but hey...

Also I couldn't pinpoint where (if anywhere) kernel properties are verified with respect to each other, in the same way that attributes are. We'd ideally like to check that various work-group properties make sense as a holistic whole, wouldn't we?

I think the ConflictingProperties trait is what you're looking for. You can specialize it for the new properties, then check if the given list has some conflicting property or even compare values if the conflicts depend on the specified arguments.

Ah excellent, thanks - I'll check that out. It might not be essential for this PR, and this PR (and its subsequent follow-up) is time sensitive as we want it in before the cut-off.

We also have a check in SemaDeclAttr that might be relevant to expand here. The check issues a warning if the user is using properties on kernels that have a corresponding property. It's not actually checking if there is a conflict, hence it is just a warning. It is not my favorite part of the properties-related code, but we try to make the Clang frontend as oblivious to these properties as possible.

I hadn't spotted that either, thank you. Yeah, I've been made vaguely aware that properties are the way forward, as opposed to attributes. I can see why.

@steffenlarsen
Copy link
Contributor

I'm not sure these belong in sycl_ext_oneapi_kernel_arg_properties because they're kernel properties, and the naming of that extension implies it concerns only argument properties. However, it looks like sycl_ext_oneapi_kernel_properties is a good place for these properties. Should I add them there?

Sorry, that was the extension I meant! I may have been a tad too quick when looking for it. I think these fit in there, but I would like to hear @gmlueck's thoughts.

@gmlueck
Copy link
Contributor

gmlueck commented Jul 8, 2024

This patch adds kernel properties for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp. The semantics of the properties are the same as for their respective function attributes.

Are there already C++ attributes with these names that can be used to decorate a kernel? Are they documented someplace?

@frasercrmck
Copy link
Contributor Author

This patch adds kernel properties for three existing SYCL function attributes: max_work_group_size, min_work_groups_per_cu, and max_work_groups_per_mp. The semantics of the properties are the same as for their respective function attributes.

Are there already C++ attributes with these names that can be used to decorate a kernel? Are they documented someplace?

Yep, they were added late last year as function attributes.

max_work_group_size
min_work_groups_per_cu
max_work_groups_per_mp

@gmlueck
Copy link
Contributor

gmlueck commented Jul 8, 2024

What is the behavior if a kernel with these attributes/properties is compiled for a non-CUDA device?

@frasercrmck
Copy link
Contributor Author

What is the behavior if a kernel with these attributes/properties is compiled for a non-CUDA device?

The max_work_group_size attribute is already mapped via SPIR-V to the MaxWorkgroupSizeINTEL Execution Mode so has well-defined semantics. I know of one non-CUDA SPIR-V implementation that made use of that information.

The idea is that the behaviour of min_work_groups_per_cu and max_work_groups_per_mp do nothing on non-CUDA devices. The documentation should be clear about that. In fact I just noticed that these function attributes were mis-spelled and min/max have been accidentally swapped. I'll see if we can fix that for the function attributes, but at the very least we should correct it for the properties, especially if we're going to spec them properly.

My hunch is that max_work_group_size would belong in sycl_ext_oneapi_kernel_properties. The two that are CUDA-specific don't really belong there and might need their own extension.

@frasercrmck
Copy link
Contributor Author

To be specific, I think we can say something to the effect that the two CUDA attributes/properties "have no effect and can be safely ignored when compiled for non-CUDA devices".

@frasercrmck
Copy link
Contributor Author

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Note also that I intend to add another target-agnostic property to help with launch_bounds, which folds the three-dimensional max_work_group_size into a flattened max_total_work_group_size which can be used if you can't guarantee the maximum size of all three dimensions but you still have a total number you know you won't exceed.

I suspect that it might be best if this PR splits into two: one for the two CUDA-specific properties, and another for the two target-agnostic properties. Thoughts?

@gmlueck
Copy link
Contributor

gmlueck commented Jul 10, 2024

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Will you add a specification for these CUDA properties also? If they will have no effect on non-CUDA targets, I think it would be OK to document them in sycl_ext_oneapi_kernel_properties.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add FE tests

@frasercrmck
Copy link
Contributor Author

Note, I've now renamed two of the CUDA-specific properties as we found that the original function attributes' names were misleading, with cu mapping to "multiprocessor" and mp to "cluster".

Will you add a specification for these CUDA properties also? If they will have no effect on non-CUDA targets, I think it would be OK to document them in sycl_ext_oneapi_kernel_properties.

Yes I'll add some kind of specification. I'll probably continue it here as a separate PR to #14518 as those have a higher priority.

Please add FE tests

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

@elizabethandrews
Copy link
Contributor

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

@frasercrmck
Copy link
Contributor Author

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

@elizabethandrews
Copy link
Contributor

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

@frasercrmck
Copy link
Contributor Author

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

Yeah exactly, codegen for the other properties is being testing in the same place.

@elizabethandrews
Copy link
Contributor

Where abouts exactly, sorry? I had assumed that the tests I'd added to sycl/tests were sufficient as it's how we test the existing work_group_size, work_group_size_hint, sub_group_size, and device_has properties from the same extension.

We usually require frontend tests for any frontend changes. It looks like this PR adds new metadata. Tests checking for it should suffice.

The tests I've added in sycl/tests do check metadata, though - for both SPIR-V and NVPTX lowering paths. Maybe I'm missing something, sorry.

No its my bad sorry. I did not realize you were doing IR checks in sycl/test. We usually do those in CodeGenSYCL in clang/tests without invoking the driver. Are metadata checks for other properties tested in sycl/test and not in clang/tests?

Yeah exactly, codegen for the other properties is being testing in the same place.

My concern is that it is very common to run just check-clang when making pure FE changes and these tests will not be run in that case. My understanding with llvm.org policy is that clang changes need to be accompanied by tests in the clang folder, and we follow that same guideline. So I would personally prefer having these tests in CodeGenSYCL and moving other tests over at some point. @premanandrao thoughts?

@bader
Copy link
Contributor

bader commented Jul 16, 2024

I agree with @elizabethandrews.
Front-end compiler changes are supposed to be tested by clang/test/*.
sycl/test is primary intended to test the code from sycl/include directory.

@frasercrmck
Copy link
Contributor Author

I agree with @elizabethandrews. Front-end compiler changes are supposed to be tested by clang/test/*. sycl/test is primary intended to test the code from sycl/include directory.

I agree with you both in principle but I don't believe this is truly a "clang front-end" change, despite it touching code in clang.

I think what's going on here is that the code that's in clang should really be in LLVM. This is not really a front-end change but is for historical reasons. @steffenlarsen and I have already discussed in #14502 about moving this code at some point.

For the absence of doubt, let me walk through how these kernel properties end up in LLVM IR:

The properties are defined in the SYCL headers. They each specialise ext::oneapi::experimental::detail::PropertyMetaInfo which defines a mapping from the property to an LLVM IR function attribute name.

The SYCL headers for APIs like parallel_for forward each of the properties through the [[__sycl_detail__::add_ir_attributes_function(...)]] attribute which - inside clang - automagically attaches those function attribute to the function in LLVM IR here. Note that the functionality of this attribute is already tested in clang/test/CodeGenSYCL.

Now that the functions have their attributes, they need to be "lowered" for each target. That is, the function attributes are transformed into LLVM IR that the toolchains expect.

For the SPIR-V path, note how this happens in llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp for the SPIR-V path - this isn't using clang at all. If we weren't wanting any CUDA-specific behaviour we wouldn't be having this discussion.

For CUDA, yes, this happens clang/lib/CodeGen/Targets/NVPTX.cpp but it's really an LLVM IR -> LLVM IR transformation (like the CompileTimePropertiesPass does). I'd prefer to move this code into LLVM in a follow-up PR than to test it in clang/test now. To test this in clang/test/CodeGenSYCL would require us to copy or replicate the kernel properties I'm adding into clang/test/CodeGenSYCL/Inputs/sycl.hpp which I don't personally think is worth it. This is not testing code that is intrinsically part of clang, it's ultimately coming from the SYCL headers.

@bader
Copy link
Contributor

bader commented Jul 17, 2024

I agree with @elizabethandrews. Front-end compiler changes are supposed to be tested by clang/test/*. sycl/test is primary intended to test the code from sycl/include directory.

I agree with you both in principle but I don't believe this is truly a "clang front-end" change, despite it touching code in clang.

I think what's going on here is that the code that's in clang should really be in LLVM. This is not really a front-end change but is for historical reasons. @steffenlarsen and I have already discussed in #14502 about moving this code at some point.

For the absence of doubt, let me walk through how these kernel properties end up in LLVM IR:

The properties are defined in the SYCL headers. They each specialise ext::oneapi::experimental::detail::PropertyMetaInfo which defines a mapping from the property to an LLVM IR function attribute name.

The SYCL headers for APIs like parallel_for forward each of the properties through the [[__sycl_detail__::add_ir_attributes_function(...)]] attribute which - inside clang - automagically attaches those function attribute to the function in LLVM IR here. Note that the functionality of this attribute is already tested in clang/test/CodeGenSYCL.

Now that the functions have their attributes, they need to be "lowered" for each target. That is, the function attributes are transformed into LLVM IR that the toolchains expect.

For the SPIR-V path, note how this happens in llvm/lib/SYCLLowerIR/CompileTimePropertiesPass.cpp for the SPIR-V path - this isn't using clang at all. If we weren't wanting any CUDA-specific behaviour we wouldn't be having this discussion.

For CUDA, yes, this happens clang/lib/CodeGen/Targets/NVPTX.cpp but it's really an LLVM IR -> LLVM IR transformation (like the CompileTimePropertiesPass does). I'd prefer to move this code into LLVM in a follow-up PR than to test it in clang/test now. To test this in clang/test/CodeGenSYCL would require us to copy or replicate the kernel properties I'm adding into clang/test/CodeGenSYCL/Inputs/sycl.hpp which I don't personally think is worth it. This is not testing code that is intrinsically part of clang, it's ultimately coming from the SYCL headers.

Why do we push conceptually wrong code, which is supposed to be refactored someday? Why can't we move the code logic from clang/lib/CodeGen/Targets/NVPTX.cpp to LLVM IR pass first and build on top of that?

@frasercrmck
Copy link
Contributor Author

Closed in favour of #14518. If the two other properties are desired in the end, we can open a new PR.

@frasercrmck frasercrmck deleted the sycl-launch-bounds-kernel-prop branch September 25, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants