Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Enable parameter optimization for SYCL kernels #2236

Closed

Conversation

srividya-sundaram
Copy link
Contributor

@srividya-sundaram srividya-sundaram commented Jul 31, 2020

Add SYCL integration header support with kernel arg optimization table to mark kernel parameters that can be omitted.

@erichkeane
Copy link
Contributor

Note this is the CFE component to: #2226

clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenSYCL/integration_header.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenSYCL/integration_header.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenSYCL/integration_header.cpp Outdated Show resolved Hide resolved
sycl/include/CL/sycl/detail/kernel_desc.hpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/test/CodeGenSYCL/kernel-param-acc-array-ih.cpp Outdated Show resolved Hide resolved
@erichkeane
Copy link
Contributor

I lost the thing, but to do clang format, check this out:

git diff -U0 --no-color COMMIT_ID_BEFORE_YOURS | clang-format-diff.py -i -p1

You might have to give it a direct path to clang-format-diff.py, and make sure clang-format executable is in your path (and make sure you are in the llvm 'root' directory), but this tends to work for me.

Alternatively, you can manually apply the clang-format-diff file from the build bot.

Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A handful of small changes, otherwise this LGTM. Re-request a review when you think you've got them all and I'll take a look (likely Monday).

clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
@srividya-sundaram srividya-sundaram changed the title Kernel arg optimization [SYCL] Enable parameter optimization for SYCL kernels Aug 1, 2020
Copy link
Contributor

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 nit left for me.

@Naghasan
Copy link
Contributor

Naghasan commented Aug 3, 2020

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.

If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Copy link
Contributor

@romanovvlad romanovvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q

clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Show resolved Hide resolved
clang/lib/Sema/SemaSYCL.cpp Outdated Show resolved Hide resolved
sycl/include/CL/sycl/detail/kernel_desc.hpp Outdated Show resolved Hide resolved
erichkeane
erichkeane previously approved these changes Aug 3, 2020
Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small nit, otherwise lgtm

clang/include/clang/Sema/Sema.h Outdated Show resolved Hide resolved
@erichkeane
Copy link
Contributor

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.

If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Can you clarify this comment? @romanovvlad has taken a look at this solution and seems to think it is workable, so I'd be interested to hear what the two of you believe should be changed here?

@Naghasan
Copy link
Contributor

Naghasan commented Aug 3, 2020

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.
If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Can you clarify this comment? @romanovvlad has taken a look at this solution and seems to think it is workable, so I'd be interested to hear what the two of you believe should be changed here?

If I compile using -fsycl-targets=spir64-unknown-linux-sycldevice,spir64_fpga-unknown-linux-sycldevice, what guarantees is there that the compiler will remove exactly the same arguments in my kernels for both module ? If I start to tweak the optimization parameters, it is likely to expose different optimization opportunities and the dead argument elimination may yield different results. An extreme case would be something like -fsycl-targets=spir64-unknown-linux-sycldevice,spir64_fpga-unknown-linux-sycldevice -Xsycl-target-backend=spir64-unknown-linux-sycldevice "-O3" -Xsycl-target-backend=spir64_fpga-unknown-linux-sycldevice "-O0".

So unless I missed something in the integration header evolution and it is emitted per target, the result is likely to fail for one of the target.

PD.Kind = Kind;
PD.Info = Info;
PD.Offset = Offset;
K->Params.push_back({Kind, Info, Offset, NumOpenCLParams});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.emplace_back()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell : emplace_back doesn't support aggregate initialization until C++20, and our codebase is C++14.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell : emplace_back doesn't support aggregate initialization until C++20, and our codebase is C++14.

Then you know what you have to do. Please replace all the -std=c++14 with -std=c++20 ;-)
This is depressing all this time wasted writing old code... :-(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keryell : emplace_back doesn't support aggregate initialization until C++20, and our codebase is C++14.

Then you know what you have to do. Please replace all the -std=c++14 with -std=c++20 ;-)
This is depressing all this time wasted writing old code... :-(

Wish I could! It was hard enough to get the LLVM project switched over from a terrible-subset-of C++11 to C++14.

@romanovvlad
Copy link
Contributor

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.
If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Can you clarify this comment? @romanovvlad has taken a look at this solution and seems to think it is workable, so I'd be interested to hear what the two of you believe should be changed here?

The solution is workable from SYCL RT point of view - SYCL RT can skip setting argument which are marked in special way.
But, I believe, the problem @Naghasan is describing it really a problem and related to the compilation toolchain.

@erichkeane
Copy link
Contributor

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.
If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Can you clarify this comment? @romanovvlad has taken a look at this solution and seems to think it is workable, so I'd be interested to hear what the two of you believe should be changed here?

The solution is workable from SYCL RT point of view - SYCL RT can skip setting argument which are marked in special way.
But, I believe, the problem @Naghasan is describing it really a problem and related to the compilation toolchain.

What happens in the driver (@mdtoguchi ?) with this multiple device targets? Do they all share a single integration header, or is there a separate one for each?

Short-term we could probably limit the LLVM DSE enablement (via driver) to only single-target invocations. Long-term, could we have a separate integration header per-target? The CFE can be changed to output multiple files if necessary for the integration header.

@romanovvlad
Copy link
Contributor

The kernel arg optimization information is not a kernel descriptor. The information should be bound to the module descriptor.
If I compile for 2 targets, there is no guarantees that arguments will be elided in the same way in both cases.

Can you clarify this comment? @romanovvlad has taken a look at this solution and seems to think it is workable, so I'd be interested to hear what the two of you believe should be changed here?

The solution is workable from SYCL RT point of view - SYCL RT can skip setting argument which are marked in special way.
But, I believe, the problem @Naghasan is describing it really a problem and related to the compilation toolchain.

What happens in the driver (@mdtoguchi ?) with this multiple device targets? Do they all share a single integration header, or is there a separate one for each?

Short-term we could probably limit the LLVM DSE enablement (via driver) to only single-target invocations. Long-term, could we have a separate integration header per-target? The CFE can be changed to output multiple files if necessary for the integration header.

The integration header are included into the application, so we will have multiple definitions of things headers define.

@mdtoguchi
Copy link
Contributor

What happens in the driver (@mdtoguchi ?) with this multiple device targets? Do they all share a single integration header, or is there a separate one for each?

Current behavior is to generate a single integration header that is pulled into the host file.

@erichkeane
Copy link
Contributor

So then, what is our solution here? How can we communicate this to the runtime? Do we just prohibit the optimization in multi-device situations? Do we prohibit different opt-settings in that case?

@romanovvlad
Copy link
Contributor

So then, what is our solution here? How can we communicate this to the runtime? Do we just prohibit the optimization in multi-device situations? Do we prohibit different opt-settings in that case?

As an idea we could change the approach and embed "omit array" to the device image as a metadata. But it requires more changes in the RT because in the place where we currently collect arguments we don't know which specific device binary will be used.

@srividya-sundaram srividya-sundaram deleted the kernel-arg-optimization branch August 4, 2020 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants