Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][CUDA] Adds PI CUDA support for reqd_work_group_size attribute #3735

Merged
merged 12 commits into from
Jul 16, 2021

Conversation

steffenlarsen
Copy link
Contributor

This commit adds support for reqd_work_group_size in the PI CUDA backend
by extracting the attribute as program metadata. The program metadata
accompanies the binary when passed to the backend and it is up to the
backend if they extract any useful metadata. This adds two additional
parameters to piProgramCreateWithBinary for passing the program
metadata.

Program metadata is transported as a properties created by
sycl-post-link, so this commit also changes the behaviour of the NVPTX
path for linkage actions leading to the offload wrapper. These changes
uses file tables for the NVPTX path as well to allow generation and
preservation of properties. This assumes that the file table only ever
contains a single row if taking the NVPTX path and will fail otherwise.

@steffenlarsen
Copy link
Contributor Author

Fixes test SYCL/Basic/reqd_work_group_size.cpp in test suite for CUDA. (PR: intel/llvm-test-suite#278)

bader
bader previously approved these changes May 13, 2021
sycl/unittests/pi/cuda/test_kernels.cpp Show resolved Hide resolved
sycl/plugins/cuda/pi_cuda.cpp Outdated Show resolved Hide resolved
@steffenlarsen
Copy link
Contributor Author

Looks like I missed a clang driver test. Should be fixed now.

bader
bader previously approved these changes May 13, 2021
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review for PropertySetIO, file-table-tform and sycl-post-link: overall changes look good to me, just a few (mostly minor) comments

llvm/tools/sycl-post-link/sycl-post-link.cpp Outdated Show resolved Hide resolved
llvm/tools/sycl-post-link/sycl-post-link.cpp Outdated Show resolved Hide resolved
llvm/lib/Support/SimpleTable.cpp Show resolved Hide resolved
bader
bader previously approved these changes May 17, 2021
Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation (device link and wrap image) should be updated as well. https://github.com/intel/llvm/blob/sycl/sycl/doc/images/DeviceLinkAndWrap.svg

@steffenlarsen
Copy link
Contributor Author

Documentation (device link and wrap image) should be updated as well. https://github.com/intel/llvm/blob/sycl/sycl/doc/images/DeviceLinkAndWrap.svg

Definitely! I have updated the documentation. Hopefully I found all the relevant sections.

Copy link
Contributor

@pvchupin pvchupin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @kbobrovs please take a look as well.

pvchupin
pvchupin previously approved these changes May 18, 2021
@smaslov-intel
Copy link
Contributor

This commit adds support for reqd_work_group_size in the PI CUDA backend
by extracting the attribute as program metadata. The program metadata
accompanies the binary when passed to the backend and it is up to the
backend if they extract any useful metadata. This adds two additional
parameters to piProgramCreateWithBinary for passing the program
metadata.

How is this attribute supported when program is created with SPIR-V?

@steffenlarsen
Copy link
Contributor Author

How is this attribute supported when program is created with SPIR-V?

I have not checked with L0, but OpenCL can both query information about reqd_work_group_size of a kernel using the PI_KERNEL_GROUP_INFO_COMPILE_WORK_GROUP_SIZE descriptor and will explicitly fail with CL_INVALID_WORK_GROUP_SIZE if the attribute restrictions are not met during launch.

PTX has reqntid (which reqd_work_group_size is mapped to with #3755), but the documentation is vague about the specific error behavior when it is not met. Likewise, the CUDA driver API does not expose a way of querying this information. This is the motivation for carrying metadata to the runtime, more or less.

In case either OpenCL or L0 may be in need of any similar program metadata, it should be as simple as just making sycl-post-link generate the information, append it to PropSet[llvm::util::PropertySetRegistry::SYCL_PROGRAM_METADATA] and flip the switch so that it is generated for their path. Then it should reach the backend through piProgramCreateWithBinary.

pvchupin
pvchupin previously approved these changes Jul 16, 2021
@pvchupin
Copy link
Contributor

@steffenlarsen, could you please resolve conflicts. I think everybody required reviewed and approved at least once.
@mdtoguchi, @AlexeySachkov, @kbobrovs, please comment if anything left.

@steffenlarsen steffenlarsen force-pushed the steffen/cuda_reqd_wg_size branch 2 times, most recently from 79632f4 to 93a35ad Compare July 16, 2021 10:31
@steffenlarsen
Copy link
Contributor Author

Thanks @pvchupin . Sorry for the double rebase. New conflicts snuck in while I was doing the first rebase. It should be ready now.

@bader
Copy link
Contributor

bader commented Jul 16, 2021

Ouch... @steffenlarsen, sorry, I seem to introduce another merge conflict. Could you take a look, please?

Steffen Larsen added 12 commits July 16, 2021 12:26
This commit adds support for reqd_work_group_size in the PI CUDA backend
by extracting the attribute as program metadata. The program metadata
accompanies the binary when passed to the backend and it is up to the
backend if they extract any useful metadata. This adds two additional
parameters to piProgramCreateWithBinary for passing the program
metadata.

Program metadata is transported as a properties created by
sycl-post-link, so this commit also changes the behaviour of the NVPTX
path for linkage actions leading to the offload wrapper. These changes
uses file tables for the NVPTX path as well to allow generation and
preservation of properties. This assumes that the file table only ever
contains a single row if taking the NVPTX path and will fail otherwise.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
…consistent

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
@steffenlarsen
Copy link
Contributor Author

Ouch... @steffenlarsen, sorry, I seem to introduce another merge conflict. Could you take a look, please?

Haha, no worries! It has been taken care of. Thanks for letting me know.

Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver, doc, file-table-tform LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants