Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Add device code split feature design #631

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions sycl/doc/SYCLCompilerAndRuntimeDesign.md
Original file line number Diff line number Diff line change
Expand Up @@ -394,6 +394,51 @@ llvm-no-spir-kernel host.bc

It returns 0 if no kernels are present and 1 otherwise.

#### Device code split

Putting all device code into a single SPIRV module does not work well in the
following cases:
1. There are thousands of kernels defined and only small part of them is used at
run-time. Having them all in one SPIR-V module significantly increases JIT time.
2. Device code can be specialized for different devices. For example, kernels
that are supposed to be executed only on FPGA can use extensions avaliable for
FPGA only. This will cause JIT compilation failure on other devices even if this
particular kernel is never called on them.
bader marked this conversation as resolved.
Show resolved Hide resolved

To resolve these problems the compiler can split a single module into smaller
ones. The following features is supported:
* Emitting a separate module for source (translation unit)
* Emitting a separate module for each kernel

The current approach is:
* Generate special meta-data with translation unit ID for each kernel in SYCL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this ID is used later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ID will be used to group kernels per translation unit.
But I think it can be better clarified in this document, I'll made a change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

front-end. This ID will be used to group kernels on per-translation unit basis
* Link all device LLVM modules using llvm-link
* Perform split on a fully linked module
* Generate a symbol table (list of kernels) for each produced device module for
proper module selection in runtime
* Perform SPIR-V translation and AOT compilation (if requested) on each produced
module separately
* Add information about presented kernels to a wrappring object for each device
image

Device code splitting process:
![Device code splitting](images/DeviceCodeSplit.svg)

The "split" box is implemented as functionality of the dedicated tool
`sycl-post-link`. The tool runs a set of LLVM passes to split input module and
generates a symbol table (list of kernels) for each produced device module.

To enable device code split, a special option must be passed to the clang
driver:

`-fsycl-device-code-split=<value>`

There are three possible values for this option:
* `per_source` - enables emitting a separate module for each source (translation
unit)
* `per_kernel` - enables emitting a separate module for each kernel
* `off` - disables device code split

### Integration with SPIR-V format

Expand Down
Loading