Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add SYCL clang frontend doc #177

Merged
merged 3 commits into from
Jun 9, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 101 additions & 1 deletion sycl/doc/SYCL_compiler_and_runtime_design.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ Device compiler is further split into the following major components:

- **Front-end** - parses input source, outlines "device part" of the code,
applies additional restrictions on the device code (e.g. no exceptions or
virtual calls), generated LLVM IR for the device code only and "integration
virtual calls), generates LLVM IR for the device code only and "integration
header" which provides information like kernel name, parameters order and data
type for the runtime library.
- **Middle-end** - transforms the initial LLVM IR* to get consumed by the
Expand All @@ -42,6 +42,106 @@ back-end. Today middle-end transformations include just a couple of passes:
- **Back-end** - produces native "device" code in ahead-of-time compilation
mode.

### SYCL support in Clang front-end

SYCL support in Clang front-end can be split into the following components:

- Device code outlining. This component is responsible for identifying and
outlining "device code" in the single source.

- SYCL kernel function object (functor or lambda) lowering. This component
creates an OpenCL kernel function interface for SYCL kernels.

- Device code diagnostics. This component enforces language restrictions on
device code.

- Integration header generation. This component emits information required for
binding host and device parts of the SYCL code via OpenCL API.

#### Device code outlining

Here is a code example of a SYCL program that demonstrates compiler outlining
work:

```C++
int foo(int x) { return ++x; }
int bar(int x) { throw std::exception{"CPU code only!"}; }
...
using namespace cl::sycl;
queue Q;
buffer<int, 1> a{range<1>{1024}};
Q.submit([&](handler& cgh) {
auto A = a.get_access<access::mode::write>(cgh);
cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
A[index] = index[0] * 2 + index[1] + foo(42);
});
}
...
```

In this example, the SYCL compiler needs to compile the lambda expression passed
to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
called from the lambda expression for the device.
The compiler must also ignore the `bar` function when we compile the
"device" part of the single source code, as it's unused inside the device
portion of the source code (the contents of the lambda expression passed to the
`cl::sycl::handler::parallel_for` and any function called from this lambda
expression).

The current approach is to use the SYCL kernel attribute in the SYCL runtime to
mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
The SYCL runtime library can't mark foo as "device" code - this is a compiler
job: to traverse all symbols accessible from kernel functions and add them to
the "device part" of the code marking them with the new SYCL device attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should clarify the requirements we put on the function, which SYCL runtime can mark with sycl_kernel attribute.
I can remember at least these off the top of my head:

  • Must be template function with at least two template parameters.
    • First parameter must represent "unique kernel name"
    • Second parameter must be the function object type
  • Must have only one function parameter.
    • Parameter type must be second template parameter
  template <typename KernelName, typename KernelType/*, ...*/>
  __attribute__((sycl_kernel)) void sycl_kernel_function(KernelType KernelFuncObj) {
    KernelFuncObj();
  }

Other requirements can be extracted from the SemaSYCL.cpp.
It's some informal API, which we should document somewhere.
Maybe it would be better to document in comments. What do you think?

Copy link
Contributor Author

@Fznamznon Fznamznon Jun 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add some documentation to comments and this document both. But IMO requirements for function marked with sycl_kernel attribute aren't related to the "device code oulining" feature but related to generation of kernel wrapper from this function. So I suggest adding it to the "SYCL kernel function object (functor or lambda) lowering" section.

#### Lowering of lambda function objects and named function objects

All SYCL memory objects shared between host and device (buffers/images,
these objects map to OpenCL buffers and images) must be accessed through special
`accessor` classes. The "device" side implementation of these classes contain
pointers to the device memory. As there is no way in OpenCL to pass structures
with pointers inside as kernel arguments all memory objects shared between host
and device must be passed to the kernel as raw pointers.
SYCL also has a special mechanism for passing kernel arguments from host to
the device. In OpenCL you need to call `clSetKernelArg`, in SYCL all the
kernel arguments are captures/fields of lambda/functor SYCL functions for
invoking kernels (such as `parallel_for`). For example, in the previous code
snippet above `accessor` `A` is one such captured kernel argument.

To facilitate the mapping of the captures/fields of lambdas/functors to OpenCL
Fznamznon marked this conversation as resolved.
Show resolved Hide resolved
kernel and overcome OpenCL limitations we added the generation of a "kernel
wrapper" function inside the compiler. A "kernel wrapper" function contains the
body of the SYCL kernel function, receives OpenCL like parameters and
additionally does some manipulation to initialize captured lambda/functor fields
with these parameters. In some pseudo code the "kernel wrapper" for the previous
code snippet above looks like this:

```C++

// Let the lambda expression passed to the parallel_for declare unnamed
// function object with "Lambda" type.

// SYCL kernel is defined in SYCL headers:
__attribute__((sycl_kernel)) someSYCLKernel(Lambda lambda) {
lambda();
}

// Kernel wrapper
__kernel wrapper(global int* a) {
Lambda lambda; // Actually lambda declaration doesn't have a name in AST
// Let the lambda have one captured field - accessor A. We need to init it
// with global pointer from arguments:
lambda.A.__init(a);
// Body of SYCL kernel from SYCL headers:
{
lambda();
}
}

```

"Kernel wrapper" is generated by the compiler inside the Sema using AST nodes.

### SYCL support in the driver

SYCL offload support in the driver is based on the clang driver concepts and
Expand Down