Skip to content

Commit

Permalink
[SYCL] Implement OpenCL kernel function generation
Browse files Browse the repository at this point in the history
Summary:
All SYCL memory objects shared between host and device (buffers/images, these
objects map to OpenCL buffers and images) must be accessed through special
accessor classes. The "device" side implementation of these classes contain
pointers to the device memory. As there is no way in OpenCL to pass
structures with pointers inside as kernel arguments, all memory objects
shared between host and device must be passed to the kernel as raw
pointers. SYCL also has a special mechanism for passing kernel arguments
from host to the device. In OpenCL kernel arguments are set by calling
`clSetKernelArg` function for each kernel argument, meanwhile in SYCL all the
kernel arguments are fields of "SYCL kernel function" which can be defined
as a lambda function or a named function object and passed as an argument
to SYCL function for invoking kernels (such as `parallel_for` or `single_task`).

To facilitate the mapping of SYCL kernel data members to OpenCL kernel
arguments and overcome OpenCL limitations we added the generation of an
OpenCL kernel function inside the compiler. An OpenCL kernel function
contains the body of the SYCL kernel function, receives OpenCL-like
parameters and additionally does some manipulation to initialize SYCL
kernel data members with these parameters. In some pseudo code the OpenCL
kernel function can look like this:

```
// SYCL kernel is defined in SYCL headers:
template <typename KernelName, typename KernelType/*, ...*/>
__attribute__((sycl_kernel)) void sycl_kernel_function(KernelType KernelFuncObj) {
  // ...
  KernelFuncObj();
}

// Generated OpenCL kernel function
__kernel KernelName(global int* a) {
  KernelType KernelFuncObj; // Actually kernel function object declaration
  // doesn't have a name in AST.
  // Let the kernel function object have one captured field - accessor A.
  // We need to init it with global pointer from arguments:
  KernelFuncObj.A.__init(a);
  // Body of the SYCL kernel from SYCL headers:
  {
    KernelFuncObj();
  }
}
```
OpenCL kernel function is generated by the compiler inside the Sema
using AST nodes.

Reviewers: bader, Naghasan, ABataev, keryell

Subscribers: agozillon, mgorny, yaxunl, jfb, ebevhan, Anastasia, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71016
  • Loading branch information
Fznamznon authored and bader committed Nov 25, 2021
1 parent 371290d commit 7788782
Show file tree
Hide file tree
Showing 18 changed files with 1,010 additions and 18 deletions.
13 changes: 13 additions & 0 deletions clang/include/clang/Sema/Sema.h
Original file line number Diff line number Diff line change
Expand Up @@ -13116,6 +13116,19 @@ class Sema final {
/// Adds Callee to DeviceCallGraph if we don't know if its caller will be
/// codegen'ed yet.
bool checkSYCLDeviceFunction(SourceLocation Loc, FunctionDecl *Callee);

private:
/// Contains generated OpenCL kernel functions for SYCL.
SmallVector<Decl *, 4> SYCLKernels;

public:
void addSYCLKernel(Decl *D) { SYCLKernels.push_back(D); }
/// Access to SYCL kernels.
SmallVectorImpl<Decl *> &getSYCLKernels() { return SYCLKernels; }

/// Constructs an OpenCL kernel using the KernelCaller function and adds it to
/// the SYCL device code.
void constructOpenCLKernel(FunctionDecl *KernelCallerFunc, MangleContext &MC);
};

/// RAII object that enters a new expression evaluation context.
Expand Down
4 changes: 4 additions & 0 deletions clang/lib/AST/ASTContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11072,6 +11072,10 @@ bool ASTContext::DeclMustBeEmitted(const Decl *D) {
if (D->hasAttr<AliasAttr>() || D->hasAttr<UsedAttr>())
return true;

// If SYCL, only kernels are required.
if (LangOpts.SYCLIsDevice && !(D->hasAttr<OpenCLKernelAttr>()))
return false;

if (const auto *FD = dyn_cast<FunctionDecl>(D)) {
// Forward declarations aren't required.
if (!FD->doesThisDeclarationHaveABody())
Expand Down
6 changes: 6 additions & 0 deletions clang/lib/CodeGen/CodeGenModule.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2960,6 +2960,12 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
}
}

if (LangOpts.SYCLIsDevice && Global->hasAttr<OpenCLKernelAttr>() &&
MustBeEmitted(Global)) {
addDeferredDeclToEmit(GD);
return;
}

// Ignore declarations, they will be emitted on their first use.
if (const auto *FD = dyn_cast<FunctionDecl>(Global)) {
// Forward declarations are emitted lazily on first use.
Expand Down
4 changes: 4 additions & 0 deletions clang/lib/Parse/ParseAST.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,10 @@ void clang::ParseAST(Sema &S, bool PrintStats, bool SkipFunctionBodies) {
for (Decl *D : S.WeakTopLevelDecls())
Consumer->HandleTopLevelDecl(DeclGroupRef(D));

if (S.getLangOpts().SYCLIsDevice)
for (Decl *D : S.getSYCLKernels())
Consumer->HandleTopLevelDecl(DeclGroupRef(D));

Consumer->HandleTranslationUnit(S.getASTContext());

// Finalize the template instantiation observer chain.
Expand Down
443 changes: 443 additions & 0 deletions clang/lib/Sema/SemaSYCL.cpp

Large diffs are not rendered by default.

17 changes: 14 additions & 3 deletions clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
#include "clang/AST/DependentDiagnostic.h"
#include "clang/AST/Expr.h"
#include "clang/AST/ExprCXX.h"
#include "clang/AST/Mangle.h"
#include "clang/AST/PrettyDeclStackTrace.h"
#include "clang/AST/TypeLoc.h"
#include "clang/Basic/SourceManager.h"
Expand Down Expand Up @@ -6270,6 +6271,8 @@ NamedDecl *Sema::FindInstantiatedDecl(SourceLocation Loc, NamedDecl *D,
/// instantiations we have seen until this point.
void Sema::PerformPendingInstantiations(bool LocalOnly) {
std::deque<PendingImplicitInstantiation> delayedPCHInstantiations;
std::unique_ptr<MangleContext> MangleCtx(
getASTContext().createMangleContext());
while (!PendingLocalImplicitInstantiations.empty() ||
(!LocalOnly && !PendingInstantiations.empty())) {
PendingImplicitInstantiation Inst;
Expand All @@ -6288,17 +6291,25 @@ void Sema::PerformPendingInstantiations(bool LocalOnly) {
TSK_ExplicitInstantiationDefinition;
if (Function->isMultiVersion()) {
getASTContext().forEachMultiversionedFunctionVersion(
Function, [this, Inst, DefinitionRequired](FunctionDecl *CurFD) {
Function, [this, Inst, DefinitionRequired,
MangleCtx = move(MangleCtx)](FunctionDecl *CurFD) {
InstantiateFunctionDefinition(/*FIXME:*/ Inst.second, CurFD, true,
DefinitionRequired, true);
if (CurFD->isDefined())
if (CurFD->isDefined()) {
CurFD->setInstantiationIsPending(false);
if (getLangOpts().SYCLIsDevice &&
CurFD->hasAttr<SYCLKernelAttr>())
constructOpenCLKernel(CurFD, *MangleCtx);
}
});
} else {
InstantiateFunctionDefinition(/*FIXME:*/ Inst.second, Function, true,
DefinitionRequired, true);
if (Function->isDefined())
if (Function->isDefined()) {
if (getLangOpts().SYCLIsDevice && Function->hasAttr<SYCLKernelAttr>())
constructOpenCLKernel(Function, *MangleCtx);
Function->setInstantiationIsPending(false);
}
}
// Definition of a PCH-ed template declaration may be available only in the TU.
if (!LocalOnly && LangOpts.PCHInstantiateTemplates &&
Expand Down
86 changes: 86 additions & 0 deletions clang/test/CodeGenSYCL/Inputs/sycl.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
#pragma once

inline namespace cl {
namespace sycl {
namespace access {

enum class target {
global_buffer = 2014,
constant_buffer,
local,
image,
host_buffer,
host_image,
image_array
};

enum class mode {
read = 1024,
write,
read_write,
discard_write,
discard_read_write,
atomic
};

enum class placeholder {
false_t,
true_t
};

enum class address_space : int {
private_space = 0,
global_space,
constant_space,
local_space
};
} // namespace access

template <int dim>
struct id {
template <typename... T>
id(T... args) {} // fake constructor
private:
// Some fake field added to see using of id arguments in the
// kernel wrapper
int Data;
};

template <int dim>
struct range {
template <typename... T>
range(T... args) {} // fake constructor
private:
// Some fake field added to see using of range arguments in the
// kernel wrapper
int Data;
};

template <int dim>
struct _ImplT {
range<dim> AccessRange;
range<dim> MemRange;
id<dim> Offset;
};

template <typename dataT, int dimensions, access::mode accessmode,
access::target accessTarget = access::target::global_buffer,
access::placeholder isPlaceholder = access::placeholder::false_t>
class accessor {

public:
void use(void) const {}
template <typename... T>
void use(T... args) {}
template <typename... T>
void use(T... args) const {}
_ImplT<dimensions> impl;

private:
void __init(__attribute__((opencl_global)) dataT *Ptr,
range<dimensions> AccessRange,
range<dimensions> MemRange, id<dimensions> Offset) {}
};

} // namespace sycl
} // namespace cl
14 changes: 13 additions & 1 deletion clang/test/CodeGenSYCL/address-space-conversions.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// RUN: %clang_cc1 -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -I%S/Inputs -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
void bar(int &Data) {}
// CHECK-DAG: define{{.*}} spir_func void @[[RAW_REF:[a-zA-Z0-9_]+]](i32 addrspace(4)* align 4 dereferenceable(4) %
void bar2(int &Data) {}
Expand Down Expand Up @@ -136,3 +136,15 @@ void usages() {
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPU3AS3iEvT_(i32 addrspace(3)* %
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPU3AS0iEvT_(i32* %
// CHECK-DAG: define linkonce_odr spir_func void @_Z4tmplIPiEvT_(i32 addrspace(4)* %

#include "sycl.hpp"

int main() {
cl::sycl::queue Q;
Q.submit([&](cl::sycl::handler &cgh) {
cgh.single_task<class test_kernel>([=]() {
usages();
});
});
return 0;
}
22 changes: 19 additions & 3 deletions clang/test/CodeGenSYCL/address-space-deduction.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -I%S/Inputs -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s

// CHECK-LABEL: @_Z4testv(
// Validates SYCL deduction rules compliance.
// See clang/docs/SYCLSupport.rst#address-space-handling for the details.

// CHECK-LABEL: define {{.*}} @_Z4testv(
// CHECK-NEXT: entry:
// CHECK-NEXT: [[I:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[PPTR:%.*]] = alloca i32 addrspace(4)*, align 8
Expand Down Expand Up @@ -87,7 +90,8 @@
// CHECK-NEXT: store i8 addrspace(4)* getelementptr inbounds ([21 x i8], [21 x i8] addrspace(4)* addrspacecast ([21 x i8] addrspace(1)* @.str.1 to [21 x i8] addrspace(4)*), i64 0, i64 0), i8 addrspace(4)* addrspace(4)* [[SELECT_STR_TRIVIAL2_ASCAST]], align 8
// CHECK-NEXT: ret void
//
void test() {
void test() {

static const int foo = 0x42;


Expand Down Expand Up @@ -127,3 +131,15 @@ void test() {
const char *select_str_trivial2 = false ? str : "Another hello world!";
(void)select_str_trivial2;
}

#include "sycl.hpp"

int main() {
cl::sycl::queue Q;
Q.submit([&](cl::sycl::handler &cgh) {
cgh.single_task<class test_kernel>([=]() {
test();
});
});
return 0;
}
32 changes: 22 additions & 10 deletions clang/test/CodeGenSYCL/address-space-mangling.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// RUN: %clang_cc1 -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=SPIR
// RUN: %clang_cc1 -triple x86_64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=X86
// RUN: %clang_cc1 -I%S/Inputs -triple spir64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=SPIR
// RUN: %clang_cc1 -I%S/Inputs -triple x86_64 -fsycl-is-device -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=X86

// REQUIRES: x86-registered-target

Expand All @@ -8,15 +8,15 @@ void foo(__attribute__((opencl_local)) int *);
void foo(__attribute__((opencl_private)) int *);
void foo(int *);

// SPIR: declare spir_func void @_Z3fooPU3AS1i(i32 addrspace(1)*) #1
// SPIR: declare spir_func void @_Z3fooPU3AS3i(i32 addrspace(3)*) #1
// SPIR: declare spir_func void @_Z3fooPU3AS0i(i32*) #1
// SPIR: declare spir_func void @_Z3fooPi(i32 addrspace(4)*) #1
// SPIR: declare spir_func void @_Z3fooPU3AS1i(i32 addrspace(1)*)
// SPIR: declare spir_func void @_Z3fooPU3AS3i(i32 addrspace(3)*)
// SPIR: declare spir_func void @_Z3fooPU3AS0i(i32*)
// SPIR: declare spir_func void @_Z3fooPi(i32 addrspace(4)*)

// X86: declare void @_Z3fooPU8SYglobali(i32*) #1
// X86: declare void @_Z3fooPU7SYlocali(i32*) #1
// X86: declare void @_Z3fooPU9SYprivatei(i32*) #1
// X86: declare void @_Z3fooPi(i32*) #1
// X86: declare void @_Z3fooPU8SYglobali(i32*)
// X86: declare void @_Z3fooPU7SYlocali(i32*)
// X86: declare void @_Z3fooPU9SYprivatei(i32*)
// X86: declare void @_Z3fooPi(i32*)

void test() {
__attribute__((opencl_global)) int *glob;
Expand All @@ -28,3 +28,15 @@ void test() {
foo(priv);
foo(def);
}

#include "sycl.hpp"

int main() {
cl::sycl::queue Q;
Q.submit([&](cl::sycl::handler &cgh) {
cgh.single_task<class test_kernel>([=]() {
test();
});
});
return 0;
}
57 changes: 57 additions & 0 deletions clang/test/CodeGenSYCL/basic-kernel-wrapper.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
// RUN: %clang_cc1 -fsycl-is-device -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s

// This test checks that compiler generates correct kernel wrapper for basic
// case.

#include "Inputs/sycl.hpp"

template <typename name, typename Func>
__attribute__((sycl_kernel)) void kernel(Func kernelFunc) {
kernelFunc();
}

int main() {
cl::sycl::accessor<int, 1, cl::sycl::access::mode::read_write> accessorA;
kernel<class kernel_function>(
[=]() {
accessorA.use();
});
return 0;
}

// CHECK: define {{.*}}spir_kernel void @{{.*}}kernel_function
// CHECK-SAME: i32 addrspace(1)* [[MEM_ARG:%[a-zA-Z0-9_]+]],
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[ACC_RANGE:%[a-zA-Z0-9_]+_1]],
// CHECK-SAME: %"struct.cl::sycl::range"* byval{{.*}}align 4 [[MEM_RANGE:%[a-zA-Z0-9_]+_2]],
// CHECK-SAME: %"struct.cl::sycl::id"* byval{{.*}}align 4 [[OFFSET:%[a-zA-Z0-9_]+]])
// Check alloca for pointer argument
// CHECK: [[MEM_ARG]].addr = alloca i32 addrspace(1)*
// CHECK: [[ANONALLOCA:%[0-9]+]] = alloca %class.anon
// CHECK: [[ARANGEA:%agg.tmp.*]] = alloca %"struct.cl::sycl::range"
// CHECK: [[MRANGEA:%agg.tmp.*]] = alloca %"struct.cl::sycl::range"
// CHECK: [[OIDA:%agg.tmp.*]] = alloca %"struct.cl::sycl::id"
// CHECK: [[ANON:%[0-9]+]] = addrspacecast %class.anon* [[ANONALLOCA]] to %class.anon addrspace(4)*
// CHECK: [[ARANGET:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::range"* [[ARANGEA]] to %"struct.cl::sycl::range" addrspace(4)*
// CHECK: [[MRANGET:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::range"* [[MRANGEA]] to %"struct.cl::sycl::range" addrspace(4)*
// CHECK: [[OIDT:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::id"* [[OIDA]] to %"struct.cl::sycl::id" addrspace(4)*
//
// Check store of kernel pointer argument to alloca
// CHECK: store i32 addrspace(1)* [[MEM_ARG]], i32 addrspace(1)* addrspace(4)* [[MEM_ARG]].addr.ascast, align 8

// Check for default constructor of accessor
// CHECK: call spir_func {{.*}}accessor

// Check accessor GEP
// CHECK: [[ACCESSOR:%[a-zA-Z0-9_]+]] = getelementptr inbounds %class.anon, %class.anon addrspace(4)* [[ANON]], i32 0, i32 0

// Check load from kernel pointer argument alloca
// CHECK: [[MEM_LOAD:%[a-zA-Z0-9_]+]] = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(4)* [[MEM_ARG]].addr.ascast

// Check accessor __init method call
// CHECK: [[ARANGE:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::range" addrspace(4)* [[ARANGET]] to %"struct.cl::sycl::range"*
// CHECK: [[MRANGE:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::range" addrspace(4)* [[MRANGET]] to %"struct.cl::sycl::range"*
// CHECK: [[OID:%agg.tmp.*]] = addrspacecast %"struct.cl::sycl::id" addrspace(4)* [[OIDT]] to %"struct.cl::sycl::id"*
// CHECK: call spir_func void @{{.*}}__init{{.*}}(%"class.cl::sycl::accessor" addrspace(4)* {{[^,]*}} [[ACCESSOR]], i32 addrspace(1)* [[MEM_LOAD]], %"struct.cl::sycl::range"* byval({{.*}}) align 4 [[ARANGE]], %"struct.cl::sycl::range"* byval({{.*}}) align 4 [[MRANGE]], %"struct.cl::sycl::id"* byval({{.*}}) align 4 [[OID]])

// Check lambda "()" operator call
// CHECK: call spir_func void @{{.*}}(%class.anon addrspace(4)* {{[^,]*}})
41 changes: 41 additions & 0 deletions clang/test/CodeGenSYCL/device-functions.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
// RUN: %clang_cc1 -triple spir64 -fsycl-is-device -S -emit-llvm %s -o - | FileCheck %s

template <typename T>
T bar(T arg);

void foo() {
int a = 1 + 1 + bar(1);
}

template <typename T>
T bar(T arg) {
return arg;
}

template <typename name, typename Func>
__attribute__((sycl_kernel)) void kernel_single_task(Func kernelFunc) {
kernelFunc();
}

// Make sure that definitions for the types not used in SYCL kernels are not
// emitted
// CHECK-NOT: %struct.A
// CHECK-NOT: @a = {{.*}} %struct.A
struct A {
int x = 10;
} a;

int main() {
a.x = 8;
kernel_single_task<class test_kernel>([]() { foo(); });
return 0;
}

// baz is not called from the SYCL kernel, so it must not be emitted
// CHECK-NOT: define {{.*}} @{{.*}}baz
void baz() {}

// CHECK-LABEL: define dso_local spir_kernel void @{{.*}}test_kernel
// CHECK-LABEL: define internal spir_func void @_ZZ4mainENKUlvE_clEv(%class.anon addrspace(4)* {{.*}}%this)
// CHECK-LABEL: define dso_local spir_func void @{{.*}}foo
// CHECK-LABEL: define linkonce_odr spir_func i32 @{{.*}}bar
Loading

0 comments on commit 7788782

Please sign in to comment.