Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] thinLTO for SYCL #15083

Open
wants to merge 10 commits into
base: sycl
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11231,8 +11231,12 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,

bool IsUsingLTO = D.isUsingLTO(/*IsDeviceOffloadAction=*/true);
auto LTOMode = D.getLTOMode(/*IsDeviceOffloadAction=*/true);
if (IsUsingLTO && LTOMode == LTOK_Thin)
if (IsUsingLTO && LTOMode == LTOK_Thin) {
CmdArgs.push_back(Args.MakeArgString("-sycl-thin-lto"));
// TODO: Pass the same value for this argument once we start using it
// for non-thinLTO.
CmdArgs.push_back(Args.MakeArgString("-sycl-module-split-mode=auto"));
sarnex marked this conversation as resolved.
Show resolved Hide resolved
}

if (Args.hasArg(options::OPT_fsycl_embed_ir))
CmdArgs.push_back(Args.MakeArgString("-sycl-embed-ir"));
Expand Down
1 change: 1 addition & 0 deletions clang/tools/clang-linker-wrapper/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
set(LLVM_LINK_COMPONENTS
${LLVM_TARGETS_TO_BUILD}
BitReader
BitWriter
Core
BinaryFormat
Expand Down
510 changes: 402 additions & 108 deletions clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions libdevice/device.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,9 @@

#if defined(__SPIR__) || defined(__SPIRV__) || defined(__NVPTX__)
#ifdef __SYCL_DEVICE_ONLY__
#define DEVICE_EXTERNAL SYCL_EXTERNAL __attribute__((weak))
#define DEVICE_EXTERNAL SYCL_EXTERNAL
#else // __SYCL_DEVICE_ONLY__
#define DEVICE_EXTERNAL __attribute__((weak))
#define DEVICE_EXTERNAL
#endif // __SYCL_DEVICE_ONLY__
Comment on lines +20 to 23
Copy link
Contributor Author

@sarnex sarnex Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required to get libdevice functions linked in by the thinLTO function importing infrastructure, see here. I'm looking for a better solution for this, I just kept this here in case anybody plans on trying the prototype.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose importing devicelib symbols at compile step can be a solution (see #15114).

On the other hand, I recall discussing the possibility of linking device libraries with upstream maintainers, who expressed a preference for shifting device library linking from the "compile" to the "link" step. It would be ideal if we could discover a solution that aligns with the long-term strategy of upstream and enables us to utilize the thinLTO framework for offload code linking, thereby avoiding the use of weak symbols.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attention to @mdtoguchi who has been looking at importing devicelib at compile step from the SYCL perspective.
Point to note: During one of the LLVM community presentation, it was mentioned that they are trying to move importing devicelib to link time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we already perform device library linking at link time we can consider abandoning the efforts to pull them into the compilation step. My main concern with performing at the link step is the communication required from the driver to the clang-linker-wrapper informing which device libraries should be linked. The less tie-in we have between the driver and the clang-linker-wrapper at link time, the better. IMO, at the very least the linker wrapper should know a minimum default device libraries to link and any communication from the driver is manipulating that list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mdtoguchi. Unless user wants to change the names/location or disable linking of device libraries, driver should not have any logic to handle device code linking other than invoking clang-linker-wrapper. It makes sense to have driver options for additional configuration of device libraries, but driver's implementation should be just passing corresponding values to clang-linker-wrapper where these options should be processed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is very interesting.
While working on #15114 I've been wondering whether there is a particular reason why we link against CUDA libdevice and libclc in the compile step, but also again in the link step.
Could I get some clarification on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is very interesting. While working on #15114 I've been wondering whether there is a particular reason why we link against CUDA libdevice and libclc in the compile step, but also again in the link step. Could I get some clarification on that?

@Naghasan, @npmiller, are you able to help here?


#define DEVICE_EXTERN_C DEVICE_EXTERNAL EXTERN_C
Expand Down
11 changes: 11 additions & 0 deletions llvm/include/llvm/Object/OffloadBinary.h
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,14 @@ class OffloadBinary : public Binary {

StringRef getString(StringRef Key) const { return StringData.lookup(Key); }

/// XXX: Hack
const SmallVectorImpl<std::string> &getTmpStrings() const {
return TmpStringData;
}

/// XXX: Hack
void addTmpString(std::string Value) { TmpStringData.push_back(Value); }

static bool classof(const Binary *V) { return V->isOffloadFile(); }

struct Header {
Expand Down Expand Up @@ -151,6 +159,9 @@ class OffloadBinary : public Binary {
const Header *TheHeader;
/// Location of the metadata entries within the binary.
const Entry *TheEntry;

/// XXX: Hack
SmallVector<std::string, 8> TmpStringData;
};

/// A class to contain the binary information for a single OffloadBinary that
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/SYCLLowerIR/ModuleSplitter.h
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,7 @@ splitSYCLModule(std::unique_ptr<Module> M, ModuleSplitterSettings Settings);

bool isESIMDFunction(const Function &F);
bool canBeImportedFunction(const Function &F);
bool isEntryPoint(const Function &F, bool EmitOnlyKernelsAsEntryPoints);

} // namespace module_split

Expand Down
4 changes: 2 additions & 2 deletions llvm/lib/LTO/LTO.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1077,8 +1077,8 @@ Error LTO::addThinLTO(BitcodeModule BM, ArrayRef<InputFile::Symbol> Syms,
for (const std::string &Name : Conf.ThinLTOModulesToCompile) {
if (BM.getModuleIdentifier().contains(Name)) {
ThinLTO.ModulesToCompile->insert({BM.getModuleIdentifier(), BM});
llvm::errs() << "[ThinLTO] Selecting " << BM.getModuleIdentifier()
<< " to compile\n";
LLVM_DEBUG(dbgs() << "[ThinLTO] Selecting " << BM.getModuleIdentifier()
sarnex marked this conversation as resolved.
Show resolved Hide resolved
<< " to compile\n");
}
}
}
Expand Down
52 changes: 26 additions & 26 deletions llvm/lib/SYCLLowerIR/ModuleSplitter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,32 +117,6 @@ bool isKernel(const Function &F) {
F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
}

bool isEntryPoint(const Function &F, bool EmitOnlyKernelsAsEntryPoints) {
// Skip declarations, if any: they should not be included into a vector of
// entry points groups or otherwise we will end up with incorrectly generated
// list of symbols.
if (F.isDeclaration())
return false;

// Kernels are always considered to be entry points
if (isKernel(F))
return true;

if (!EmitOnlyKernelsAsEntryPoints) {
// If not disabled, SYCL_EXTERNAL functions with sycl-module-id attribute
// are also considered as entry points (except __spirv_* and __sycl_*
// functions)
return llvm::sycl::utils::isSYCLExternalFunction(&F) &&
!isSpirvSyclBuiltin(F.getName()) && !isESIMDBuiltin(F.getName()) &&
!isGenericBuiltin(F.getName());
}

// Even if we are emitting only kernels as entry points, virtual functions
// should still be treated as entry points, because they are going to be
// outlined into separate device images and linked in later.
return F.hasFnAttribute("indirectly-callable");
}

// Represents "dependency" or "use" graph of global objects (functions and
// global variables) in a module. It is used during device code split to
// understand which global variables and functions (other than entry points)
Expand Down Expand Up @@ -445,6 +419,32 @@ class ModuleSplitter : public ModuleSplitterBase {
namespace llvm {
namespace module_split {

bool isEntryPoint(const Function &F, bool EmitOnlyKernelsAsEntryPoints) {
// Skip declarations, if any: they should not be included into a vector of
// entry points groups or otherwise we will end up with incorrectly generated
// list of symbols.
if (F.isDeclaration())
return false;

// Kernels are always considered to be entry points
if (isKernel(F))
return true;

if (!EmitOnlyKernelsAsEntryPoints) {
// If not disabled, SYCL_EXTERNAL functions with sycl-module-id attribute
// are also considered as entry points (except __spirv_* and __sycl_*
// functions)
return llvm::sycl::utils::isSYCLExternalFunction(&F) &&
!isSpirvSyclBuiltin(F.getName()) && !isESIMDBuiltin(F.getName()) &&
!isGenericBuiltin(F.getName());
}

// Even if we are emitting only kernels as entry points, virtual functions
// should still be treated as entry points, because they are going to be
// outlined into separate device images and linked in later.
return F.hasFnAttribute("indirectly-callable");
}

std::optional<IRSplitMode> convertStringToSplitMode(StringRef S) {
static const StringMap<IRSplitMode> Values = {{"kernel", SPLIT_PER_KERNEL},
{"source", SPLIT_PER_TU},
Expand Down
Loading