Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL]Enable Dead Function Elimination in sycl-post-link #2723

Closed
wants to merge 2 commits into from

Conversation

jinge90
Copy link
Contributor

@jinge90 jinge90 commented Nov 3, 2020

Signed-off-by: gejin ge.jin@intel.com
We have decided to split #2277 into 2 parts:

  1. device libraries link by default
  2. optimization in sycl-post-link to eliminate "dead"(unused) functions in device image which are introduced by device library default link feature.

Currently, part1 has been finished, we have linked all NON-fp64 device libraries with user's device code in offline compilation phase. However, this will lead to code redundancy in final device image since we will pull everything from device libraries into device image regardless of whether those device libraries functions are really used or not.
We expect the related perf impact will be obvious to some application with "small" kernel. To address this problem, we added "DeadFunctionElimination" in offline compilation, the procedure is:
// normal device compilation
........
// llvm-link link user's device image and sycl device libraries by default.
llvm-link user-device-image.bc libsycl-cmath.bc libsycl-complex.bc libsycl-crt.bc... -o final-device-image.bc
// Use sycl-post-link to eliminate "dead" functions in final-device-image.bc
sycl-post-link final-device-image.bc --dead-function-eliminate .... -o reduced-device-image.o
//other build steps including sycl-post-link(code-split, spec-const...), llvm-spirv translation....

Signed-off-by: gejin <ge.jin@intel.com>
@jinge90
Copy link
Contributor Author

jinge90 commented Nov 3, 2020

/summary:run

@jinge90 jinge90 requested a review from asavonic November 3, 2020 16:00
@jinge90 jinge90 changed the title Enable Dead Function Elimination in sycl-post-link [SYCL]Enable Dead Function Elimination in sycl-post-link Nov 4, 2020
Signed-off-by: gejin <ge.jin@intel.com>
Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any significant impact to compile time due to the additional pass?

Also, please add a driver test.

// Eliminate 'dead' functions which are not called in device LLVM IR module,
// there is one execption: functions with 'reference-indirectly' attribute
// can't be eliminated since they will be called indirectly via function ptr.
static void eliminateDeadFunctions(Module &M) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code duplicated functionality LLVM provides with GlobalDCEPass. I suggest re-using LLVM pass to avoid unnecessary maintenance.

Comment on lines +8051 to +8052
// DeadFunctionElimination must work with IROutputOnly to clean the
// original LLVMIR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment: I'm sure I understand the first comment in this scope.

    // single file output requested - this means only perform necessary IR
    // transformations (like specialization constant intrinsic lowering) and
    // output LLVMIR

I think opt is the right tool to do LLVM IR transformations. Adding "DeadFunctionElimination" functionality to sycl-post-link tool seems like a bad idea as it duplicates already existing opt functionality.
I think we should remove "IR transformations" functionality from the sycl-post-link tool and use dedicated opt tool instead. I don't have specific instructions how to re-write the code, but we should work in this direction.

Copy link
Contributor Author

@jinge90 jinge90 Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @bader
I agree that moving the "DeadFunctionElimination" functionality from sycl-post-link to llvm opt is a better idea. This requires us to enable opt in compiler package, opt tool is not generated in SYCL compiler package currently.
However, existing llvm "GlobalDCE" pass can't meet our requirements although it is much more complicated. The reason is existing llvm "GlobalDCE" has different pre-assumption from SYCL device code.
I dumped device code (LLVM IR .bc format) with all device libraries being linked in and used following command to try to eliminate "dead" code:
opt -globaldce redudant.bc -o reduced.bc
However, the redundant functions from device libraries were not eliminated at all. The reason is all "external" linkage functions who has a function body in current module will be marked as "Alive" and won't be eliminated: https://github.com/intel/llvm/blob/sycl/llvm/lib/Transforms/IPO/GlobalDCE.cpp#L321
The functions from device libraries are "external" and have function body, so they will survive in "GlobalDCE".
The pre-assumption "GlobalDCE" is based on seems to make sense as the "GlobalDCE" pass doesn't know whether current module is completely linked or not, the exported functions may be used if current module linked with other modules later. In our scenario, "DeadFunctionElimination" will work after llvm-link finishes linking everything including device libraries and its input is a completely linked device image(In spv online link mode, some "_devicelib* references will be handled by sycl runtime") which means all SYCL functions who has a function body will not be used by other modules later. And the logic of "DeadFunctionsElimination" is much simpler than existing "GlobalDCE", we only need to go through all spirv functions and if the function doesn't have any "user" in current module, we can remove it. The only exception are functions with "referenced-indirectly" attribute, we need to keep them to support device function pointers.
Existing llvm "GlobalDCE" also includes something which are not necessary in SYCL device scenario such as handling virtual functions, scanning vtable load, handling ifuncs...
So, in order to enable "DeadFunctionElimination" in sycl toolchain, we may have following ways:

  1. Implementing "DeadFunctionElimination" in sycl-post.cpp which seems not to be a good idea
  2. Splitting the "DeadFunctionElimination" into a new pass such as "SYCLDeadFunctionEliminationPass" and run the pass in sycl-post-link tool which is the same way as "SpecConstantsPass"
  3. Modifying the existing llvm "GlobalDCE" pass and make it to meet our requirement for SYCL. It seems that we may need to modify much community code and introduce some sycl specific code.
  4. We add a "SYCLGlobalDCE" and don't touch existing community's "GlobalDCE", the "SYCLGlobalDCE" is much simpler. This is similar to "sycl dead argument elimination".
    In 3 and 4, we need to enable opt after completely linked device image is generate by llvm-link.

Which one do you prefer or do you have any other good ideas?
Thanks very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to just have a small pass cleaning up the code first before using plain GlobalDCE?
Similar to for example https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLKernelFilter.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @keryell
It seems to be a different scenario. In compilation phase, we can run "GlobalDCE" to each module no matter it is "completely linked" or not.
"DeadFunctionElimination" introduced by this patch aims to eliminate SPIRV functions which are not called in device code(mainly for cleaning up unused functions introduced by device libraries), so its input must be a "completely linked" device image. We can only get such "completely linked" device image when llvm-link finishes linking everything.

But I think moving the functionality to a small SYCL pass should be the right direction. At least, we can make sycl-post-link.cpp more clean by doing so.
Thanks very much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think opt is the right tool to do LLVM IR transformations. Adding "DeadFunctionElimination" functionality to sycl-post-link tool seems like a bad idea as it duplicates already existing opt functionality.
I think we should remove "IR transformations" functionality from the sycl-post-link tool and use dedicated opt tool instead. I don't have specific instructions how to re-write the code, but we should work in this direction.

I agree with @bader here. Originally the post-link tool's purpose was to handle device code splitting. Then spec constants handling was added as it depended on the splitting. Now it turns more and more into an LTO driver. This is not the purpose of the tool.

I'm not sure if we can or it makes sense to reuse the LLVM LTO infra for our device code - maybe @andykaylor has a perspective. To me using opt-based Driver job to run necessary LTO passes like the GlobalDCE before handing off the optimized IR to SYCL post-link is the right way to go at this point. Later we can decide to reuse LTO infra and adjust accordingly. Linking with default device libraries should also be factored out of the post-link into a driver job, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @kbobrovs
Thank you for suggestion.
Currently, we have enabled functionality to analyze device code to get required device libraries(spv file) in sycl-post-link tool too which is not the purpose of sycl-post-link tool either.
I am not sure whether we can enable llvm "opt" tools in device compilation workflow, if "opt" can be enabled, we can put "dead function elimination" or "device code device library requirement analysis" into llvm passes and run them via opt offline. This will require some driver work, we need to add "opt" in driver workflow but will make sycl-post-link clean.

@jinge90
Copy link
Contributor Author

jinge90 commented Nov 18, 2020

Close this PR now since we have a better way: #2783.

@jinge90 jinge90 closed this Nov 18, 2020
jsji pushed a commit that referenced this pull request Oct 10, 2024
…led (#2723)

This change allows to preserve the correct builtin mangling in reverse translation.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@33150bf7b08a0d4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants