-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL]Enable Dead Function Elimination in sycl-post-link #2723
Conversation
Signed-off-by: gejin <ge.jin@intel.com>
/summary:run |
Signed-off-by: gejin <ge.jin@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any significant impact to compile time due to the additional pass?
Also, please add a driver test.
// Eliminate 'dead' functions which are not called in device LLVM IR module, | ||
// there is one execption: functions with 'reference-indirectly' attribute | ||
// can't be eliminated since they will be called indirectly via function ptr. | ||
static void eliminateDeadFunctions(Module &M) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code duplicated functionality LLVM provides with GlobalDCEPass. I suggest re-using LLVM pass to avoid unnecessary maintenance.
// DeadFunctionElimination must work with IROutputOnly to clean the | ||
// original LLVMIR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side comment: I'm sure I understand the first comment in this scope.
// single file output requested - this means only perform necessary IR
// transformations (like specialization constant intrinsic lowering) and
// output LLVMIR
I think opt
is the right tool to do LLVM IR transformations. Adding "DeadFunctionElimination" functionality to sycl-post-link
tool seems like a bad idea as it duplicates already existing opt
functionality.
I think we should remove "IR transformations" functionality from the sycl-post-link
tool and use dedicated opt
tool instead. I don't have specific instructions how to re-write the code, but we should work in this direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @bader
I agree that moving the "DeadFunctionElimination" functionality from sycl-post-link to llvm opt is a better idea. This requires us to enable opt in compiler package, opt tool is not generated in SYCL compiler package currently.
However, existing llvm "GlobalDCE" pass can't meet our requirements although it is much more complicated. The reason is existing llvm "GlobalDCE" has different pre-assumption from SYCL device code.
I dumped device code (LLVM IR .bc format) with all device libraries being linked in and used following command to try to eliminate "dead" code:
opt -globaldce redudant.bc -o reduced.bc
However, the redundant functions from device libraries were not eliminated at all. The reason is all "external" linkage functions who has a function body in current module will be marked as "Alive" and won't be eliminated: https://github.com/intel/llvm/blob/sycl/llvm/lib/Transforms/IPO/GlobalDCE.cpp#L321
The functions from device libraries are "external" and have function body, so they will survive in "GlobalDCE".
The pre-assumption "GlobalDCE" is based on seems to make sense as the "GlobalDCE" pass doesn't know whether current module is completely linked or not, the exported functions may be used if current module linked with other modules later. In our scenario, "DeadFunctionElimination" will work after llvm-link finishes linking everything including device libraries and its input is a completely linked device image(In spv online link mode, some "_devicelib* references will be handled by sycl runtime") which means all SYCL functions who has a function body will not be used by other modules later. And the logic of "DeadFunctionsElimination" is much simpler than existing "GlobalDCE", we only need to go through all spirv functions and if the function doesn't have any "user" in current module, we can remove it. The only exception are functions with "referenced-indirectly" attribute, we need to keep them to support device function pointers.
Existing llvm "GlobalDCE" also includes something which are not necessary in SYCL device scenario such as handling virtual functions, scanning vtable load, handling ifuncs...
So, in order to enable "DeadFunctionElimination" in sycl toolchain, we may have following ways:
- Implementing "DeadFunctionElimination" in sycl-post.cpp which seems not to be a good idea
- Splitting the "DeadFunctionElimination" into a new pass such as "SYCLDeadFunctionEliminationPass" and run the pass in sycl-post-link tool which is the same way as "SpecConstantsPass"
- Modifying the existing llvm "GlobalDCE" pass and make it to meet our requirement for SYCL. It seems that we may need to modify much community code and introduce some sycl specific code.
- We add a "SYCLGlobalDCE" and don't touch existing community's "GlobalDCE", the "SYCLGlobalDCE" is much simpler. This is similar to "sycl dead argument elimination".
In 3 and 4, we need to enable opt after completely linked device image is generate by llvm-link.
Which one do you prefer or do you have any other good ideas?
Thanks very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to just have a small pass cleaning up the code first before using plain GlobalDCE?
Similar to for example https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLKernelFilter.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @keryell
It seems to be a different scenario. In compilation phase, we can run "GlobalDCE" to each module no matter it is "completely linked" or not.
"DeadFunctionElimination" introduced by this patch aims to eliminate SPIRV functions which are not called in device code(mainly for cleaning up unused functions introduced by device libraries), so its input must be a "completely linked" device image. We can only get such "completely linked" device image when llvm-link finishes linking everything.
But I think moving the functionality to a small SYCL pass should be the right direction. At least, we can make sycl-post-link.cpp more clean by doing so.
Thanks very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think opt is the right tool to do LLVM IR transformations. Adding "DeadFunctionElimination" functionality to sycl-post-link tool seems like a bad idea as it duplicates already existing opt functionality.
I think we should remove "IR transformations" functionality from the sycl-post-link tool and use dedicated opt tool instead. I don't have specific instructions how to re-write the code, but we should work in this direction.
I agree with @bader here. Originally the post-link tool's purpose was to handle device code splitting. Then spec constants handling was added as it depended on the splitting. Now it turns more and more into an LTO driver. This is not the purpose of the tool.
I'm not sure if we can or it makes sense to reuse the LLVM LTO infra for our device code - maybe @andykaylor has a perspective. To me using opt
-based Driver job to run necessary LTO passes like the GlobalDCE before handing off the optimized IR to SYCL post-link is the right way to go at this point. Later we can decide to reuse LTO infra and adjust accordingly. Linking with default device libraries should also be factored out of the post-link into a driver job, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @kbobrovs
Thank you for suggestion.
Currently, we have enabled functionality to analyze device code to get required device libraries(spv file) in sycl-post-link tool too which is not the purpose of sycl-post-link tool either.
I am not sure whether we can enable llvm "opt" tools in device compilation workflow, if "opt" can be enabled, we can put "dead function elimination" or "device code device library requirement analysis" into llvm passes and run them via opt offline. This will require some driver work, we need to add "opt" in driver workflow but will make sycl-post-link clean.
Close this PR now since we have a better way: #2783. |
…led (#2723) This change allows to preserve the correct builtin mangling in reverse translation. Original commit: KhronosGroup/SPIRV-LLVM-Translator@33150bf7b08a0d4
Signed-off-by: gejin ge.jin@intel.com
We have decided to split #2277 into 2 parts:
Currently, part1 has been finished, we have linked all NON-fp64 device libraries with user's device code in offline compilation phase. However, this will lead to code redundancy in final device image since we will pull everything from device libraries into device image regardless of whether those device libraries functions are really used or not.
We expect the related perf impact will be obvious to some application with "small" kernel. To address this problem, we added "DeadFunctionElimination" in offline compilation, the procedure is:
// normal device compilation
........
// llvm-link link user's device image and sycl device libraries by default.
llvm-link user-device-image.bc libsycl-cmath.bc libsycl-complex.bc libsycl-crt.bc... -o final-device-image.bc
// Use sycl-post-link to eliminate "dead" functions in final-device-image.bc
sycl-post-link final-device-image.bc --dead-function-eliminate .... -o reduced-device-image.o
//other build steps including sycl-post-link(code-split, spec-const...), llvm-spirv translation....