Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsupported calling convention: AMDGPU_KERNEL encountered when codegen some amdgpu (opencl) bitcode #60313

Open
littlewu2508 opened this issue Jan 26, 2023 · 8 comments
Assignees

Comments

@littlewu2508
Copy link

I'm not sure whether this is a bug or a missing feature, but please look at https://github.com/littlewu2508/LLVMAMDGPUcodegenbug, which is the reproducing method.

The issue is, during codegen phase, some simple opencl program meets "Unsupported calling convention: AMDGPU_KERNEL", while using the ROCm-forked llvm, the issue does not exists. Using gdb, I found that with vanilla llvm, one of the instructions I in

has OpCode=56 and SubclassData=365, which leads to 91 (CallingConv::AMDGPU_KERNEL) in followed steps (LowerCallTo, LowerCall, CCAssignFnForCall, etc), and finally get Unsupported calling convention issue. Gdb trace:
llvm-14.0.5-dbg-cmdline.log.gz

For ROCm forked llvm, there's no issue. My guess is that source1.cl is calling amdgpu kernel functions, but somehow it no I with 91 (CallingConv::AMDGPU_KERNEL) appearing, only 0 (CallingConv::C) is seen, so no unsupported calling convention issues (gdb trace:
ROCm-llvm-5.1.3-dbg-cmdline.log.gz). But I failed to find out which change they use to get things worked, due to lack of compiler knowledge, and inability to bisect ROCm forked llvm (encountered many build issues when bisecting).

Forgive me of using llvm-14 because I built the debug version of llvm/clang to start investigating th issue half a year earlier (original issue: https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45). I can confirm that this situation still remains now for both llvm-15.0.7 and llvm-16

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 26, 2023

@llvm/issue-subscribers-backend-amdgpu

@littlewu2508
Copy link
Author

BTW, replacing triple from amdgpu into x86-64, there's no issue of-course

--- a/run.sh
+++ b/run.sh
@@ -1,7 +1,7 @@
 #!/bin/bash -x

-AMDGPU_ARCH=gfx803
-TRIPLE=amdgcn-amd-amdhsa
+AMDGPU_ARCH=x86-64
+TRIPLE=x86_64-pc-linux-gnu

and execute ./run.sh

@arsenm
Copy link
Contributor

arsenm commented Jan 26, 2023

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

@littlewu2508
Copy link
Author

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Thanks for the explanation!

I'd like to know the hack refer to which commit(s)? Since I'm packaging ROCm stack, if users met such problems they can simply apply that patch upon common llvm releases as a temporary workaround.

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

Understood. So, is there any plan for ROCm stack to remove such kind of unsupported call in GPU codes? Because when distribution (Debian, Gentoo, Fedora) packages https://github.com/RadeonOpenCompute/RoCm-CompilerSupport/, there's always 4 failing tests due to this issue.

@arsenm
Copy link
Contributor

arsenm commented Jan 27, 2023

This is because the ugly hack we use to postprocess the IR to handle calls to kernels by splitting into a kernel and a stub function is not upstream. You can avoid this by having a separate kernel function and implementation in a callable function.

Thanks for the explanation!

I'd like to know the hack refer to which commit(s)? Since I'm packaging ROCm stack, if users met such problems they can simply apply that patch upon common llvm releases as a temporary workaround.

It's this pass that's been copied around for years https://github.com/RadeonOpenCompute/llvm-project/blob/amd-stg-open/llvm/lib/Target/AMDGPU/AMDGPULowerKernelCalls.cpp

Really clang needs to introduce 2 separate functions and come up with a different name mangling scheme for the two. @cdevadas was looking at this at some point. I still think this was a terrible feature the spec should have disallowed

Understood. So, is there any plan for ROCm stack to remove such kind of unsupported call in GPU codes? Because when distribution (Debian, Gentoo, Fedora) packages https://github.com/RadeonOpenCompute/RoCm-CompilerSupport/, there's always 4 failing tests due to this issue.

The hope was it would be deleted and upstream would do something better in clang, which https://reviews.llvm.org/D120566 started towards

@littlewu2508
Copy link
Author

I'm not sure whether this is related to #62066, but seems that AMDGPU openmp is also bothered by unsupported libcall legalization in llvm::SITargetLowering::LowerCall

@arsenm
Copy link
Contributor

arsenm commented Aug 9, 2023

I'm not sure whether this is related to #62066, but seems that AMDGPU openmp is also bothered by unsupported libcall legalization in llvm::SITargetLowering::LowerCall

This would be unrelated

@cdevadas
Copy link
Collaborator

cdevadas commented Sep 5, 2024

lalaniket8 is working on bringing this feature in the upstream compiler at clang codegen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants