[CIR][CUDA] Generate kernel calls #1348
Merged
Conversation
bcardosolopes approved these changes on Feb 14, 2025
Awesome
bcardosolopes pushed a commit that referenced this pull request on Mar 11, 2025
This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined into a single PR.

**Bug 1.** Suppose we write

```cpp
__global__ void kernel(int a, int b);
```

When we call this kernel through `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space for it with `alloca ptr, i32 2`, but that doesn't seem feasible in CIR, so we allocate `alloca [2 x ptr], i32 1` instead. This means there must be an extra GEP compared to OG: in CIR, we must add an `array_to_ptrdecay` cast before accessing the array elements. I missed that in #1332.

**Bug 2.** We missed a load instruction for the 6th argument to `cudaLaunchKernel`. It's added back in this PR.

**Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, the call succeeded and we should proceed to call the device stub. In #1348 we did exactly the opposite, calling the device stub only if it's not zero. That's fixed here.

**Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` call at the same place as OG does -- the function is already implemented, so we can just call it.

After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.
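For illustration only, here is a minimal hand-written CUDA/C++ sketch of what the corrected host-side stub amounts to, using only the public `cudaLaunchKernel` API (compiled as a `.cu` file). The function name and the way the launch configuration reaches it are simplifications: the real stub obtains grid/block/sharedMem/stream from `__cudaPopCallConfiguration` and is emitted as CIR rather than written by hand.

```cpp
#include <cuda_runtime.h>

__global__ void kernel(int a, int b);

// Hypothetical hand-written analogue of the host-side device stub.
// In the real stub, grid/block/shared_mem/stream come from
// __cudaPopCallConfiguration, and the launch only happens when that
// call returns 0 (Bug 3).
cudaError_t launch_kernel(int a, int b, dim3 grid, dim3 block,
                          size_t shared_mem, cudaStream_t stream) {
  // Bug 1: the 4th argument to cudaLaunchKernel is an array of pointers to
  // the kernel arguments. CIR models it as a single `[2 x ptr]` alloca, so an
  // array-to-pointer decay (an extra GEP compared to OG) is needed before
  // indexing into it.
  void *kernel_args[2] = {&a, &b};

  // Bug 2: `stream` (the 6th argument) has to be loaded and forwarded.
  return cudaLaunchKernel((const void *)kernel, grid, block, kernel_args,
                          shared_mem, stream);
}
```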
lanza pushed a commit that referenced this pull request on Mar 18, 2025
Now we can generate calls to `__global__` functions. Most of the work is already done in the AST: it rewrites `fn<<<2, 2>>>()` into something like `__cudaPushCallConfiguration(dim3(2, 1, 1), dim3(2, 1, 1), 0, nullptr)`, which returns a bool. We call the device stub as a normal function when that call returns true.
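A small, hypothetical CUDA source sketch of that transformation is below. The expansion is shown as pseudocode in a comment because `__cudaPushCallConfiguration` is an internal runtime entry point, not public API; note that the follow-up commit quoted above corrects the polarity of the check, so the stub should be called only when the configuration call returns 0.

```cpp
#include <cuda_runtime.h>

__global__ void fn() {}

void launch() {
  // What the programmer writes:
  fn<<<2, 2>>>();

  // What the AST has conceptually rewritten it into (pseudocode; the device
  // stub is called like an ordinary host function, and per the follow-up fix
  // above the call should happen only when the configuration call returns 0):
  //
  //   if (__cudaPushCallConfiguration(dim3(2, 1, 1), dim3(2, 1, 1),
  //                                   /*sharedMem=*/0, /*stream=*/nullptr) == 0)
  //     fn();
}
```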
lanza pushed a commit that referenced this pull request on Mar 18, 2025
terapines-osc-cir pushed a commit to Terapines/clangir that referenced this pull request on Sep 2, 2025