Add jobs using clang as CUDA compiler #493
Conversation
@@ -674,7 +674,7 @@ __constexpr_isfinite(_A1 __lcpp_x) noexcept
  return isfinite(__lcpp_x);
}

-#if defined(_MSC_VER) || defined(__CUDACC_RTC__)
+#if defined(_MSC_VER) || defined(__CUDACC_RTC__) || defined(_LIBCUDACXX_COMPILER_CLANG_CUDA)
@miscco I have no idea what I'm doing. Please, take a look.
It seems that clang builtins are not available on device:
__global__ void kernel(double *ptr) {
ptr[0] = __builtin_logb(42.0);
}
int main() {
kernel<<<1, 1>>>(nullptr);
}
results in:
ptxas fatal : Unresolved extern function 'logb'
clang: error: ptxas command failed with exit code 255 (use -v to see invocation)
We do not have the GPU-side standard library these builtins would normally end up calling. Some of the math calls do end up being replaced by their __nv_* counterparts, but there's none for logb, and that's why you see the unresolved reference.
Is this something we could contribute? I guess there is a special list of builtins and how they are delegated?
#ifdef TEST_COMPILER_CLANG_CUDA
#pragma clang diagnostic ignored "-Wunneeded-internal-declaration"
#endif // TEST_COMPILER_CLANG_CUDA
Seems like we should have a _LIBCUDACXX_DISABLE_CLANG_CUDA_DIAGNOSTIC macro?
I believe we need one for gcc / clang diagnostics. I can cook something up, but it should not be required for this PR
@jrhemstad I believe this is ready to be merged whenever we get our hands on a new sccache binary.
clang is so fast that I'm fine merging this even without sccache.
I mean, we could just go with the release as with Windows, but yeah, it won't prolong our CI time compared to Windows.
Interesting. I have not been paying attention to the relative compilation speed of clang vs nvcc for CUDA, but over time clang did get slower. Early on we were a bit faster than NVCC; then, I think around the CUDA-10 time frame, we became somewhat slower. I'd be very curious to see the build time comparison on something non-trivial, like the Thrust tests. I have another favor to ask: as you were making the build work with clang, what were the issues/annoyances you ran into, in addition to the deduction guide one? Positives are welcome, too. :-) With my own world view being heavily clang-tinted, it would be great to hear some feedback from folks who mostly work with nvcc.
else:
    self.cxx.link_flags += ['-lc++']
# Device code does not have binary components, don't link libc++
# elif self.cxx.type != 'nvcc' and self.cxx.type != 'pgi':
Maybe just remove it completely?
I was quite surprised how easy it was to get clang-cuda working with our test suite. My hat is off for that 👏 Looking at the gymnastics I had to do to try and replace
I personally do not have any numbers about compile times, but I spent the last few weeks on getting Windows to pass, so I am heavily tinted there ;)
Description
Adds jobs using clang as the CUDA compiler. Only adding a job for the newest version of the CTK and clang.
The clang-cuda jobs are a bit shoehorned in because they don't fit the existing job structure right now. This will take a bit of massaging to make it nicer, but I'll probably defer that to future work after the cmake presets are done.
-x cuda is generated when using clang as the CUDA compiler. In order to build with clang as the CUDA compiler, you would do:

This is only tested to work on the cuda12.2-llvm16 devcontainer.

closes #344
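As a hedged sketch of standard CMake usage (not necessarily this PR's exact invocation, whose command is not shown here), selecting clang as the CUDA compiler typically looks like:

```shell
# Hypothetical sketch: point CMake's CUDA language support at clang++
# instead of nvcc. Compiler path and architecture are illustrative only.
cmake -S . -B build \
  -DCMAKE_CUDA_COMPILER=clang++-16 \
  -DCMAKE_CUDA_ARCHITECTURES=80
cmake --build build
```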
Checklist