Add jobs using clang as CUDA compiler #493
Conversation
@@ -674,7 +674,7 @@ __constexpr_isfinite(_A1 __lcpp_x) noexcept
  return isfinite(__lcpp_x);
}

-#if defined(_MSC_VER) || defined(__CUDACC_RTC__)
+#if defined(_MSC_VER) || defined(__CUDACC_RTC__) || defined(_LIBCUDACXX_COMPILER_CLANG_CUDA)
@miscco I have no idea what I'm doing. Please, take a look.
It seems that clang builtins are not available on device:
__global__ void kernel(double *ptr) {
ptr[0] = __builtin_logb(42.0);
}
int main() {
kernel<<<1, 1>>>(nullptr);
}
results in:
ptxas fatal : Unresolved extern function 'logb'
clang: error: ptxas command failed with exit code 255 (use -v to see invocation)
We do not have the GPU-side standard library these builtins would normally end up calling. Some of the math calls do end up being replaced by their __nv_* counterparts, but there's none for logb, and that's why you see the unresolved reference.
Is this something we could contribute? I guess there is a special list of builtins and how they are delegated?
#ifdef TEST_COMPILER_CLANG_CUDA
#pragma clang diagnostic ignored "-Wunneeded-internal-declaration"
#endif // TEST_COMPILER_CLANG_CUDA
Seems like we should have a _LIBCUDACXX_DISABLE_CLANG_CUDA_DIAGNOSTIC macro?
I believe we need one for gcc / clang diagnostics. I can cook something up, but it should not be required for this PR
@jrhemstad I believe this is ready to be merged whenever we get our hands on a new sccache binary.
clang is so fast that I'm fine merging this even without sccache.
I mean, we could just go with the release as with Windows, but yeah, it won't prolong our CI time compared to Windows.
Interesting. I have not been paying attention to the relative compilation speed of clang vs nvcc for CUDA, but over time clang did get slower. Early on we were a bit faster than NVCC; then, I think around the CUDA-10 time frame, we became somewhat slower. I'd be very curious to see the build time comparison on something non-trivial, like the Thrust tests. I have another favor to ask: as you were making the build work with clang, what were the issues/annoyances you ran into, in addition to the deduction guide one? Positives are welcome, too. :-) With my own world view being heavily clang-tinted, it would be great to hear some feedback from folks who mostly work with nvcc.
else:
    self.cxx.link_flags += ['-lc++']
# Device code does not have binary components, don't link libc++
# elif self.cxx.type != 'nvcc' and self.cxx.type != 'pgi':
Maybe just remove it completely?
I was quite surprised how easy it was to get clang-cuda working with our test suite. My hat is off for that 👏 Looking at the gymnastics I had to do to try and replace
I personally do not have any numbers about compile times, but I spent the last few weeks on getting Windows to pass, so I am heavily tinted there ;)
Description
Adds jobs using clang as the CUDA compiler. Only adding a job for the newest version of the CTK and clang.
The clang-cuda jobs are a bit shoehorned in because they don't fit the existing job structure right now. This will take a bit of massaging to make it nicer, but I'll probably defer that to future work after the cmake presets are done.
-x cuda is generated when using clang as the CUDA compiler. In order to build with clang as the CUDA compiler, you would do:

This is only tested to work on the cuda12.2-llvm16 devcontainer.

closes #344
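As a hedged sketch of standard CMake usage (not necessarily this PR's exact invocation, whose command is not shown here), selecting clang as the CUDA compiler typically looks like:

```shell
# Hypothetical sketch: point CMake's CUDA language support at clang++
# instead of nvcc. Compiler path and architecture are illustrative only.
cmake -S . -B build \
  -DCMAKE_CUDA_COMPILER=clang++-16 \
  -DCMAKE_CUDA_ARCHITECTURES=80
cmake --build build
```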
Checklist