Fix cub::DeviceTransform unwrapping cudax::async_buffer::iterator #4083

Open
wants to merge 8 commits into
base: main

Contributor

@bernhardmgruber bernhardmgruber commented Mar 11, 2025

This PR fixes a performance issue when cub::DeviceTransform is used with cudax::async_buffer, by fixing thrust::is_contiguous_iterator.
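
To illustrate why a contiguity trait can affect performance, here is a minimal standalone sketch in plain C++17. It does not use CUB, Thrust, or cudax; every name in it (`wrapped_iterator`, `strided_iterator`, `is_contiguous`, `try_unwrap`, `select_path`) is hypothetical and only mimics the general idea that an algorithm can take a pointer-based fast path when the trait recognizes an iterator as contiguous. How the real libraries implement this is not shown in this PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a buffer iterator: a thin wrapper around a raw pointer.
template <class T>
struct wrapped_iterator {
  T* ptr;
  T& operator*() const { return *ptr; }
};

// Hypothetical non-contiguous iterator: visits every stride-th element.
template <class T>
struct strided_iterator {
  T* ptr;
  std::ptrdiff_t stride;
  T& operator*() const { return *ptr; }
};

// Hypothetical contiguity trait. Iterators are assumed non-contiguous
// unless a specialization says otherwise.
template <class It>
struct is_contiguous : std::false_type {};
template <class T>
struct is_contiguous<T*> : std::true_type {};
// Without this specialization, wrapped_iterator would silently take the slow path.
template <class T>
struct is_contiguous<wrapped_iterator<T>> : std::true_type {};

// Returns a raw pointer for contiguous iterators, otherwise the iterator unchanged.
// The iterator must be dereferenceable when it is contiguous but not a pointer.
template <class It>
auto try_unwrap(It it) {
  if constexpr (std::is_pointer_v<It>) {
    return it;
  } else if constexpr (is_contiguous<It>::value) {
    return std::addressof(*it);
  } else {
    return it;
  }
}

enum class path { pointer_fast, generic };

// Picks the code path an algorithm would use for the given input iterator.
template <class It>
constexpr path select_path(It) {
  using unwrapped = decltype(try_unwrap(std::declval<It>()));
  return std::is_pointer_v<unwrapped> ? path::pointer_fast : path::generic;
}
```

In this sketch, `select_path(wrapped_iterator<int>{data})` yields `path::pointer_fast` only because of the trait specialization; removing it would make the same call fall back to `path::generic` without any compile error, which is the kind of silent slowdown a trait fix addresses.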

Adding tests requires the CUB and Thrust unit tests to have access to cudax. Please advise me on how to correctly link against cudax.

@bernhardmgruber bernhardmgruber requested review from a team as code owners March 11, 2025 11:56
@bernhardmgruber bernhardmgruber force-pushed the transform_async_buffer branch from 5b400f5 to f0ea8dd Compare March 11, 2025 12:01
Contributor

🟨 CI finished in 1h 16m: Pass: 26%/93 | Total: 22h 34m | Avg: 14m 33s | Max: 1h 14m | Hits: 52%/39457
  • 🟥 cub: Pass: 0%/45 | Total: 1h 56m | Avg: 2m 35s | Max: 9m 29s

    🟥 cpu
      🟥 amd64              Pass:   0%/43  | Total:  1h 52m | Avg:  2m 37s | Max:  9m 29s
      🟥 arm64              Pass:   0%/2   | Total:  3m 40s | Avg:  1m 50s | Max:  1m 54s
    🟥 ctk
      🟥 12.0               Pass:   0%/5   | Total: 17m 48s | Avg:  3m 33s | Max:  8m 40s
      🟥 12.5               Pass:   0%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 19s
      🟥 12.8               Pass:   0%/38  | Total:  1h 30m | Avg:  2m 22s | Max:  9m 29s
    🟥 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  4m 42s | Avg:  2m 21s | Max:  2m 23s
      🟥 nvcc12.0           Pass:   0%/5   | Total: 17m 48s | Avg:  3m 33s | Max:  8m 40s
      🟥 nvcc12.5           Pass:   0%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 19s
      🟥 nvcc12.8           Pass:   0%/36  | Total:  1h 25m | Avg:  2m 22s | Max:  9m 29s
    🟥 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  4m 42s | Avg:  2m 21s | Max:  2m 23s
      🟥 nvcc               Pass:   0%/43  | Total:  1h 51m | Avg:  2m 35s | Max:  9m 29s
    🟥 cxx
      🟥 Clang14            Pass:   0%/4   | Total:  9m 48s | Avg:  2m 27s | Max:  2m 36s
      🟥 Clang15            Pass:   0%/2   | Total:  4m 54s | Avg:  2m 27s | Max:  2m 32s
      🟥 Clang16            Pass:   0%/2   | Total:  4m 57s | Avg:  2m 28s | Max:  2m 31s
      🟥 Clang17            Pass:   0%/2   | Total:  4m 45s | Avg:  2m 22s | Max:  2m 25s
      🟥 Clang18            Pass:   0%/7   | Total: 11m 34s | Avg:  1m 39s | Max:  2m 31s
      🟥 GCC7               Pass:   0%/2   | Total:  4m 25s | Avg:  2m 12s | Max:  2m 18s
      🟥 GCC8               Pass:   0%/1   | Total:  2m 12s | Avg:  2m 12s | Max:  2m 12s
      🟥 GCC9               Pass:   0%/2   | Total:  4m 18s | Avg:  2m 09s | Max:  2m 12s
      🟥 GCC10              Pass:   0%/2   | Total:  4m 24s | Avg:  2m 12s | Max:  2m 13s
      🟥 GCC11              Pass:   0%/2   | Total:  4m 27s | Avg:  2m 13s | Max:  2m 15s
      🟥 GCC12              Pass:   0%/2   | Total:  4m 46s | Avg:  2m 23s | Max:  2m 28s
      🟥 GCC13              Pass:   0%/11  | Total: 11m 05s | Avg:  1m 00s | Max:  2m 24s
      🟥 MSVC14.29          Pass:   0%/2   | Total: 18m 09s | Avg:  9m 04s | Max:  9m 29s
      🟥 MSVC14.42          Pass:   0%/2   | Total: 18m 06s | Avg:  9m 03s | Max:  9m 06s
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 19s
    🟥 cxx_family
      🟥 Clang              Pass:   0%/17  | Total: 35m 58s | Avg:  2m 06s | Max:  2m 36s
      🟥 GCC                Pass:   0%/22  | Total: 35m 37s | Avg:  1m 37s | Max:  2m 28s
      🟥 MSVC               Pass:   0%/4   | Total: 36m 15s | Avg:  9m 03s | Max:  9m 29s
      🟥 NVHPC              Pass:   0%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 19s
    🟥 gpu
      🟥 h100               Pass:   0%/3   | Total:  2m 14s | Avg:  0m 44s | Max:  2m 14s
      🟥 rtx2080            Pass:   0%/34  | Total:  1h 49m | Avg:  3m 12s | Max:  9m 29s
      🟥 rtxa6000           Pass:   0%/8   | Total:  4m 51s | Avg:  0m 36s | Max:  2m 31s
    🟥 jobs
      🟥 Build              Pass:   0%/37  | Total:  1h 56m | Avg:  3m 08s | Max:  9m 29s
      🟥 DeviceLaunch       Pass:   0%/1  
      🟥 GraphCapture       Pass:   0%/1  
      🟥 HostLaunch         Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/3  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total:  2m 14s | Avg:  0m 44s | Max:  2m 14s
      🟥 90;90a;100         Pass:   0%/1   | Total:  2m 24s | Avg:  2m 24s | Max:  2m 24s
    🟥 std
      🟥 17                 Pass:   0%/20  | Total:  1h 08m | Avg:  3m 25s | Max:  9m 29s
      🟥 20                 Pass:   0%/25  | Total: 48m 01s | Avg:  1m 55s | Max:  9m 00s
    
  • 🟨 thrust: Pass: 48%/45 | Total: 19h 12m | Avg: 25m 37s | Max: 1h 14m | Hits: 52%/39137

    🟨 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 26m | Avg: 41m 14s | Max:  1h 05m | Hits:  53%/8891  
      🟨 12.5               Pass:  50%/2   | Total:  1h 30m | Avg: 45m 04s | Max:  1h 07m | Hits:  30%/1779  
      🟨 12.8               Pass:  42%/38  | Total: 14h 16m | Avg: 22m 32s | Max:  1h 14m | Hits:  52%/28467 
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  5m 45s | Avg:  2m 52s | Max:  2m 54s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 26m | Avg: 41m 14s | Max:  1h 05m | Hits:  53%/8891  
      🟨 nvcc12.5           Pass:  50%/2   | Total:  1h 30m | Avg: 45m 04s | Max:  1h 07m | Hits:  30%/1779  
      🟨 nvcc12.8           Pass:  44%/36  | Total: 14h 10m | Avg: 23m 38s | Max:  1h 14m | Hits:  52%/28467 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 19m | Avg: 34m 53s | Max: 35m 10s | Hits:  58%/7116  
      🟨 Clang15            Pass:  50%/2   | Total: 48m 04s | Avg: 24m 02s | Max: 34m 12s | Hits:  49%/1779  
      🟨 Clang16            Pass:  50%/2   | Total: 45m 51s | Avg: 22m 55s | Max: 33m 06s | Hits:  49%/1779  
      🟨 Clang17            Pass:  50%/2   | Total: 45m 02s | Avg: 22m 31s | Max: 33m 37s | Hits:  48%/1779  
      🟨 Clang18            Pass:  14%/7   | Total:  1h 02m | Avg:  8m 59s | Max: 34m 46s | Hits:  49%/1779  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 17s | Max: 36m 05s | Hits:  62%/3560  
      🟩 GCC8               Pass: 100%/1   | Total: 32m 11s | Avg: 32m 11s | Max: 32m 11s | Hits:  49%/1780  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 12m | Avg: 36m 26s | Max: 36m 30s | Hits:  55%/3560  
      🟨 GCC10              Pass:  50%/2   | Total: 46m 31s | Avg: 23m 15s | Max: 34m 20s | Hits:  49%/1780  
      🟨 GCC11              Pass:  50%/2   | Total: 49m 23s | Avg: 24m 41s | Max: 35m 49s | Hits:  47%/1780  
      🟨 GCC12              Pass:  50%/2   | Total: 51m 49s | Avg: 25m 54s | Max: 38m 15s | Hits:  49%/1780  
      🟨 GCC13              Pass:  30%/10  | Total:  2h 03m | Avg: 12m 19s | Max: 34m 31s | Hits:  66%/5340  
      🟨 MSVC14.29          Pass:  50%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m | Hits:  33%/1773  
      🟨 MSVC14.42          Pass:  33%/3   | Total:  2h 24m | Avg: 48m 09s | Max:  1h 14m | Hits:  21%/1773  
      🟨 NVHPC24.7          Pass:  50%/2   | Total:  1h 30m | Avg: 45m 04s | Max:  1h 07m | Hits:  30%/1779  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 28s | Avg: 20m 44s | Max: 29m 59s | Hits:  74%/3560  
    🟨 cpu
      🟨 amd64              Pass:  51%/43  | Total: 18h 50m | Avg: 26m 17s | Max:  1h 14m | Hits:  52%/39137 
      🟥 arm64              Pass:   0%/2   | Total: 22m 11s | Avg: 11m 05s | Max: 11m 47s
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  5m 45s | Avg:  2m 52s | Max:  2m 54s
      🟨 nvcc               Pass:  51%/43  | Total: 19h 07m | Avg: 26m 40s | Max:  1h 14m | Hits:  52%/39137 
    🟨 cxx_family
      🟨 Clang              Pass:  47%/17  | Total:  5h 41m | Avg: 20m 05s | Max: 35m 10s | Hits:  53%/14232 
      🟨 GCC                Pass:  52%/21  | Total:  7h 26m | Avg: 21m 16s | Max: 38m 15s | Hits:  57%/19580 
      🟨 MSVC               Pass:  40%/5   | Total:  4h 34m | Avg: 54m 57s | Max:  1h 14m | Hits:  27%/3546  
      🟨 NVHPC              Pass:  50%/2   | Total:  1h 30m | Avg: 45m 04s | Max:  1h 07m | Hits:  30%/1779  
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  8m 55s
      🟨 rtx2080            Pass:  60%/33  | Total: 16h 42m | Avg: 30m 23s | Max:  1h 10m | Hits:  49%/35577 
      🟨 rtx4090            Pass:  20%/10  | Total:  2h 21m | Avg: 14m 07s | Max:  1h 14m | Hits:  74%/3560  
    🟨 jobs
      🟨 Build              Pass:  55%/38  | Total: 19h 01m | Avg: 30m 02s | Max:  1h 14m | Hits:  49%/37357 
      🟥 TestCPU            Pass:   0%/3  
      🟨 TestGPU            Pass:  25%/4   | Total: 11m 29s | Avg:  2m 52s | Max: 11m 29s | Hits:  99%/1780  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  8m 55s | Avg:  4m 27s | Max:  8m 55s
      🟥 90;90a;100         Pass:   0%/1   | Total: 12m 55s | Avg: 12m 55s | Max: 12m 55s
    🟨 std
      🟨 17                 Pass:  90%/20  | Total: 13h 14m | Avg: 39m 42s | Max:  1h 10m | Hits:  49%/32019 
      🟨 20                 Pass:   8%/23  | Total:  5h 17m | Avg: 13m 48s | Max:  1h 14m | Hits:  53%/3558  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 23m 16s | Avg: 11m 38s | Max: 21m 03s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 21m 03s | Avg: 21m 03s | Max: 21m 03s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 01m | Avg: 1h 01m | Max: 1h 01m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1


namespace cudax = cuda::experimental;

C2H_TEST("DeviceTransform::Transform cudax::async_device_buffer", "[device][device_transform]", algorithms)
Contributor

ditto we should move to cudax

Contributor Author

I kind of disagree. This test is more about cub::DeviceTransform working correctly on the async_device_buffer than about whether the buffer can be transformed.

@bernhardmgruber bernhardmgruber force-pushed the transform_async_buffer branch from f0ea8dd to ef41394 Compare March 11, 2025 17:04
@bernhardmgruber bernhardmgruber requested a review from a team as a code owner March 11, 2025 17:04
Contributor

🟨 CI finished in 1h 19m: Pass: 43%/115 | Total: 2d 13h | Avg: 32m 05s | Max: 1h 17m | Hits: 59%/56460
  • 🟥 cub: Pass: 0%/45 | Total: 1d 13h | Avg: 50m 19s | Max: 1h 17m

    🟥 cpu
      🟥 amd64              Pass:   0%/43  | Total:  1d 11h | Avg: 49m 47s | Max:  1h 17m
      🟥 arm64              Pass:   0%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m
    🟥 ctk
      🟥 12.0               Pass:   0%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 05m
      🟥 12.5               Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
      🟥 12.8               Pass:   0%/38  | Total:  1d 06h | Avg: 48m 04s | Max:  1h 17m
    🟥 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  1h 59m | Avg: 59m 53s | Max:  1h 00m
      🟥 nvcc12.0           Pass:   0%/5   | Total:  5h 02m | Avg:  1h 00m | Max:  1h 05m
      🟥 nvcc12.5           Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
      🟥 nvcc12.8           Pass:   0%/36  | Total:  1d 04h | Avg: 47m 25s | Max:  1h 17m
    🟥 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  1h 59m | Avg: 59m 53s | Max:  1h 00m
      🟥 nvcc               Pass:   0%/43  | Total:  1d 11h | Avg: 49m 52s | Max:  1h 17m
    🟥 cxx
      🟥 Clang14            Pass:   0%/4   | Total:  3h 59m | Avg: 59m 47s | Max:  1h 04m
      🟥 Clang15            Pass:   0%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m
      🟥 Clang16            Pass:   0%/2   | Total:  1h 56m | Avg: 58m 10s | Max: 58m 50s
      🟥 Clang17            Pass:   0%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 05m
      🟥 Clang18            Pass:   0%/7   | Total:  4h 57m | Avg: 42m 33s | Max:  1h 02m
      🟥 GCC7               Pass:   0%/2   | Total:  1h 56m | Avg: 58m 17s | Max: 58m 53s
      🟥 GCC8               Pass:   0%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m
      🟥 GCC9               Pass:   0%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 04m
      🟥 GCC10              Pass:   0%/2   | Total:  1h 59m | Avg: 59m 55s | Max:  1h 02m
      🟥 GCC11              Pass:   0%/2   | Total:  1h 52m | Avg: 56m 19s | Max: 57m 08s
      🟥 GCC12              Pass:   0%/2   | Total:  1h 57m | Avg: 58m 50s | Max: 59m 15s
      🟥 GCC13              Pass:   0%/11  | Total:  4h 39m | Avg: 25m 26s | Max:  1h 11m
      🟥 MSVC14.29          Pass:   0%/2   | Total:  2h 22m | Avg:  1h 11m | Max:  1h 17m
      🟥 MSVC14.42          Pass:   0%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 16m
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
    🟥 cxx_family
      🟥 Clang              Pass:   0%/17  | Total: 15h 03m | Avg: 53m 08s | Max:  1h 05m
      🟥 GCC                Pass:   0%/22  | Total: 15h 33m | Avg: 42m 26s | Max:  1h 11m
      🟥 MSVC               Pass:   0%/4   | Total:  4h 52m | Avg:  1h 13m | Max:  1h 17m
      🟥 NVHPC              Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 07m
    🟥 gpu
      🟥 h100               Pass:   0%/3   | Total: 25m 46s | Avg:  8m 35s | Max: 25m 46s
      🟥 rtx2080            Pass:   0%/34  | Total:  1d 11h | Avg:  1h 02m | Max:  1h 17m
      🟥 rtxa6000           Pass:   0%/8   | Total:  1h 54m | Avg: 14m 16s | Max: 59m 18s
    🟥 jobs
      🟥 Build              Pass:   0%/37  | Total:  1d 13h | Avg:  1h 01m | Max:  1h 17m
      🟥 DeviceLaunch       Pass:   0%/1  
      🟥 GraphCapture       Pass:   0%/1  
      🟥 HostLaunch         Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/3  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total: 25m 46s | Avg:  8m 35s | Max: 25m 46s
      🟥 90;90a;100         Pass:   0%/1   | Total:  1h 11m | Avg:  1h 11m | Max:  1h 11m
    🟥 std
      🟥 17                 Pass:   0%/20  | Total: 21h 01m | Avg:  1h 03m | Max:  1h 17m
      🟥 20                 Pass:   0%/25  | Total: 16h 43m | Avg: 40m 07s | Max:  1h 13m
    
  • 🟨 thrust: Pass: 55%/45 | Total: 19h 58m | Avg: 26m 37s | Max: 1h 13m | Hits: 51%/44418

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 59m 33s | Avg: 29m 46s | Max: 30m 17s | Hits:  48%/3554  
      🔍 nvcc               Pass:  53%/43  | Total: 18h 58m | Avg: 26m 28s | Max:  1h 13m | Hits:  51%/40864 
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total: 13h 49m | Avg: 41m 28s | Max:  1h 13m | Hits:  47%/35531 
      🔍 20                 Pass:  13%/23  | Total:  5h 27m | Avg: 14m 13s | Max:  1h 04m | Hits:  55%/5331  
    🟨 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 19m | Avg: 39m 50s | Max:  1h 04m | Hits:  50%/8881  
      🟨 12.5               Pass:  50%/2   | Total:  1h 35m | Avg: 47m 54s | Max:  1h 13m | Hits:  30%/1777  
      🟨 12.8               Pass:  50%/38  | Total: 15h 03m | Avg: 23m 46s | Max:  1h 10m | Hits:  52%/33760 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 59m 33s | Avg: 29m 46s | Max: 30m 17s | Hits:  48%/3554  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 19m | Avg: 39m 50s | Max:  1h 04m | Hits:  50%/8881  
      🟨 nvcc12.5           Pass:  50%/2   | Total:  1h 35m | Avg: 47m 54s | Max:  1h 13m | Hits:  30%/1777  
      🟨 nvcc12.8           Pass:  47%/36  | Total: 14h 03m | Avg: 23m 26s | Max:  1h 10m | Hits:  52%/30206 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 32s | Max: 33m 35s | Hits:  57%/7108  
      🟨 Clang15            Pass:  50%/2   | Total: 49m 05s | Avg: 24m 32s | Max: 36m 02s | Hits:  49%/1777  
      🟨 Clang16            Pass:  50%/2   | Total: 46m 56s | Avg: 23m 28s | Max: 35m 02s | Hits:  49%/1777  
      🟨 Clang17            Pass:  50%/2   | Total: 46m 32s | Avg: 23m 16s | Max: 34m 21s | Hits:  49%/1777  
      🟨 Clang18            Pass:  42%/7   | Total:  1h 58m | Avg: 16m 51s | Max: 35m 47s | Hits:  50%/5331  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 31s | Max: 34m 40s | Hits:  58%/3556  
      🟩 GCC8               Pass: 100%/1   | Total: 33m 58s | Avg: 33m 58s | Max: 33m 58s | Hits:  49%/1778  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 10m | Avg: 35m 16s | Max: 35m 22s | Hits:  60%/3556  
      🟨 GCC10              Pass:  50%/2   | Total: 48m 27s | Avg: 24m 13s | Max: 35m 12s | Hits:  49%/1778  
      🟨 GCC11              Pass:  50%/2   | Total: 49m 34s | Avg: 24m 47s | Max: 35m 52s | Hits:  49%/1778  
      🟨 GCC12              Pass:  50%/2   | Total: 51m 22s | Avg: 25m 41s | Max: 39m 13s | Hits:  49%/1778  
      🟨 GCC13              Pass:  30%/10  | Total:  2h 06m | Avg: 12m 36s | Max: 37m 10s | Hits:  65%/5334  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m | Hits:  30%/3542  
      🟨 MSVC14.42          Pass:  33%/3   | Total:  2h 15m | Avg: 45m 09s | Max:  1h 10m | Hits:  21%/1771  
      🟨 NVHPC24.7          Pass:  50%/2   | Total:  1h 35m | Avg: 47m 54s | Max:  1h 13m | Hits:  30%/1777  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 34s | Avg: 20m 47s | Max: 30m 16s | Hits:  74%/3556  
    🟨 cpu
      🟨 amd64              Pass:  58%/43  | Total: 19h 34m | Avg: 27m 18s | Max:  1h 13m | Hits:  51%/44418 
      🟥 arm64              Pass:   0%/2   | Total: 23m 48s | Avg: 11m 54s | Max: 12m 32s
    🟨 cxx_family
      🟨 Clang              Pass:  58%/17  | Total:  6h 30m | Avg: 22m 59s | Max: 36m 02s | Hits:  53%/17770 
      🟨 GCC                Pass:  52%/21  | Total:  7h 29m | Avg: 21m 22s | Max: 39m 13s | Hits:  57%/19558 
      🟨 MSVC               Pass:  60%/5   | Total:  4h 22m | Avg: 52m 32s | Max:  1h 10m | Hits:  27%/5313  
      🟨 NVHPC              Pass:  50%/2   | Total:  1h 35m | Avg: 47m 54s | Max:  1h 13m | Hits:  30%/1777  
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  8m 02s | Avg:  4m 01s | Max:  8m 02s
      🟨 rtx2080            Pass:  69%/33  | Total: 17h 38m | Avg: 32m 05s | Max:  1h 13m | Hits:  49%/40862 
      🟨 rtx4090            Pass:  20%/10  | Total:  2h 11m | Avg: 13m 07s | Max:  1h 04m | Hits:  74%/3556  
    🟨 jobs
      🟨 Build              Pass:  63%/38  | Total: 19h 47m | Avg: 31m 14s | Max:  1h 13m | Hits:  49%/42640 
      🟥 TestCPU            Pass:   0%/3  
      🟨 TestGPU            Pass:  25%/4   | Total: 11m 18s | Avg:  2m 49s | Max: 11m 18s | Hits:  99%/1778  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  8m 02s | Avg:  4m 01s | Max:  8m 02s
      🟥 90;90a;100         Pass:   0%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 30m | Avg: 6m 49s | Max: 18m 30s | Hits: 92%/11722

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  2h 14m | Avg:  7m 29s | Max: 18m 30s | Hits:  91%/9406  
      🟩 arm64              Pass: 100%/4   | Total: 15m 15s | Avg:  3m 48s | Max:  3m 56s | Hits:  96%/2316  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 13m 18s | Avg: 13m 18s | Max: 13m 18s | Hits:  57%/277   
      🟩 12.5               Pass: 100%/2   | Total: 18m 39s | Avg:  9m 19s | Max:  9m 39s | Hits:  57%/742   
      🟩 12.8               Pass: 100%/19  | Total:  1h 58m | Avg:  6m 13s | Max: 18m 30s | Hits:  95%/10703 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 13m 18s | Avg: 13m 18s | Max: 13m 18s | Hits:  57%/277   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 39s | Avg:  9m 19s | Max:  9m 39s | Hits:  57%/742   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 58m | Avg:  6m 13s | Max: 18m 30s | Hits:  95%/10703 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 30m | Avg:  6m 49s | Max: 18m 30s | Hits:  92%/11722 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s | Hits:  96%/581   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 14s | Avg:  4m 14s | Max:  4m 14s | Hits:  96%/579   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 37s | Avg:  4m 37s | Max:  4m 37s | Hits:  96%/579   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 30s | Avg:  4m 30s | Max:  4m 30s | Hits:  96%/579   
      🟩 Clang18            Pass: 100%/4   | Total: 25m 12s | Avg:  6m 18s | Max: 13m 07s | Hits:  97%/2316  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 21s | Avg:  4m 21s | Max:  4m 21s | Hits:  95%/581   
      🟩 GCC11              Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s | Hits:  93%/579   
      🟩 GCC12              Pass: 100%/2   | Total: 22m 53s | Avg: 11m 26s | Max: 18m 30s | Hits:  97%/1158  
      🟩 GCC13              Pass: 100%/6   | Total: 30m 17s | Avg:  5m 02s | Max: 11m 18s | Hits:  96%/3474  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 18s | Avg: 13m 18s | Max: 13m 18s | Hits:  57%/277   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 12m 36s | Avg: 12m 36s | Max: 12m 36s | Hits:  57%/277   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 39s | Avg:  9m 19s | Max:  9m 39s | Hits:  57%/742   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 42m 41s | Avg:  5m 20s | Max: 13m 07s | Hits:  96%/4634  
      🟩 GCC                Pass: 100%/10  | Total:  1h 02m | Avg:  6m 17s | Max: 18m 30s | Hits:  96%/5792  
      🟩 MSVC               Pass: 100%/2   | Total: 25m 54s | Avg: 12m 57s | Max: 13m 18s | Hits:  57%/554   
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 39s | Avg:  9m 19s | Max:  9m 39s | Hits:  57%/742   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 15m 01s | Avg:  7m 30s | Max: 11m 18s | Hits:  97%/1158  
      🟩 rtx2080            Pass: 100%/20  | Total:  2h 15m | Avg:  6m 45s | Max: 18m 30s | Hits:  91%/10564 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 47m | Avg:  5m 38s | Max: 13m 18s | Hits:  90%/9985  
      🟩 Test               Pass: 100%/3   | Total: 42m 55s | Avg: 14m 18s | Max: 18m 30s | Hits:  99%/1737  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 18m 50s | Avg:  6m 16s | Max: 11m 18s | Hits:  97%/1737  
      🟩 90a                Pass: 100%/1   | Total:  3m 38s | Avg:  3m 38s | Max:  3m 38s | Hits:  95%/579   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 20m 31s | Avg:  5m 07s | Max:  9m 00s | Hits:  89%/2108  
      🟩 20                 Pass: 100%/18  | Total:  2h 09m | Avg:  7m 11s | Max: 18m 30s | Hits:  92%/9614  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 16m 43s | Avg: 8m 21s | Max: 14m 09s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 16m 43s | Avg:  8m 21s | Max: 14m 09s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 34s | Avg:  2m 34s | Max:  2m 34s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 14m 09s | Avg: 14m 09s | Max: 14m 09s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 01m | Avg: 1h 01m | Max: 1h 01m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 115)

# Runner
79 linux-amd64-cpu16
11 windows-amd64-cpu16
8 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-amd64-gpu-rtx2080-latest-1
4 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@bernhardmgruber bernhardmgruber force-pushed the transform_async_buffer branch 3 times, most recently from d2a5c30 to b275629 Compare March 12, 2025 20:00
@bernhardmgruber bernhardmgruber force-pushed the transform_async_buffer branch from b275629 to fb1d7bf Compare March 12, 2025 20:00
Contributor

🟨 CI finished in 1h 06m: Pass: 42%/115 | Total: 2d 11h | Avg: 31m 10s | Max: 1h 05m | Hits: 66%/53046
  • 🟨 cub: Pass: 6%/45 | Total: 1d 13h | Avg: 49m 28s | Max: 1h 05m | Hits: 86%/3654

    🚨 sm: 90;90a;100 🚨
      🟩 90                 Pass: 100%/3   | Total:  1h 11m | Avg: 23m 56s | Max: 26m 17s | Hits:  86%/3654  
      🔥 90;90a;100         Pass:   0%/1   | Total:  1h 05m | Avg:  1h 05m | Max:  1h 05m
    🟨 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 11m | Avg: 23m 56s | Max: 26m 17s | Hits:  86%/3654  
      🟥 rtx2080            Pass:   0%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 05m
      🟥 rtxa6000           Pass:   0%/8   | Total:  1h 46m | Avg: 13m 16s | Max:  1h 00m
    🟨 cpu
      🟨 amd64              Pass:   6%/43  | Total:  1d 11h | Avg: 48m 55s | Max:  1h 05m | Hits:  86%/3654  
      🟥 arm64              Pass:   0%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟨 ctk
      🟥 12.0               Pass:   0%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 00m
      🟥 12.5               Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 12.8               Pass:   7%/38  | Total:  1d 06h | Avg: 47m 28s | Max:  1h 05m | Hits:  86%/3654  
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 nvcc12.0           Pass:   0%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 00m
      🟥 nvcc12.5           Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 nvcc12.8           Pass:   8%/36  | Total:  1d 04h | Avg: 46m 45s | Max:  1h 05m | Hits:  86%/3654  
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 nvcc               Pass:   6%/43  | Total:  1d 11h | Avg: 48m 58s | Max:  1h 05m | Hits:  86%/3654  
    🟨 cxx
      🟥 Clang14            Pass:   0%/4   | Total:  4h 02m | Avg:  1h 00m | Max:  1h 00m
      🟥 Clang15            Pass:   0%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 05m
      🟥 Clang16            Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 Clang17            Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 Clang18            Pass:   0%/7   | Total:  5h 02m | Avg: 43m 14s | Max:  1h 01m
      🟥 GCC7               Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 GCC8               Pass:   0%/1   | Total: 46m 15s | Avg: 46m 15s | Max: 46m 15s
      🟥 GCC9               Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 GCC10              Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 GCC11              Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟥 GCC12              Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
      🟨 GCC13              Pass:  27%/11  | Total:  5h 04m | Avg: 27m 42s | Max:  1h 05m | Hits:  86%/3654  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 59m | Avg: 59m 41s | Max: 59m 42s
      🟥 MSVC14.42          Pass:   0%/2   | Total:  1h 59m | Avg: 59m 37s | Max: 59m 38s
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
    🟨 cxx_family
      🟥 Clang              Pass:   0%/17  | Total: 15h 12m | Avg: 53m 39s | Max:  1h 05m
      🟨 GCC                Pass:  13%/22  | Total: 15h 54m | Avg: 43m 24s | Max:  1h 05m | Hits:  86%/3654  
      🟥 MSVC               Pass:   0%/4   | Total:  3h 58m | Avg: 59m 39s | Max: 59m 42s
      🟥 NVHPC              Pass:   0%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 00m
    🟨 jobs
      🟨 Build              Pass:   2%/37  | Total:  1d 12h | Avg: 58m 56s | Max:  1h 05m | Hits:  61%/1218  
      🟥 DeviceLaunch       Pass:   0%/1  
      🟥 GraphCapture       Pass:   0%/1  
      🟨 HostLaunch         Pass:  33%/3   | Total: 24m 03s | Avg:  8m 01s | Max: 24m 03s | Hits:  99%/1218  
      🟨 TestGPU            Pass:  33%/3   | Total: 21m 29s | Avg:  7m 09s | Max: 21m 29s | Hits:  99%/1218  
    🟨 std
      🟥 17                 Pass:   0%/20  | Total: 19h 57m | Avg: 59m 51s | Max:  1h 05m
      🟨 20                 Pass:  12%/25  | Total: 17h 09m | Avg: 41m 10s | Max:  1h 05m | Hits:  86%/3654  
    
  • 🟨 thrust: Pass: 46%/45 | Total: 18h 59m | Avg: 25m 19s | Max: 59m 28s | Hits: 55%/37328

    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 58m 06s | Avg: 29m 03s | Max: 30m 09s | Hits:  49%/3554  
      🔍 nvcc               Pass:  44%/43  | Total: 18h 01m | Avg: 25m 09s | Max: 59m 28s | Hits:  56%/33774 
    🟨 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 58m 06s | Avg: 29m 03s | Max: 30m 09s | Hits:  49%/3554  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  3h 12m | Avg: 38m 25s | Max: 59m 28s | Hits:  51%/7110  
      🟥 nvcc12.5           Pass:   0%/2   | Total:  1h 17m | Avg: 38m 53s | Max: 55m 03s
      🟨 nvcc12.8           Pass:  41%/36  | Total: 13h 31m | Avg: 22m 32s | Max: 59m 18s | Hits:  57%/26664 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 09m | Avg: 32m 22s | Max: 32m 38s | Hits:  57%/7108  
      🟨 Clang15            Pass:  50%/2   | Total: 44m 33s | Avg: 22m 16s | Max: 33m 02s | Hits:  49%/1777  
      🟨 Clang16            Pass:  50%/2   | Total: 44m 26s | Avg: 22m 13s | Max: 32m 43s | Hits:  49%/1777  
      🟨 Clang17            Pass:  50%/2   | Total: 47m 38s | Avg: 23m 49s | Max: 34m 47s | Hits:  49%/1777  
      🟨 Clang18            Pass:  42%/7   | Total:  1h 57m | Avg: 16m 46s | Max: 35m 43s | Hits:  51%/5331  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 05m | Avg: 32m 36s | Max: 32m 37s | Hits:  57%/3556  
      🟩 GCC8               Pass: 100%/1   | Total: 36m 50s | Avg: 36m 50s | Max: 36m 50s | Hits:  49%/1778  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 11m | Avg: 35m 45s | Max: 35m 58s | Hits:  59%/3556  
      🟨 GCC10              Pass:  50%/2   | Total: 47m 47s | Avg: 23m 53s | Max: 35m 11s | Hits:  49%/1778  
      🟨 GCC11              Pass:  50%/2   | Total: 49m 01s | Avg: 24m 30s | Max: 36m 39s | Hits:  49%/1778  
      🟨 GCC12              Pass:  50%/2   | Total: 47m 55s | Avg: 23m 57s | Max: 35m 38s | Hits:  49%/1778  
      🟨 GCC13              Pass:  30%/10  | Total:  2h 03m | Avg: 12m 22s | Max: 36m 10s | Hits:  66%/5334  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 58m | Avg: 59m 23s | Max: 59m 28s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  1h 57m | Avg: 39m 11s | Max: 59m 13s
      🟥 NVHPC24.7          Pass:   0%/2   | Total:  1h 17m | Avg: 38m 53s | Max: 55m 03s
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 19s | Avg: 20m 39s | Max: 30m 37s | Hits:  74%/3556  
    🟨 cpu
      🟨 amd64              Pass:  48%/43  | Total: 18h 36m | Avg: 25m 57s | Max: 59m 28s | Hits:  55%/37328 
      🟥 arm64              Pass:   0%/2   | Total: 23m 25s | Avg: 11m 42s | Max: 12m 17s
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  3h 12m | Avg: 38m 25s | Max: 59m 28s | Hits:  51%/7110  
      🟥 12.5               Pass:   0%/2   | Total:  1h 17m | Avg: 38m 53s | Max: 55m 03s
      🟨 12.8               Pass:  44%/38  | Total: 14h 29m | Avg: 22m 53s | Max: 59m 18s | Hits:  56%/30218 
    🟨 cxx_family
      🟨 Clang              Pass:  58%/17  | Total:  6h 23m | Avg: 22m 33s | Max: 35m 43s | Hits:  53%/17770 
      🟨 GCC                Pass:  52%/21  | Total:  7h 22m | Avg: 21m 02s | Max: 36m 50s | Hits:  57%/19558 
      🟥 MSVC               Pass:   0%/5   | Total:  3h 56m | Avg: 47m 15s | Max: 59m 28s
      🟥 NVHPC              Pass:   0%/2   | Total:  1h 17m | Avg: 38m 53s | Max: 55m 03s
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  8m 13s
      🟨 rtx2080            Pass:  57%/33  | Total: 16h 46m | Avg: 30m 30s | Max: 59m 28s | Hits:  53%/33772 
      🟨 rtx4090            Pass:  20%/10  | Total:  2h 04m | Avg: 12m 28s | Max: 58m 20s | Hits:  74%/3556  
    🟨 jobs
      🟨 Build              Pass:  52%/38  | Total: 18h 48m | Avg: 29m 42s | Max: 59m 28s | Hits:  53%/35550 
      🟥 TestCPU            Pass:   0%/3  
      🟨 TestGPU            Pass:  25%/4   | Total: 10m 42s | Avg:  2m 40s | Max: 10m 42s | Hits:  99%/1778  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  8m 13s
      🟥 90;90a;100         Pass:   0%/1   | Total: 13m 13s | Avg: 13m 13s | Max: 13m 13s
    🟨 std
      🟨 17                 Pass:  80%/20  | Total: 12h 59m | Avg: 38m 58s | Max: 59m 28s | Hits:  52%/28441 
      🟨 20                 Pass:  13%/23  | Total:  5h 18m | Avg: 13m 51s | Max: 58m 20s | Hits:  56%/5331  
    
  • 🟩 cudax: Pass: 100%/22 | Total: 2h 21m | Avg: 6m 25s | Max: 14m 05s | Hits: 94%/11744

    🟩 cpu
      🟩 amd64              Pass: 100%/18  | Total:  2h 05m | Avg:  6m 59s | Max: 14m 05s | Hits:  94%/9424  
      🟩 arm64              Pass: 100%/4   | Total: 15m 30s | Avg:  3m 52s | Max:  4m 04s | Hits:  96%/2320  
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s | Hits:  57%/278   
      🟩 12.5               Pass: 100%/2   | Total: 13m 11s | Avg:  6m 35s | Max:  6m 40s | Hits:  91%/744   
      🟩 12.8               Pass: 100%/19  | Total:  1h 55m | Avg:  6m 03s | Max: 14m 05s | Hits:  95%/10722 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s | Hits:  57%/278   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 13m 11s | Avg:  6m 35s | Max:  6m 40s | Hits:  91%/744   
      🟩 nvcc12.8           Pass: 100%/19  | Total:  1h 55m | Avg:  6m 03s | Max: 14m 05s | Hits:  95%/10722 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  2h 21m | Avg:  6m 25s | Max: 14m 05s | Hits:  94%/11744 
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s | Hits:  96%/582   
      🟩 Clang15            Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s | Hits:  96%/580   
      🟩 Clang16            Pass: 100%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s | Hits:  96%/580   
      🟩 Clang17            Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s | Hits:  96%/580   
      🟩 Clang18            Pass: 100%/4   | Total: 24m 16s | Avg:  6m 04s | Max: 12m 16s | Hits:  97%/2320  
      🟩 GCC10              Pass: 100%/1   | Total:  4m 13s | Avg:  4m 13s | Max:  4m 13s | Hits:  95%/582   
      🟩 GCC11              Pass: 100%/1   | Total:  4m 12s | Avg:  4m 12s | Max:  4m 12s | Hits:  95%/580   
      🟩 GCC12              Pass: 100%/2   | Total: 17m 24s | Avg:  8m 42s | Max: 12m 52s | Hits:  97%/1160  
      🟩 GCC13              Pass: 100%/6   | Total: 33m 20s | Avg:  5m 33s | Max: 14m 05s | Hits:  96%/3480  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 09s | Avg: 13m 09s | Max: 13m 09s | Hits:  57%/278   
      🟩 MSVC14.42          Pass: 100%/1   | Total: 13m 23s | Avg: 13m 23s | Max: 13m 23s | Hits:  57%/278   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 13m 11s | Avg:  6m 35s | Max:  6m 40s | Hits:  91%/744   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 42m 33s | Avg:  5m 19s | Max: 12m 16s | Hits:  96%/4642  
      🟩 GCC                Pass: 100%/10  | Total: 59m 09s | Avg:  5m 54s | Max: 14m 05s | Hits:  96%/5802  
      🟩 MSVC               Pass: 100%/2   | Total: 26m 32s | Avg: 13m 16s | Max: 13m 23s | Hits:  57%/556   
      🟩 NVHPC              Pass: 100%/2   | Total: 13m 11s | Avg:  6m 35s | Max:  6m 40s | Hits:  91%/744   
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max: 14m 05s | Hits:  97%/1160  
      🟩 rtx2080            Pass: 100%/20  | Total:  2h 03m | Avg:  6m 11s | Max: 13m 23s | Hits:  94%/10584 
    🟩 jobs
      🟩 Build              Pass: 100%/19  | Total:  1h 42m | Avg:  5m 22s | Max: 13m 23s | Hits:  93%/10004 
      🟩 Test               Pass: 100%/3   | Total: 39m 13s | Avg: 13m 04s | Max: 14m 05s | Hits:  99%/1740  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 21m 32s | Avg:  7m 10s | Max: 14m 05s | Hits:  97%/1740  
      🟩 90a                Pass: 100%/1   | Total:  3m 44s | Avg:  3m 44s | Max:  3m 44s | Hits:  95%/580   
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 18m 06s | Avg:  4m 31s | Max:  6m 31s | Hits:  95%/2112  
      🟩 20                 Pass: 100%/18  | Total:  2h 03m | Avg:  6m 51s | Max: 14m 05s | Hits:  94%/9632  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 16m 27s | Avg: 8m 13s | Max: 13m 54s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 16m 27s | Avg:  8m 13s | Max: 13m 54s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 33s | Avg:  2m 33s | Max:  2m 33s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 13m 54s | Avg: 13m 54s | Max: 13m 54s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    

👃 Inspect Changes

Modifications in project?

     Project
     CCCL Infrastructure
     libcu++
     CUB
+/-  Thrust
+/-  CUDA Experimental
     python
     CCCL C Parallel Library
     Catch2Helper

Modifications in project or dependencies?

     Project
     CCCL Infrastructure
     libcu++
+/-  CUB
+/-  Thrust
+/-  CUDA Experimental
+/-  python
+/-  CCCL C Parallel Library
+/-  Catch2Helper

🏃‍ Runner counts (total jobs: 115)

# Runner
79 linux-amd64-cpu16
11 windows-amd64-cpu16
8 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-amd64-gpu-rtx2080-latest-1
4 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

Projects
Status: In Review
2 participants