Backport to 2.8: Add b200 policies for cub.device.partition.flagged,if (#3617) #3736

Merged

bernhardmgruber
Contributor

No description provided.

Co-authored-by: gonidelis <ggonidelis@nvidia.com>
@bernhardmgruber marked this pull request as ready for review February 7, 2025 08:52
@bernhardmgruber requested review from a team as code owners February 7, 2025 08:52
@NVIDIA deleted a comment from copy-pr-bot bot Feb 7, 2025
@bernhardmgruber enabled auto-merge (squash) February 7, 2025 10:31

github-actions bot commented Feb 7, 2025

🟩 CI finished in 1h 31m: Pass: 100%/95 | Total: 2d 14h | Avg: 39m 28s | Max: 1h 11m | Hits: 293%/10540
  • 🟩 cub: Pass: 100%/47 | Total: 1d 14h | Avg: 49m 15s | Max: 1h 11m | Hits: 413%/3132

    🟩 cpu
      🟩 amd64              Pass: 100%/45  | Total:  1d 12h | Avg: 48m 49s | Max:  1h 11m | Hits: 413%/3132  
      🟩 arm64              Pass: 100%/2   | Total:  1h 57m | Avg: 58m 47s | Max:  1h 02m
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  5h 50m | Avg: 50m 06s | Max:  1h 06m | Hits: 413%/783   
      🟩 12.5               Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 12.6               Pass: 100%/38  | Total:  1d 06h | Avg: 48m 27s | Max:  1h 11m | Hits: 413%/2349  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 58m | Avg: 59m 19s | Max: 59m 55s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  5h 50m | Avg: 50m 06s | Max:  1h 06m | Hits: 413%/783   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
      🟩 nvcc12.6           Pass: 100%/36  | Total:  1d 04h | Avg: 47m 51s | Max:  1h 11m | Hits: 413%/2349  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 58m | Avg: 59m 19s | Max: 59m 55s
      🟩 nvcc               Pass: 100%/45  | Total:  1d 12h | Avg: 48m 48s | Max:  1h 11m | Hits: 413%/3132  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  3h 28m | Avg: 52m 08s | Max: 57m 34s
      🟩 Clang10            Pass: 100%/1   | Total: 53m 40s | Avg: 53m 40s | Max: 53m 40s
      🟩 Clang11            Pass: 100%/1   | Total: 53m 25s | Avg: 53m 25s | Max: 53m 25s
      🟩 Clang12            Pass: 100%/1   | Total: 57m 44s | Avg: 57m 44s | Max: 57m 44s
      🟩 Clang13            Pass: 100%/1   | Total: 52m 13s | Avg: 52m 13s | Max: 52m 13s
      🟩 Clang14            Pass: 100%/1   | Total: 52m 38s | Avg: 52m 38s | Max: 52m 38s
      🟩 Clang15            Pass: 100%/1   | Total: 51m 54s | Avg: 51m 54s | Max: 51m 54s
      🟩 Clang16            Pass: 100%/1   | Total: 52m 46s | Avg: 52m 46s | Max: 52m 46s
      🟩 Clang17            Pass: 100%/1   | Total: 52m 02s | Avg: 52m 02s | Max: 52m 02s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 30m | Avg: 47m 14s | Max:  1h 02m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 38m | Avg: 49m 10s | Max: 50m 12s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 54m | Avg: 57m 14s | Max: 57m 46s
      🟩 GCC8               Pass: 100%/1   | Total: 51m 43s | Avg: 51m 43s | Max: 51m 43s
      🟩 GCC9               Pass: 100%/3   | Total:  2h 31m | Avg: 50m 26s | Max: 59m 42s
      🟩 GCC10              Pass: 100%/1   | Total: 51m 39s | Avg: 51m 39s | Max: 51m 39s
      🟩 GCC11              Pass: 100%/1   | Total: 58m 29s | Avg: 58m 29s | Max: 58m 29s
      🟩 GCC12              Pass: 100%/3   | Total:  1h 44m | Avg: 34m 44s | Max: 54m 36s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 25m | Avg: 33m 12s | Max:  1h 01m
      🟩 Intel2023.2.0      Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m | Hits: 413%/783   
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits: 413%/783   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 11m | Hits: 413%/1566  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 16h 05m | Avg: 50m 49s | Max:  1h 02m
      🟩 GCC                Pass: 100%/21  | Total: 14h 55m | Avg: 42m 39s | Max:  1h 01m
      🟩 Intel              Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 29m | Avg:  1h 07m | Max:  1h 11m | Hits: 413%/3132  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 49m 36s | Avg: 24m 48s | Max: 25m 49s
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 50m | Avg: 28m 48s | Max:  1h 01m
      🟩 v100               Pass: 100%/37  | Total:  1d 09h | Avg: 54m 59s | Max:  1h 11m | Hits: 413%/3132  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total:  1d 12h | Avg: 54m 26s | Max:  1h 11m | Hits: 413%/3132  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 16m 20s | Avg: 16m 20s | Max: 16m 20s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 33s | Avg: 14m 33s | Max: 14m 33s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 07m | Avg: 22m 34s | Max: 23m 47s
      🟩 TestGPU            Pass: 100%/2   | Total: 38m 34s | Avg: 19m 17s | Max: 19m 35s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 49m 36s | Avg: 24m 48s | Max: 25m 49s
      🟩 90a                Pass: 100%/1   | Total: 25m 11s | Avg: 25m 11s | Max: 25m 11s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  4h 18m | Avg: 51m 46s | Max: 57m 34s
      🟩 14                 Pass: 100%/4   | Total:  3h 49m | Avg: 57m 22s | Max:  1h 06m | Hits: 413%/783   
      🟩 17                 Pass: 100%/12  | Total: 11h 12m | Avg: 56m 04s | Max:  1h 07m | Hits: 413%/1566  
      🟩 20                 Pass: 100%/26  | Total: 19h 13m | Avg: 44m 22s | Max:  1h 11m | Hits: 412%/783   
    
  • 🟩 thrust: Pass: 100%/45 | Total: 23h 21m | Avg: 31m 08s | Max: 1h 05m | Hits: 242%/7408

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 39m 06s | Avg: 19m 33s | Max: 28m 34s
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 22h 20m | Avg: 31m 10s | Max:  1h 05m | Hits: 242%/7408  
      🟩 arm64              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 42s | Max: 33m 42s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  3h 27m | Avg: 29m 41s | Max: 55m 29s | Hits: 244%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 46m | Avg: 53m 18s | Max: 53m 20s
      🟩 12.6               Pass: 100%/36  | Total: 18h 07m | Avg: 30m 12s | Max:  1h 05m | Hits: 241%/5556  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 55m 39s | Avg: 27m 49s | Max: 28m 21s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  3h 27m | Avg: 29m 41s | Max: 55m 29s | Hits: 244%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 46m | Avg: 53m 18s | Max: 53m 20s
      🟩 nvcc12.6           Pass: 100%/34  | Total: 17h 11m | Avg: 30m 20s | Max:  1h 05m | Hits: 241%/5556  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 55m 39s | Avg: 27m 49s | Max: 28m 21s
      🟩 nvcc               Pass: 100%/43  | Total: 22h 26m | Avg: 31m 18s | Max:  1h 05m | Hits: 242%/7408  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 45m | Avg: 26m 26s | Max: 29m 50s
      🟩 Clang10            Pass: 100%/1   | Total: 32m 34s | Avg: 32m 34s | Max: 32m 34s
      🟩 Clang11            Pass: 100%/1   | Total: 30m 25s | Avg: 30m 25s | Max: 30m 25s
      🟩 Clang12            Pass: 100%/1   | Total: 28m 18s | Avg: 28m 18s | Max: 28m 18s
      🟩 Clang13            Pass: 100%/1   | Total: 30m 05s | Avg: 30m 05s | Max: 30m 05s
      🟩 Clang14            Pass: 100%/1   | Total: 31m 13s | Avg: 31m 13s | Max: 31m 13s
      🟩 Clang15            Pass: 100%/1   | Total: 32m 22s | Avg: 32m 22s | Max: 32m 22s
      🟩 Clang16            Pass: 100%/1   | Total: 33m 17s | Avg: 33m 17s | Max: 33m 17s
      🟩 Clang17            Pass: 100%/1   | Total: 30m 25s | Avg: 30m 25s | Max: 30m 25s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 43m | Avg: 23m 21s | Max: 33m 00s
      🟩 GCC6               Pass: 100%/2   | Total: 49m 30s | Avg: 24m 45s | Max: 27m 31s
      🟩 GCC7               Pass: 100%/2   | Total: 59m 32s | Avg: 29m 46s | Max: 33m 42s
      🟩 GCC8               Pass: 100%/1   | Total: 28m 57s | Avg: 28m 57s | Max: 28m 57s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 30m | Avg: 30m 02s | Max: 35m 55s
      🟩 GCC10              Pass: 100%/1   | Total: 31m 08s | Avg: 31m 08s | Max: 31m 08s
      🟩 GCC11              Pass: 100%/1   | Total: 30m 45s | Avg: 30m 45s | Max: 30m 45s
      🟩 GCC12              Pass: 100%/1   | Total: 37m 32s | Avg: 37m 32s | Max: 37m 32s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 01m | Avg: 22m 41s | Max: 35m 33s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 39m 13s | Avg: 39m 13s | Max: 39m 13s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 55m 29s | Avg: 55m 29s | Max: 55m 29s | Hits: 244%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 52m 56s | Avg: 52m 56s | Max: 52m 56s | Hits: 240%/1852  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 05m | Hits: 241%/3704  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 46m | Avg: 53m 18s | Max: 53m 20s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  8h 37m | Avg: 27m 15s | Max: 33m 17s
      🟩 GCC                Pass: 100%/19  | Total:  8h 28m | Avg: 26m 47s | Max: 37m 32s
      🟩 Intel              Pass: 100%/1   | Total: 39m 13s | Avg: 39m 13s | Max: 39m 13s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 48m | Avg: 57m 14s | Max:  1h 05m | Hits: 242%/7408  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 46m | Avg: 53m 18s | Max: 53m 20s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  2h 23m | Avg: 17m 53s | Max: 35m 33s
      🟩 v100               Pass: 100%/37  | Total: 20h 58m | Avg: 34m 00s | Max:  1h 05m | Hits: 242%/7408  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 22h 35m | Avg: 33m 53s | Max:  1h 05m | Hits: 242%/7408  
      🟩 TestCPU            Pass: 100%/2   | Total: 14m 59s | Avg:  7m 29s | Max:  7m 43s
      🟩 TestGPU            Pass: 100%/3   | Total: 31m 01s | Avg: 10m 20s | Max: 10m 32s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 36s | Avg: 20m 36s | Max: 20m 36s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  2h 00m | Avg: 24m 06s | Max: 27m 16s
      🟩 14                 Pass: 100%/4   | Total:  2h 26m | Avg: 36m 38s | Max: 55m 29s | Hits: 244%/1852  
      🟩 17                 Pass: 100%/12  | Total:  7h 27m | Avg: 37m 15s | Max: 55m 14s | Hits: 241%/3704  
      🟩 20                 Pass: 100%/22  | Total: 10h 48m | Avg: 29m 28s | Max:  1h 05m | Hits: 240%/1852  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 6m 45s | Avg: 3m 22s | Max: 4m 41s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  6m 45s | Avg:  3m 22s | Max:  4m 41s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
      🟩 Test               Pass: 100%/1   | Total:  4m 41s | Avg:  4m 41s | Max:  4m 41s
    
  • 🟩 python: Pass: 100%/1 | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 27m 02s | Avg: 27m 02s | Max: 27m 02s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 95)

# Runner
71 linux-amd64-cpu16
8 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@bernhardmgruber merged commit 87b3dae into NVIDIA:branch/2.8.x Feb 7, 2025
110 checks passed
@bernhardmgruber deleted the backport_tune_partition branch February 7, 2025 13:14