Add Python wrappers for c.parallel scan API #3592

Draft: wants to merge 22 commits into base: main

shwina (Contributor) commented Jan 29, 2025

Description

Closes #3458
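
For readers unfamiliar with the operation being wrapped, the following is a minimal pure-Python sketch of what an inclusive and an exclusive scan compute. It is only a CPU reference for the semantics and is not the `cuda.parallel` API added by this PR; the function names `inclusive_scan` and `exclusive_scan` are hypothetical and chosen for illustration.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, op):
    """Reference semantics: out[i] = values[0] op ... op values[i]."""
    return list(accumulate(values, op))


def exclusive_scan(values, op, init):
    """Reference semantics: out[i] combines init with values[0..i-1]."""
    # accumulate(..., initial=init) yields len(values) + 1 items;
    # dropping the last one (the grand total) gives the exclusive result.
    return list(accumulate(values, op, initial=init))[:-1]


data = [3, 1, 4, 1, 5]
print(inclusive_scan(data, operator.add))     # [3, 4, 8, 9, 14]
print(exclusive_scan(data, operator.add, 0))  # [0, 3, 4, 8, 9]
```

A reference like this can serve as the expected-value oracle when testing a GPU implementation against small inputs.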

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

shwina requested review from a team as code owners January 29, 2025 23:06

copy-pr-bot bot commented Jan 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

shwina marked this pull request as draft January 29, 2025 23:06
shwina force-pushed the add-scan-python-wrappers branch 2 times, most recently from d126cba to d86ce97 February 2, 2025 16:16

copy-pr-bot bot commented Feb 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

shwina (Contributor Author) commented Feb 2, 2025

/ok to test

github-actions bot (Contributor) commented Feb 2, 2025

🟨 CI finished in 1h 05m: Pass: 96%/90 | Total: 14h 21m | Avg: 9m 34s | Max: 59m 10s | Hits: 413%/12742
  • 🟨 cub: Pass: 95%/44 | Total: 7h 32m | Avg: 10m 17s | Max: 59m 10s | Hits: 539%/3512

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/42  | Total:  7h 22m | Avg: 10m 32s | Max: 59m 10s | Hits: 539%/3512  
      🟩 arm64              Pass: 100%/2   | Total:  9m 41s | Avg:  4m 50s | Max:  5m 04s
    🔍 ctk: 12.8 🔍
      🟩 12.0               Pass: 100%/5   | Total: 45m 39s | Avg:  9m 07s | Max: 24m 38s | Hits: 539%/878   
      🟩 12.5               Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max:  9m 52s
      🔍 12.8               Pass:  94%/37  | Total:  6h 27m | Avg: 10m 28s | Max: 59m 10s | Hits: 539%/2634  
    🔍 cudacxx: nvcc12.8 🔍
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 29s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 39s | Avg:  9m 07s | Max: 24m 38s | Hits: 539%/878   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max:  9m 52s
      🔍 nvcc12.8           Pass:  94%/35  | Total:  6h 18m | Avg: 10m 48s | Max: 59m 10s | Hits: 539%/2634  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 29s
      🔍 nvcc               Pass:  95%/42  | Total:  7h 23m | Avg: 10m 33s | Max: 59m 10s | Hits: 539%/3512  
    🔍 gpu: rtxa6000 🔍
      🟩 h100               Pass: 100%/2   | Total: 27m 36s | Avg: 13m 48s | Max: 23m 20s
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 34m | Avg:  9m 49s | Max: 59m 10s | Hits: 539%/3512  
      🔍 rtxa6000           Pass:  75%/8   | Total:  1h 30m | Avg: 11m 21s | Max: 20m 03s
    🔍 jobs: HostLaunch 🔍
      🟩 Build              Pass: 100%/37  | Total:  5h 50m | Avg:  9m 27s | Max: 59m 10s | Hits: 539%/3512  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 03s | Avg: 20m 03s | Max: 20m 03s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 54s | Avg: 14m 54s | Max: 14m 54s
      🔍 HostLaunch         Pass:  33%/3   | Total: 28m 29s | Avg:  9m 29s | Max: 23m 20s
      🟩 TestGPU            Pass: 100%/2   | Total: 39m 08s | Avg: 19m 34s | Max: 19m 53s
    🔍 std: 20 🔍
      🟩 17                 Pass: 100%/20  | Total:  3h 50m | Avg: 11m 30s | Max: 59m 10s | Hits: 539%/2634  
      🔍 20                 Pass:  91%/24  | Total:  3h 42m | Avg:  9m 16s | Max: 27m 13s | Hits: 539%/878   
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 50s | Avg:  5m 27s | Max:  5m 55s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  5m 59s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 40s | Avg:  5m 50s | Max:  5m 53s
      🟩 Clang17            Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 03s
      🟨 Clang18            Pass:  85%/7   | Total: 47m 30s | Avg:  6m 47s | Max: 19m 53s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 20s | Avg:  5m 40s | Max:  6m 02s
      🟩 GCC8               Pass: 100%/1   | Total: 59m 10s | Avg: 59m 10s | Max: 59m 10s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  6m 00s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 47s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  5m 48s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 43s | Avg:  6m 21s | Max:  6m 24s
      🟨 GCC13              Pass:  90%/10  | Total:  1h 47m | Avg: 10m 44s | Max: 23m 20s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 49m 17s | Avg: 24m 38s | Max: 24m 39s | Hits: 539%/1756  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 54m 15s | Avg: 27m 07s | Max: 27m 13s | Hits: 539%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max:  9m 52s
    🟨 cxx_family
      🟨 Clang              Pass:  94%/17  | Total:  1h 44m | Avg:  6m 08s | Max: 19m 53s
      🟨 GCC                Pass:  95%/21  | Total:  3h 45m | Avg: 10m 42s | Max: 59m 10s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 43m | Avg: 25m 53s | Max: 27m 13s | Hits: 539%/3512  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 34s | Avg:  9m 47s | Max:  9m 52s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 27m 36s | Avg: 13m 48s | Max: 23m 20s
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 28s | Avg:  6m 28s | Max:  6m 28s
    
  • 🟥 python: Pass: 0%/1 | Total: 3m 13s | Avg: 3m 13s | Max: 3m 13s

    🟥 cpu
      🟥 amd64              Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 ctk
      🟥 12.8               Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 cudacxx
      🟥 nvcc12.8           Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 cudacxx_family
      🟥 nvcc               Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 cxx
      🟥 GCC13              Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 cxx_family
      🟥 GCC                Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 gpu
      🟥 rtx2080            Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟥 jobs
      🟥 Test               Pass:   0%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    
  • 🟩 thrust: Pass: 100%/43 | Total: 6h 35m | Avg: 9m 11s | Max: 32m 00s | Hits: 365%/9230

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 16m 59s | Avg:  8m 29s | Max: 11m 12s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 24m | Avg:  9m 23s | Max: 32m 00s | Hits: 365%/9230  
      🟩 arm64              Pass: 100%/2   | Total: 10m 17s | Avg:  5m 08s | Max:  5m 19s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 45m 27s | Avg:  9m 05s | Max: 24m 31s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 30m 51s | Avg: 15m 25s | Max: 15m 46s
      🟩 12.8               Pass: 100%/36  | Total:  5h 18m | Avg:  8m 51s | Max: 32m 00s | Hits: 365%/7384  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 20s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 45m 27s | Avg:  9m 05s | Max: 24m 31s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 30m 51s | Avg: 15m 25s | Max: 15m 46s
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 08m | Avg:  9m 04s | Max: 32m 00s | Hits: 365%/7384  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 32s | Avg:  5m 16s | Max:  5m 20s
      🟩 nvcc               Pass: 100%/41  | Total:  6h 24m | Avg:  9m 22s | Max: 32m 00s | Hits: 365%/9230  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 20m 58s | Avg:  5m 14s | Max:  5m 30s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 52s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 12s | Avg:  6m 06s | Max:  6m 07s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 16s | Avg:  5m 38s | Max:  5m 42s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 58s | Avg:  6m 25s | Max: 10m 26s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  6m 03s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 51s | Avg:  5m 51s | Max:  5m 51s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  5m 36s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 53s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 10s | Avg:  6m 05s | Max:  6m 19s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 46s | Avg:  6m 23s | Max:  6m 33s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 00m | Avg:  7m 37s | Max: 11m 29s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 50m 06s | Avg: 25m 03s | Max: 25m 35s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 27m | Avg: 29m 04s | Max: 32m 00s | Hits: 365%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 30m 51s | Avg: 15m 25s | Max: 15m 46s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 40m | Avg:  5m 56s | Max: 10m 26s
      🟩 GCC                Pass: 100%/19  | Total:  2h 05m | Avg:  6m 37s | Max: 11m 29s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 17m | Avg: 27m 28s | Max: 32m 00s | Hits: 365%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total: 30m 51s | Avg: 15m 25s | Max: 15m 46s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 27m | Avg:  8m 05s | Max: 26m 04s | Hits: 365%/5538  
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 07m | Avg: 12m 47s | Max: 32m 00s | Hits: 365%/3692  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 13m | Avg:  8m 29s | Max: 29m 10s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/3   | Total: 48m 03s | Avg: 16m 01s | Max: 32m 00s | Hits: 365%/1846  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 07s | Avg: 11m 02s | Max: 11m 29s
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  5m 57s | Avg:  5m 57s | Max:  5m 57s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 03m | Avg:  9m 11s | Max: 26m 04s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/21  | Total:  3h 14m | Avg:  9m 15s | Max: 32m 00s | Hits: 365%/3692  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 00s | Avg: 5m 00s | Max: 7m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 10m 00s | Avg:  5m 00s | Max:  7m 55s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 05s | Avg:  2m 05s | Max:  2m 05s
      🟩 Test               Pass: 100%/1   | Total:  7m 55s | Avg:  7m 55s | Max:  7m 55s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

shwina force-pushed the add-scan-python-wrappers branch 2 times, most recently from d9db737 to 876a6f2 February 3, 2025 19:51
shwina force-pushed the add-scan-python-wrappers branch from 876a6f2 to 6294d93 February 3, 2025 19:52
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

cuda.parallel: Add Python wrappers for scan