Add cuda.parallel.experimental.iterators._strided with NdArrayIterator #4072

oleksandr-pavlyk · 2025-03-10T15:41:45Z

The NdArrayIterator is an input iterator that traverses the elements of a strided nd-array in the same order as the corresponding flattened array, without making a copy.
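To make the traversal order concrete, here is a minimal host-side sketch in plain Python. It is not the `NdArrayIterator` implementation: the helper name `flat_order_offsets` is hypothetical, strides are counted in elements rather than bytes, and the logical order is assumed to be row-major (last index varies fastest).

```python
from itertools import product


def flat_order_offsets(shape, strides):
    """Yield buffer offsets (in elements) for each logical index.

    Logical indices are visited in row-major order (an assumption of this
    sketch), so the last axis varies fastest.
    """
    for multi_index in product(*(range(extent) for extent in shape)):
        yield sum(i * s for i, s in zip(multi_index, strides))


# A row-major 3x4 buffer holding 0..11.
buffer = list(range(12))

# View the same buffer as its 4x3 transpose: only shape and strides change,
# the buffer itself is never copied or rearranged.
shape, strides = (4, 3), (1, 4)

visited = [buffer[offset] for offset in flat_order_offsets(shape, strides)]
print(visited)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

The values come out in the order a flattened copy of the transposed array would have, but each one is read straight from the original buffer.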

This iterator enables two test_segmented_reduce_api.py examples: "segmented-reduce-columnwise-maximum" and "segmented-reduce-multiaxis-sum".
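As a rough illustration of why a strided iterator helps with a column-wise reduction, the sketch below emulates the idea on the host in plain Python. `StridedView` and `segmented_reduce` are hypothetical stand-ins written for this example, not the `cuda.parallel` API; the point is only that viewing the matrix column-first turns each column into a contiguous segment of the traversal.

```python
from functools import reduce


class StridedView:
    """Hypothetical random-access view over a flat buffer (strides in elements)."""

    def __init__(self, buffer, shape, strides):
        self.buffer, self.shape, self.strides = buffer, shape, strides

    def __getitem__(self, linear):
        # Decompose the row-major linear index axis by axis, last axis first,
        # and accumulate the matching offset into the underlying buffer.
        offset = 0
        for extent, stride in zip(reversed(self.shape), reversed(self.strides)):
            linear, i = divmod(linear, extent)
            offset += i * stride
        return self.buffer[offset]


def segmented_reduce(seq, starts, ends, op, init):
    """Reduce each half-open segment [start, end) of seq with op."""
    return [
        reduce(op, (seq[k] for k in range(b, e)), init)
        for b, e in zip(starts, ends)
    ]


n_rows, n_cols = 3, 4
matrix = [3, 9, 1, 4,
          7, 2, 8, 0,
          5, 6, 2, 11]  # row-major 3x4

# Transposed view: each column of the matrix becomes one contiguous segment.
columns_first = StridedView(matrix, shape=(n_cols, n_rows), strides=(1, n_cols))
starts = [c * n_rows for c in range(n_cols)]
ends = [s + n_rows for s in starts]

col_max = segmented_reduce(columns_first, starts, ends, max, float("-inf"))
print(col_max)  # [7, 9, 8, 11]
```

The matrix is never transposed in memory; only the index-to-offset mapping changes.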

Description

closes

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.


copy-pr-bot · 2025-03-10T15:41:49Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

oleksandr-pavlyk · 2025-03-10T15:42:02Z

/ok to test

github-actions · 2025-03-10T16:44:33Z

🟩 CI finished in 1h 00m: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

🟩 python: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 ctk
  🟩 12.8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 cudacxx
  🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 gpu
  🟩 rtx2080            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
	CUDA Experimental
+/-	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
	CUDA Experimental
+/-	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 1)

#	Runner
1	`linux-amd64-gpu-rtx2080-latest-1`

python/cuda_parallel/tests/test_segmented_reduce_api.py

leofang · 2025-03-10T18:32:48Z

python/cuda_parallel/cuda/parallel/experimental/iterators/_strided.py

+
+
+@lru_cache
+def strided_view_iterator_numba_type(value_type: types.Type, ndim: int):


This seems like useful machinery that we could reuse so that numba-cuda can recognize cuda.core.StridedMemoryView. Let's discuss this in the ongoing internal chat.

xref: NVIDIA/numba-cuda#153


oleksandr-pavlyk · 2025-03-11T00:52:58Z

/ok to test

github-actions · 2025-03-11T01:56:55Z

🟩 CI finished in 1h 02m: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m

🟩 python: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m

🟩 cpu
  🟩 amd64              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 ctk
  🟩 12.8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 cudacxx
  🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 cudacxx_family
  🟩 nvcc               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 cxx
  🟩 GCC13              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 cxx_family
  🟩 GCC                Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 gpu
  🟩 rtx2080            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
🟩 jobs
  🟩 Test               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m

👃 Inspect Changes

Modifications in project?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
	CUDA Experimental
+/-	python
	CCCL C Parallel Library
	Catch2Helper

Modifications in project or dependencies?

	Project
	CCCL Infrastructure
	libcu++
	CUB
	Thrust
	CUDA Experimental
+/-	python
	CCCL C Parallel Library
	Catch2Helper

🏃‍ Runner counts (total jobs: 1)

#	Runner
1	`linux-amd64-gpu-rtx2080-latest-1`

leofang reviewed Mar 10, 2025


oleksandr-pavlyk added 3 commits March 10, 2025 14:09

b24fe79	Avoid unnecessary host-to-device transfer in test_segmented_reduce_api new examples

e527540	Avoid unnecessary host-to-device transfers in prior examples in test_segmented_reduce_api

a1beb2d	Merge branch 'main' into add-strided-iterator

leofang mentioned this pull request Mar 11, 2025

[FEA] Make cuda.core.utils.StridedMemoryView recognized by numba-cuda NVIDIA/numba-cuda#153

Open
