Add variadic ctor and CTAD to zip_iterator #4113

Open
wants to merge 5 commits into
base: main

bernhardmgruber
Contributor

Fixes: #4110
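
For readers unfamiliar with the feature, here is a rough sketch of what a variadic constructor plus class template argument deduction (CTAD) could look like. It is not the code from this PR: `zip_iterator_sketch` and `zip_ctad_demo` are hypothetical names, the sketch uses `std::` rather than `::cuda::std::`, and it only stores the iterators instead of implementing a full iterator.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for thrust::zip_iterator: it only stores the iterators.
template <typename IteratorTuple>
struct zip_iterator_sketch
{
  IteratorTuple iterators;

  // Tuple-taking constructor, assumed here to be the pre-existing interface.
  explicit zip_iterator_sketch(IteratorTuple t)
      : iterators(std::move(t))
  {}

  // Variadic constructor: accepts the iterators directly.
  // It only participates when the tuple can be built from the arguments.
  template <typename... Iterators,
            std::enable_if_t<std::is_constructible_v<IteratorTuple, Iterators&&...>, int> = 0>
  zip_iterator_sketch(Iterators&&... its)
      : iterators(std::forward<Iterators>(its)...)
  {}
};

// Deduction guide: deduce the tuple type from the decayed argument types.
template <typename... Iterators>
zip_iterator_sketch(Iterators...) -> zip_iterator_sketch<std::tuple<Iterators...>>;

inline bool zip_ctad_demo()
{
  int keys[3]     = {1, 2, 3};
  float values[3] = {0.5f, 1.5f, 2.5f};
  int* k          = keys;
  float* v        = values;

  // No template arguments spelled out: the guide deduces std::tuple<int*, float*>.
  zip_iterator_sketch zipped{k, v};
  static_assert(std::is_same_v<decltype(zipped), zip_iterator_sketch<std::tuple<int*, float*>>>);

  assert(std::get<0>(zipped.iterators) == keys);
  return std::get<1>(zipped.iterators) == values;
}
```

The guide takes its parameters by value so that references and cv-qualifiers are dropped from the deduced tuple element types.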

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner March 12, 2025 15:39
Comment on lines 201 to 202
template <typename... Iterators,
          ::cuda::std::enable_if_t<(::cuda::std::input_or_output_iterator<Iterators> && ...), int> = 0>
Contributor

This constructor should also be SFINAEd on whether we can actually construct the IterTuple from the given arguments.
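
As an illustration of the suggestion, a minimal sketch of such a constraint follows. It assumes plain `std::` facilities instead of `::cuda::std::`, and `constrained_zip`, `zip2` and `constrained_zip_demo` are made-up names, not part of Thrust. The `static_assert`s show which argument lists the constraint would accept or reject.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical class: the variadic constructor is removed from overload
// resolution unless IterTuple is constructible from the forwarded arguments.
template <typename IterTuple>
struct constrained_zip
{
  IterTuple iterators;

  template <typename... Iterators,
            std::enable_if_t<std::is_constructible_v<IterTuple, Iterators&&...>, int> = 0>
  constrained_zip(Iterators&&... its)
      : iterators(std::forward<Iterators>(its)...)
  {}
};

using zip2 = constrained_zip<std::tuple<int*, float*>>;

static_assert(std::is_constructible_v<zip2, int*, float*>, "matching iterators are accepted");
static_assert(!std::is_constructible_v<zip2, int*>, "wrong arity is rejected");
static_assert(!std::is_constructible_v<zip2, float*, int*>, "mismatched types are rejected");

inline bool constrained_zip_demo()
{
  int i   = 1;
  float f = 2.0f;
  zip2 z(&i, &f);
  assert(std::get<0>(z.iterators) == &i);
  return std::get<1>(z.iterators) == &f;
}
```

With the constraint in place, a bad argument list makes the constructor drop out of overload resolution instead of producing a hard error inside the tuple constructor.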

Contributor

🟨 CI finished in 1h 20m: Pass: 5%/93 | Total: 1d 08h | Avg: 21m 06s | Max: 1h 18m | Hits: 72%/2665
  • 🟥 thrust: Pass: 0%/45 | Total: 6h 09m | Avg: 8m 12s | Max: 1h 05m

    🟥 cmake_options
      🟥 -DTHRUST_DISPATCH_TYPE=Force32bit Pass:   0%/2   | Total:  4m 56s | Avg:  2m 28s | Max:  4m 56s
    🟥 cpu
      🟥 amd64              Pass:   0%/43  | Total:  5h 59m | Avg:  8m 22s | Max:  1h 05m
      🟥 arm64              Pass:   0%/2   | Total:  9m 42s | Avg:  4m 51s | Max:  4m 56s
    🟥 ctk
      🟥 12.0               Pass:   0%/5   | Total:  1h 07m | Avg: 13m 35s | Max: 48m 47s
      🟥 12.5               Pass:   0%/2   | Total: 15m 24s | Avg:  7m 42s | Max:  8m 19s
      🟥 12.8               Pass:   0%/38  | Total:  4h 46m | Avg:  7m 31s | Max:  1h 05m
    🟥 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  6m 05s | Avg:  3m 02s | Max:  3m 05s
      🟥 nvcc12.0           Pass:   0%/5   | Total:  1h 07m | Avg: 13m 35s | Max: 48m 47s
      🟥 nvcc12.5           Pass:   0%/2   | Total: 15m 24s | Avg:  7m 42s | Max:  8m 19s
      🟥 nvcc12.8           Pass:   0%/36  | Total:  4h 40m | Avg:  7m 46s | Max:  1h 05m
    🟥 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  6m 05s | Avg:  3m 02s | Max:  3m 05s
      🟥 nvcc               Pass:   0%/43  | Total:  6h 03m | Avg:  8m 27s | Max:  1h 05m
    🟥 cxx
      🟥 Clang14            Pass:   0%/4   | Total: 18m 52s | Avg:  4m 43s | Max:  4m 55s
      🟥 Clang15            Pass:   0%/2   | Total:  9m 12s | Avg:  4m 36s | Max:  4m 39s
      🟥 Clang16            Pass:   0%/2   | Total:  9m 24s | Avg:  4m 42s | Max:  4m 46s
      🟥 Clang17            Pass:   0%/2   | Total:  9m 26s | Avg:  4m 43s | Max:  4m 50s
      🟥 Clang18            Pass:   0%/7   | Total: 20m 14s | Avg:  2m 53s | Max:  4m 46s
      🟥 GCC7               Pass:   0%/2   | Total:  9m 05s | Avg:  4m 32s | Max:  4m 46s
      🟥 GCC8               Pass:   0%/1   | Total:  4m 31s | Avg:  4m 31s | Max:  4m 31s
      🟥 GCC9               Pass:   0%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  4m 44s
      🟥 GCC10              Pass:   0%/2   | Total:  9m 08s | Avg:  4m 34s | Max:  4m 44s
      🟥 GCC11              Pass:   0%/2   | Total:  8m 58s | Avg:  4m 29s | Max:  4m 40s
      🟥 GCC12              Pass:   0%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 50s
      🟥 GCC13              Pass:   0%/10  | Total: 27m 43s | Avg:  2m 46s | Max:  5m 38s
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 37m | Avg: 48m 53s | Max: 48m 59s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  1h 51m | Avg: 37m 05s | Max:  1h 05m
      🟥 NVHPC24.7          Pass:   0%/2   | Total: 15m 24s | Avg:  7m 42s | Max:  8m 19s
    🟥 cxx_family
      🟥 Clang              Pass:   0%/17  | Total:  1h 07m | Avg:  3m 56s | Max:  4m 55s
      🟥 GCC                Pass:   0%/21  | Total:  1h 17m | Avg:  3m 42s | Max:  5m 38s
      🟥 MSVC               Pass:   0%/5   | Total:  3h 29m | Avg: 41m 48s | Max:  1h 05m
      🟥 NVHPC              Pass:   0%/2   | Total: 15m 24s | Avg:  7m 42s | Max:  8m 19s
    🟥 gpu
      🟥 h100               Pass:   0%/2   | Total:  3m 19s | Avg:  1m 39s | Max:  3m 19s
      🟥 rtx2080            Pass:   0%/33  | Total:  4h 46m | Avg:  8m 40s | Max: 48m 59s
      🟥 rtx4090            Pass:   0%/10  | Total:  1h 19m | Avg:  7m 59s | Max:  1h 05m
    🟥 jobs
      🟥 Build              Pass:   0%/38  | Total:  6h 09m | Avg:  9m 43s | Max:  1h 05m
      🟥 TestCPU            Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/4  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  3m 19s | Avg:  1m 39s | Max:  3m 19s
      🟥 90;90a;100         Pass:   0%/1   | Total:  4m 22s | Avg:  4m 22s | Max:  4m 22s
    🟥 std
      🟥 17                 Pass:   0%/20  | Total:  3h 45m | Avg: 11m 15s | Max: 48m 59s
      🟥 20                 Pass:   0%/23  | Total:  2h 19m | Avg:  6m 03s | Max:  1h 05m
    
  • 🟨 cub: Pass: 4%/45 | Total: 1d 01h | Avg: 33m 38s | Max: 1h 18m | Hits: 69%/2345

    🟨 cpu
      🟨 amd64              Pass:   4%/43  | Total: 23h 49m | Avg: 33m 14s | Max:  1h 18m | Hits:  69%/2345  
      🟥 arm64              Pass:   0%/2   | Total:  1h 24m | Avg: 42m 00s | Max: 42m 45s
    🟨 ctk
      🟥 12.0               Pass:   0%/5   | Total:  3h 28m | Avg: 41m 46s | Max:  1h 05m
      🟨 12.5               Pass:  50%/2   | Total:  1h 46m | Avg: 53m 10s | Max:  1h 11m | Hits:  69%/1127  
      🟨 12.8               Pass:   2%/38  | Total: 19h 58m | Avg: 31m 32s | Max:  1h 18m | Hits:  69%/1218  
    🟨 cudacxx
      🟥 ClangCUDA18        Pass:   0%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
      🟥 nvcc12.0           Pass:   0%/5   | Total:  3h 28m | Avg: 41m 46s | Max:  1h 05m
      🟨 nvcc12.5           Pass:  50%/2   | Total:  1h 46m | Avg: 53m 10s | Max:  1h 11m | Hits:  69%/1127  
      🟨 nvcc12.8           Pass:   2%/36  | Total: 17h 56m | Avg: 29m 54s | Max:  1h 18m | Hits:  69%/1218  
    🟨 cudacxx_family
      🟥 ClangCUDA          Pass:   0%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
      🟨 nvcc               Pass:   4%/43  | Total: 23h 11m | Avg: 32m 21s | Max:  1h 18m | Hits:  69%/2345  
    🟨 cxx
      🟥 Clang14            Pass:   0%/4   | Total:  2h 20m | Avg: 35m 11s | Max: 36m 51s
      🟥 Clang15            Pass:   0%/2   | Total:  1h 07m | Avg: 33m 36s | Max: 35m 05s
      🟥 Clang16            Pass:   0%/2   | Total:  1h 05m | Avg: 32m 55s | Max: 34m 02s
      🟥 Clang17            Pass:   0%/2   | Total:  1h 09m | Avg: 34m 49s | Max: 36m 05s
      🟥 Clang18            Pass:   0%/7   | Total:  3h 48m | Avg: 32m 36s | Max:  1h 01m
      🟥 GCC7               Pass:   0%/2   | Total:  1h 05m | Avg: 32m 58s | Max: 33m 22s
      🟥 GCC8               Pass:   0%/1   | Total: 35m 09s | Avg: 35m 09s | Max: 35m 09s
      🟥 GCC9               Pass:   0%/2   | Total:  1h 10m | Avg: 35m 14s | Max: 36m 34s
      🟥 GCC10              Pass:   0%/2   | Total:  1h 05m | Avg: 32m 32s | Max: 32m 38s
      🟥 GCC11              Pass:   0%/2   | Total:  1h 06m | Avg: 33m 25s | Max: 33m 33s
      🟥 GCC12              Pass:   0%/2   | Total:  1h 05m | Avg: 32m 31s | Max: 32m 58s
      🟨 GCC13              Pass:   9%/11  | Total:  3h 02m | Avg: 16m 37s | Max: 56m 26s | Hits:  69%/1218  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  1h 10m
      🟥 MSVC14.42          Pass:   0%/2   | Total:  2h 28m | Avg:  1h 14m | Max:  1h 18m
      🟨 NVHPC24.7          Pass:  50%/2   | Total:  1h 46m | Avg: 53m 10s | Max:  1h 11m | Hits:  69%/1127  
    🟨 cxx_family
      🟥 Clang              Pass:   0%/17  | Total:  9h 31m | Avg: 33m 37s | Max:  1h 01m
      🟨 GCC                Pass:   4%/22  | Total:  9h 11m | Avg: 25m 03s | Max: 56m 26s | Hits:  69%/1218  
      🟥 MSVC               Pass:   0%/4   | Total:  4h 44m | Avg:  1h 11m | Max:  1h 18m
      🟨 NVHPC              Pass:  50%/2   | Total:  1h 46m | Avg: 53m 10s | Max:  1h 11m | Hits:  69%/1127  
    🟨 gpu
      🟥 h100               Pass:   0%/3   | Total: 13m 20s | Avg:  4m 26s | Max: 13m 20s
      🟨 rtx2080            Pass:   5%/34  | Total: 23h 54m | Avg: 42m 12s | Max:  1h 18m | Hits:  69%/2345  
      🟥 rtxa6000           Pass:   0%/8   | Total:  1h 05m | Avg:  8m 09s | Max: 32m 39s
    🟨 jobs
      🟨 Build              Pass:   5%/37  | Total:  1d 01h | Avg: 40m 54s | Max:  1h 18m | Hits:  69%/2345  
      🟥 DeviceLaunch       Pass:   0%/1  
      🟥 GraphCapture       Pass:   0%/1  
      🟥 HostLaunch         Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/3  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total: 13m 20s | Avg:  4m 26s | Max: 13m 20s
      🟥 90;90a;100         Pass:   0%/1   | Total: 37m 42s | Avg: 37m 42s | Max: 37m 42s
    🟨 std
      🟨 17                 Pass:  10%/20  | Total: 14h 33m | Avg: 43m 41s | Max:  1h 11m | Hits:  69%/2345  
      🟥 20                 Pass:   0%/25  | Total: 10h 39m | Avg: 25m 35s | Max:  1h 18m
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 18m 01s | Avg: 9m 00s | Max: 15m 31s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 18m 01s | Avg:  9m 00s | Max: 15m 31s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 30s | Avg:  2m 30s | Max:  2m 30s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 15m 31s | Avg: 15m 31s | Max: 15m 31s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 02m | Avg: 1h 02m | Max: 1h 02m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

# Runner
66 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

Comment on lines 204 to 215
template <
  typename... Iterators,
  ::cuda::std::enable_if_t<!(::cuda::std::is_same_v<::cuda::std::decay_t<Iterators>, zip_iterator> || ...), int> = 0,
  ::cuda::std::enable_if_t<(::cuda::std::input_or_output_iterator<::cuda::std::remove_cvref_t<Iterators>> && ...)
                           // FIXME(bgruber): Adding this constraint causes a difficult compilation error:
                           // &&
                           // ::cuda::std::is_constructible_v<::cuda::tuple<::cuda::std::remove_cvref_t<Iterators>...>,
                           // Iterators...>
                           ,
                           int> = 0>
_CCCL_HOST_DEVICE zip_iterator(Iterators&&... iterators)
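
The first `enable_if_t` in this snippet appears to keep the forwarding constructor from competing with the copy and move constructors. A simplified sketch of that idea is below, assuming `std::` in place of `::cuda::std::` and with hypothetical names `guarded_zip` and `guarded_zip_copy_demo`. Without the guard, copying from a non-const lvalue would pick the template (an exact match on `guarded_zip&`) and then fail when trying to build the tuple from it.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

template <typename IteratorTuple>
struct guarded_zip
{
  IteratorTuple iterators;

  // Reject any argument whose decayed type is guarded_zip itself, so that
  // copies and moves go to the implicitly declared copy/move constructors.
  template <typename... Iterators,
            std::enable_if_t<!(std::is_same_v<std::decay_t<Iterators>, guarded_zip> || ...), int> = 0>
  guarded_zip(Iterators&&... its)
      : iterators(std::forward<Iterators>(its)...)
  {}
};

inline bool guarded_zip_copy_demo()
{
  int x = 42;
  guarded_zip<std::tuple<int*>> a(&x); // variadic constructor
  guarded_zip<std::tuple<int*>> b(a);  // non-const lvalue: copy constructor is chosen
  assert(std::get<0>(a.iterators) == &x);
  return std::get<0>(b.iterators) == &x;
}
```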
Contributor Author

@miscco I think I need your help with the constraint here. thrust.test.cuda.inner_product.cdp_1 compiles without the constraint, but fails to compile with it. The error is:

/home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/detail/libcxx/include/tuple(663): error: incomplete type "cuda::std::__4::__tuple_constructible<cuda::std::__4::tuple<const cuda::std::__4::remove_volatile<cuda::std::__4::remove_const_t<cuda::std::__4::remove_reference_t<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<std::decay<cuda::std::__4::multiplies<int> &>::type>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<cuda::std::__4::remove_reference<thrust::THRUST_300000_SM_860_NS::detail::vector_base<int, thrust::THRUST_300000_SM_860_NS::device_allocator<int>>::iterator &>::type, cuda::std::__4::remove_reference<thrust::THRUST_300000_SM_860_NS::detail::vector_base<int, thrust::THRUST_300000_SM_860_NS::device_allocator<int>>::iterator &>::type>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>>>::type &>, cuda::std::__4::tuple<cuda::std::__4::remove_volatile<cuda::std::__4::remove_const_t<cuda::std::__4::remove_reference_t<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<std::decay<cuda::std::__4::multiplies<int> &>::type>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<cuda::std::__4::remove_reference<thrust::THRUST_300000_SM_860_NS::detail::vector_base<int, thrust::THRUST_300000_SM_860_NS::device_allocator<int>>::iterator &>::type, cuda::std::__4::remove_reference<thrust::THRUST_300000_SM_860_NS::detail::vector_base<int, thrust::THRUST_300000_SM_860_NS::device_allocator<int>>::iterator &>::type>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>>>::type>, true, true>" (aka "cuda::std::__4::__tuple_constructible<cuda::std::__4::tuple<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, 
thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>, cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>, true, true>") is not allowed
        __tuple_constructible<tuple<_Args...>, tuple<_Tp...>>::value
        ^
          detected during:
            instantiation of class "cuda::std::__4::__tuple_constraints<_Tp...>::__variadic_constraints<_Args...> [with _Tp=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>, _Args=<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>]" at line 157 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__type_traits/is_constructible.h
            instantiation of "const __nv_bool cuda::std::__4::is_constructible_v [with _Tp=cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>, _Args=<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>]" at line 152 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__type_traits/is_constructible.h
            instantiation of class "cuda::std::__4::is_constructible<_Tp, _Args...> [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, _Args=<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>]" at line 65 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__tuple_dir/sfinae_helpers.h
            instantiation of class "cuda::std::__4::__tuple_sfinae_base::__test<_Trait, cuda::std::__4::__tuple_types<_LArgs...>, cuda::std::__4::__tuple_types<_RArgs...>, true> [with _Trait=cuda::std::__4::is_constructible, _LArgs=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>, _RArgs=<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>]" at line 101 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__tuple_dir/sfinae_helpers.h
            instantiation of class "cuda::std::__4::__tuple_constructible<_Tp, _Up, true, true> [with _Tp=cuda::std::__4::tuple<const thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default> &>, _Up=cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>]" at line 641
            instantiation of class "cuda::std::__4::__tuple_constraints<_Tp...> [with _Tp=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>]" at line 157 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__type_traits/is_constructible.h
            instantiation of "const __nv_bool cuda::std::__4::is_constructible_v [with _Tp=cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>, _Args=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>]" at line 157 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__type_traits/is_constructible.h
            instantiation of "const __nv_bool cuda::std::__4::is_constructible_v [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, _Args=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>]" at line 62 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__concepts/constructible.h
            instantiation of "const __nv_bool cuda::std::__4::constructible_from [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, _Args=<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>]" at line 83 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__concepts/constructible.h
            instantiation of "const __nv_bool cuda::std::__4::move_constructible [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 97 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__concepts/constructible.h
            instantiation of "const __nv_bool cuda::std::__4::copy_constructible [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 50 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__concepts/copyable.h
            instantiation of "const __nv_bool cuda::std::__4::copyable [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 534 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__iterator/iterator_traits.h
            instantiation of "const __nv_bool cuda::std::__4::__iterator_traits_detail::__cpp17_iterator [with _Ip=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 634 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__iterator/iterator_traits.h
            instantiation of "const __nv_bool cuda::std::__4::__cpp17_iterator_missing_members [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 638 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__iterator/iterator_traits.h
            instantiation of "const __nv_bool cuda::std::__4::__cpp17_input_iterator_missing_members [with _Tp=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>]" at line 803 of /home/bgruber/dev/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/std/__iterator/iterator_traits.h
            instantiation of class "cuda::std::__4::iterator_traits<_Ip, <unnamed>> [with _Ip=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, <unnamed>=void]" at line 89 of /home/bgruber/dev/cccl/lib/cmake/cub/../../../cub/cub/util_type.cuh
            instantiation of type "cub::CUB_300000_SM_860::detail::it_value_t<thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>>" at line 139 of /home/bgruber/dev/cccl/lib/cmake/cub/../../../cub/cub/device/dispatch/dispatch_reduce.cuh
            processing of template argument list for "cub::CUB_300000_SM_860::DispatchReduce" based on template arguments <thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, int *, OffsetT, cuda::std::__4::plus<int>, int> at line 227 of /home/bgruber/dev/cccl/lib/cmake/cub/../../../cub/cub/device/device_reduce.cuh
            instantiation of "cudaError_t cub::CUB_300000_SM_860::DeviceReduce::Reduce(void *, size_t &, InputIteratorT, OutputIteratorT, NumItemsT, ReductionOpT, T, cudaStream_t) [with InputIteratorT=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, OutputIteratorT=int *, ReductionOpT=cuda::std::__4::plus<int>, T=int, NumItemsT=int32_t]" at line 793 of /home/bgruber/dev/cccl/lib/cmake/thrust/../../../thrust/thrust/system/cuda/detail/reduce.h
            instantiation of "T thrust::THRUST_300000_SM_860_NS::cuda_cub::detail::reduce_n_impl(thrust::THRUST_300000_SM_860_NS::cuda_cub::execution_policy<Derived> &, InputIt, Size, T, BinaryOp) [with Derived=thrust::THRUST_300000_SM_860_NS::cuda_cub::par_t, InputIt=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, Size=ptrdiff_t, T=int, BinaryOp=cuda::std::__4::plus<int>]" at line 850 of /home/bgruber/dev/cccl/lib/cmake/thrust/../../../thrust/thrust/system/cuda/detail/reduce.h
            instantiation of "T thrust::THRUST_300000_SM_860_NS::cuda_cub::reduce_n(thrust::THRUST_300000_SM_860_NS::cuda_cub::execution_policy<Derived> &, InputIt, Size, T, BinaryOp) [with Derived=thrust::THRUST_300000_SM_860_NS::cuda_cub::par_t, InputIt=thrust::THRUST_300000_SM_860_NS::transform_iterator<thrust::THRUST_300000_SM_860_NS::zip_function<cuda::std::__4::multiplies<int>>, thrust::THRUST_300000_SM_860_NS::zip_iterator<cuda::std::__4::tuple<thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>>>, thrust::THRUST_300000_SM_860_NS::use_default, thrust::THRUST_300000_SM_860_NS::use_default>, Size=ptrdiff_t, T=int, BinaryOp=cuda::std::__4::plus<int>]" at line 62 of /home/bgruber/dev/cccl/lib/cmake/thrust/../../../thrust/thrust/system/cuda/detail/inner_product.h
            instantiation of "T thrust::THRUST_300000_SM_860_NS::cuda_cub::inner_product(thrust::THRUST_300000_SM_860_NS::cuda_cub::execution_policy<Derived> &, InputIt1, InputIt1, InputIt2, T, ReduceOp, ProductOp) [with Derived=thrust::THRUST_300000_SM_860_NS::cuda_cub::par_t, InputIt1=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, InputIt2=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, T=int, ReduceOp=cuda::std::__4::plus<int>, ProductOp=cuda::std::__4::multiplies<int>]" at line 69 of /home/bgruber/dev/cccl/lib/cmake/thrust/../../../thrust/thrust/system/cuda/detail/inner_product.h
            instantiation of "T thrust::THRUST_300000_SM_860_NS::cuda_cub::inner_product(thrust::THRUST_300000_SM_860_NS::cuda_cub::execution_policy<Derived> &, InputIt1, InputIt1, InputIt2, T) [with Derived=thrust::THRUST_300000_SM_860_NS::cuda_cub::par_t, InputIt1=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, InputIt2=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, T=int]" at line 46 of /home/bgruber/dev/cccl/lib/cmake/thrust/../../../thrust/thrust/detail/inner_product.inl
            instantiation of "OutputType thrust::THRUST_300000_SM_860_NS::inner_product(const thrust::THRUST_300000_SM_860_NS::detail::execution_policy_base<DerivedPolicy> &, InputIterator1, InputIterator1, InputIterator2, OutputType) [with DerivedPolicy=thrust::THRUST_300000_SM_860_NS::cuda_cub::par_t, InputIterator1=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, InputIterator2=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, OutputType=int]" at line 11 of /home/bgruber/dev/cccl/thrust/testing/cuda/inner_product.cu
            instantiation of "void inner_product_kernel(ExecutionPolicy, Iterator1, Iterator1, Iterator2, T, Iterator3) [with ExecutionPolicy=thrust::THRUST_300000_SM_860_NS::detail::device_t, Iterator1=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, Iterator2=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>, T=int, Iterator3=thrust::THRUST_300000_SM_860_NS::detail::normal_iterator<thrust::THRUST_300000_SM_860_NS::device_ptr<int>>]" at line 31 of /home/bgruber/dev/cccl/thrust/testing/cuda/inner_product.cu


🟨 CI finished in 1h 25m: Pass: 38%/93 | Total: 1d 22h | Avg: 30m 11s | Max: 1h 23m | Hits: 58%/49225
  • 🟨 thrust: Pass: 35%/45 | Total: 14h 32m | Avg: 19m 22s | Max: 1h 12m | Hits: 50%/28440

    🟨 cxx
      🟨 Clang14            Pass:  50%/4   | Total:  1h 14m | Avg: 18m 38s | Max: 32m 34s | Hits:  52%/3554  
      🟨 Clang15            Pass:  50%/2   | Total: 38m 14s | Avg: 19m 07s | Max: 33m 21s | Hits:  49%/1777  
      🟨 Clang16            Pass:  50%/2   | Total: 39m 04s | Avg: 19m 32s | Max: 34m 32s | Hits:  48%/1777  
      🟨 Clang17            Pass:  50%/2   | Total: 37m 03s | Avg: 18m 31s | Max: 32m 34s | Hits:  48%/1777  
      🟨 Clang18            Pass:  28%/7   | Total:  1h 15m | Avg: 10m 45s | Max: 34m 45s | Hits:  51%/3554  
      🟥 GCC7               Pass:   0%/2   | Total: 11m 05s | Avg:  5m 32s | Max:  5m 48s
      🟥 GCC8               Pass:   0%/1   | Total:  5m 20s | Avg:  5m 20s | Max:  5m 20s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 09m | Avg: 34m 39s | Max: 35m 04s | Hits:  61%/3556  
      🟨 GCC10              Pass:  50%/2   | Total: 36m 56s | Avg: 18m 28s | Max: 32m 31s | Hits:  48%/1778  
      🟨 GCC11              Pass:  50%/2   | Total: 38m 46s | Avg: 19m 23s | Max: 34m 13s | Hits:  48%/1778  
      🟨 GCC12              Pass:  50%/2   | Total: 40m 20s | Avg: 20m 10s | Max: 35m 46s | Hits:  48%/1778  
      🟨 GCC13              Pass:  30%/10  | Total:  2h 05m | Avg: 12m 30s | Max: 39m 23s | Hits:  48%/5334  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  1h 31m | Avg: 45m 47s | Max: 46m 12s
      🟥 MSVC14.42          Pass:   0%/3   | Total:  1h 49m | Avg: 36m 27s | Max:  1h 02m
      🟨 NVHPC24.7          Pass:  50%/2   | Total:  1h 20m | Avg: 40m 00s | Max:  1h 12m | Hits:  31%/1777  
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total:  1h 09m | Avg: 34m 52s | Max: 39m 23s | Hits:  48%/3556  
    🟨 cpu
      🟨 amd64              Pass:  37%/43  | Total: 14h 22m | Avg: 20m 03s | Max:  1h 12m | Hits:  50%/28440 
      🟥 arm64              Pass:   0%/2   | Total:  9m 43s | Avg:  4m 51s | Max:  4m 56s
    🟨 ctk
      🟨 12.0               Pass:  40%/5   | Total:  2h 03m | Avg: 24m 44s | Max: 45m 23s | Hits:  62%/3555  
      🟨 12.5               Pass:  50%/2   | Total:  1h 20m | Avg: 40m 00s | Max:  1h 12m | Hits:  31%/1777  
      🟨 12.8               Pass:  34%/38  | Total: 11h 08m | Avg: 17m 35s | Max:  1h 02m | Hits:  49%/23108 
    🟨 cudacxx
      🟨 ClangCUDA18        Pass:  50%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 28m 15s | Hits:  49%/1777  
      🟨 nvcc12.0           Pass:  40%/5   | Total:  2h 03m | Avg: 24m 44s | Max: 45m 23s | Hits:  62%/3555  
      🟨 nvcc12.5           Pass:  50%/2   | Total:  1h 20m | Avg: 40m 00s | Max:  1h 12m | Hits:  31%/1777  
      🟨 nvcc12.8           Pass:  33%/36  | Total: 10h 37m | Avg: 17m 41s | Max:  1h 02m | Hits:  49%/21331 
    🟨 cudacxx_family
      🟨 ClangCUDA          Pass:  50%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 28m 15s | Hits:  49%/1777  
      🟨 nvcc               Pass:  34%/43  | Total: 14h 00m | Avg: 19m 33s | Max:  1h 12m | Hits:  50%/26663 
    🟨 cxx_family
      🟨 Clang              Pass:  41%/17  | Total:  4h 24m | Avg: 15m 32s | Max: 34m 45s | Hits:  50%/12439 
      🟨 GCC                Pass:  38%/21  | Total:  5h 26m | Avg: 15m 33s | Max: 39m 23s | Hits:  52%/14224 
      🟥 MSVC               Pass:   0%/5   | Total:  3h 20m | Avg: 40m 11s | Max:  1h 02m
      🟨 NVHPC              Pass:  50%/2   | Total:  1h 20m | Avg: 40m 00s | Max:  1h 12m | Hits:  31%/1777  
    🟨 gpu
      🟥 h100               Pass:   0%/2   | Total:  3m 26s | Avg:  1m 43s | Max:  3m 26s
      🟨 rtx2080            Pass:  42%/33  | Total: 12h 06m | Avg: 22m 01s | Max:  1h 12m | Hits:  50%/24884 
      🟨 rtx4090            Pass:  20%/10  | Total:  2h 21m | Avg: 14m 11s | Max:  1h 02m | Hits:  48%/3556  
    🟨 jobs
      🟨 Build              Pass:  39%/38  | Total: 13h 52m | Avg: 21m 54s | Max:  1h 12m | Hits:  50%/26662 
      🟥 TestCPU            Pass:   0%/3  
      🟨 TestGPU            Pass:  25%/4   | Total: 39m 23s | Avg:  9m 50s | Max: 39m 23s | Hits:  48%/1778  
    🟥 sm
      🟥 90                 Pass:   0%/2   | Total:  3m 26s | Avg:  1m 43s | Max:  3m 26s
      🟥 90;90a;100         Pass:   0%/1   | Total:  4m 41s | Avg:  4m 41s | Max:  4m 41s
    🟨 std
      🟨 17                 Pass:  70%/20  | Total: 11h 05m | Avg: 33m 16s | Max:  1h 12m | Hits:  50%/24884 
      🟥 20                 Pass:   0%/23  | Total:  2h 16m | Avg:  5m 57s | Max:  1h 02m
    
  • 🟨 cub: Pass: 37%/45 | Total: 1d 06h | Avg: 41m 18s | Max: 1h 23m | Hits: 69%/20465

    🟨 cxx
      🟨 Clang14            Pass:  50%/4   | Total:  2h 59m | Avg: 44m 54s | Max: 57m 09s | Hits:  69%/2440  
      🟨 Clang15            Pass:  50%/2   | Total:  1h 27m | Avg: 43m 42s | Max: 55m 30s | Hits:  69%/1218  
      🟨 Clang16            Pass:  50%/2   | Total:  1h 30m | Avg: 45m 00s | Max: 56m 09s | Hits:  69%/1218  
      🟨 Clang17            Pass:  50%/2   | Total:  1h 29m | Avg: 44m 36s | Max: 56m 47s | Hits:  69%/1218  
      🟨 Clang18            Pass:  28%/7   | Total:  4h 10m | Avg: 35m 45s | Max:  1h 01m | Hits:  72%/2270  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 56m | Avg: 58m 08s | Max: 59m 38s | Hits:  69%/2440  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m | Hits:  69%/1220  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 53s | Max:  1h 00m | Hits:  69%/2440  
      🟨 GCC10              Pass:  50%/2   | Total:  1h 36m | Avg: 48m 00s | Max:  1h 02m | Hits:  69%/1220  
      🟨 GCC11              Pass:  50%/2   | Total:  1h 31m | Avg: 45m 56s | Max: 59m 52s | Hits:  69%/1218  
      🟨 GCC12              Pass:  50%/2   | Total:  1h 31m | Avg: 45m 48s | Max: 58m 55s | Hits:  69%/1218  
      🟨 GCC13              Pass:   9%/11  | Total:  3h 08m | Avg: 17m 07s | Max: 56m 23s | Hits:  69%/1218  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 12m
      🟥 MSVC14.42          Pass:   0%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 23m
      🟨 NVHPC24.7          Pass:  50%/2   | Total:  1h 43m | Avg: 51m 52s | Max:  1h 06m | Hits:  69%/1127  
    🟨 cpu
      🟨 amd64              Pass:  39%/43  | Total:  1d 05h | Avg: 41m 18s | Max:  1h 23m | Hits:  69%/20465 
      🟥 arm64              Pass:   0%/2   | Total:  1h 22m | Avg: 41m 11s | Max: 41m 48s
    🟨 ctk
      🟨 12.0               Pass:  60%/5   | Total:  4h 37m | Avg: 55m 33s | Max:  1h 07m | Hits:  69%/3660  
      🟨 12.5               Pass:  50%/2   | Total:  1h 43m | Avg: 51m 52s | Max:  1h 06m | Hits:  69%/1127  
      🟨 12.8               Pass:  34%/38  | Total:  1d 00h | Avg: 38m 52s | Max:  1h 23m | Hits:  69%/15678 
    🟨 cudacxx
      🟨 ClangCUDA18        Pass:  50%/2   | Total:  1h 59m | Avg: 59m 54s | Max:  1h 01m | Hits:  75%/1052  
      🟨 nvcc12.0           Pass:  60%/5   | Total:  4h 37m | Avg: 55m 33s | Max:  1h 07m | Hits:  69%/3660  
      🟨 nvcc12.5           Pass:  50%/2   | Total:  1h 43m | Avg: 51m 52s | Max:  1h 06m | Hits:  69%/1127  
      🟨 nvcc12.8           Pass:  33%/36  | Total: 22h 37m | Avg: 37m 42s | Max:  1h 23m | Hits:  69%/14626 
    🟨 cudacxx_family
      🟨 ClangCUDA          Pass:  50%/2   | Total:  1h 59m | Avg: 59m 54s | Max:  1h 01m | Hits:  75%/1052  
      🟨 nvcc               Pass:  37%/43  | Total:  1d 04h | Avg: 40m 26s | Max:  1h 23m | Hits:  69%/19413 
    🟨 cxx_family
      🟨 Clang              Pass:  41%/17  | Total: 11h 36m | Avg: 40m 58s | Max:  1h 01m | Hits:  70%/8364  
      🟨 GCC                Pass:  40%/22  | Total: 12h 45m | Avg: 34m 48s | Max:  1h 02m | Hits:  69%/10974 
      🟥 MSVC               Pass:   0%/4   | Total:  4h 52m | Avg:  1h 13m | Max:  1h 23m
      🟨 NVHPC              Pass:  50%/2   | Total:  1h 43m | Avg: 51m 52s | Max:  1h 06m | Hits:  69%/1127  
    🟨 gpu
      🟥 h100               Pass:   0%/3   | Total: 13m 01s | Avg:  4m 20s | Max: 13m 01s
      🟨 rtx2080            Pass:  50%/34  | Total:  1d 05h | Avg: 52m 20s | Max:  1h 23m | Hits:  69%/20465 
      🟥 rtxa6000           Pass:   0%/8   | Total:  1h 06m | Avg:  8m 17s | Max: 34m 27s
    🟨 jobs
      🟨 Build              Pass:  45%/37  | Total:  1d 06h | Avg: 50m 14s | Max:  1h 23m | Hits:  69%/20465 
      🟥 DeviceLaunch       Pass:   0%/1  
      🟥 GraphCapture       Pass:   0%/1  
      🟥 HostLaunch         Pass:   0%/3  
      🟥 TestGPU            Pass:   0%/3  
    🟥 sm
      🟥 90                 Pass:   0%/3   | Total: 13m 01s | Avg:  4m 20s | Max: 13m 01s
      🟥 90;90a;100         Pass:   0%/1   | Total: 42m 41s | Avg: 42m 41s | Max: 42m 41s
    🟨 std
      🟨 17                 Pass:  85%/20  | Total: 20h 09m | Avg:  1h 00m | Max:  1h 12m | Hits:  69%/20465 
      🟥 20                 Pass:   0%/25  | Total: 10h 49m | Avg: 25m 58s | Max:  1h 23m
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 16m 57s | Avg: 8m 28s | Max: 14m 36s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 16m 57s | Avg:  8m 28s | Max: 14m 36s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 21s | Avg:  2m 21s | Max:  2m 21s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 14m 36s | Avg: 14m 36s | Max: 14m 36s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 00m | Avg: 1h 00m | Max: 1h 00m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 93)

Count  Runner
66     linux-amd64-cpu16
9      windows-amd64-cpu16
6      linux-amd64-gpu-rtxa6000-latest-1
4      linux-arm64-cpu16
3      linux-amd64-gpu-h100-latest-1
3      linux-amd64-gpu-rtx4090-latest-1
2      linux-amd64-gpu-rtx2080-latest-1

Projects
Status: In Review
2 participants