Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Current main branch breaks host compiler when Thrust headers are included #1373

Closed
1 task done
leofang opened this issue Feb 13, 2024 · 8 comments
Closed
1 task done
Labels
bug Something isn't working right.

Comments

@leofang
Copy link
Member

leofang commented Feb 13, 2024

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

CuPy cannot be built due to this compiler error, which does not exist at all. I think this might be due to a missing compiler guard that used to exist somewhere and shielded the device APIs from the host path.

    -------- Configuring Module: thrust --------
    In file included from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh:52,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/util.h:48,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/malloc_and_free.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/adl/malloc_and_free.h:50,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.inl:30,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.h:77,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/detail/reference.h:36,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/memory.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/device_ptr.h:33,
                     from /tmp/tmps692imk8/a.cpp:1:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘void cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC()’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:271:5: error: ‘__syncthreads’ was not declared in this scope
      271 |     __syncthreads();
          |     ^~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC_AND(int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:280:12: error: ‘__syncthreads_and’ was not declared in this scope
      280 |     return __syncthreads_and(p);
          |            ^~~~~~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::CTA_SYNC_OR(int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:289:12: error: ‘__syncthreads_or’ was not declared in this scope
      289 |     return __syncthreads_or(p);
          |            ^~~~~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘void cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_SYNC(unsigned int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:298:5: error: ‘__syncwarp’ was not declared in this scope
      298 |     __syncwarp(member_mask);
          |     ^~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_ANY(int, unsigned int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:307:12: error: ‘__any_sync’ was not declared in this scope
      307 |     return __any_sync(member_mask, predicate);
          |            ^~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_ALL(int, unsigned int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:316:12: error: ‘__all_sync’ was not declared in this scope
      316 |     return __all_sync(member_mask, predicate);
          |            ^~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::WARP_BALLOT(int, unsigned int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:325:12: error: ‘__ballot_sync’ was not declared in this scope
      325 |     return __ballot_sync(member_mask, predicate);
          |            ^~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘unsigned int cub::CUB_200300___CUDA_ARCH_LIST___NS::SHFL_IDX_SYNC(unsigned int, int, unsigned int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:368:12: error: ‘__shfl_sync’ was not declared in this scope
      368 |     return __shfl_sync(member_mask, word, src_lane);
          |            ^~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh: In function ‘int cub::CUB_200300___CUDA_ARCH_LIST___NS::RowMajorTid(int, int, int)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_ptx.cuh:415:39: error: ‘threadIdx’ was not declared in this scope
      415 |     return ((block_dim_z == 1) ? 0 : (threadIdx.z * block_dim_x * block_dim_y)) +
          |                                       ^~~~~~~~~
    In file included from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/libcudacxx/cuda/std/detail/libcxx/include/__cuda/ptx.h:31,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/libcudacxx/cuda/discard_memory:16,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh:57,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/util.h:48,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/malloc_and_free.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/adl/malloc_and_free.h:50,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.inl:30,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.h:77,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/detail/reference.h:36,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/memory.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/device_ptr.h:33,
                     from /tmp/tmps692imk8/a.cpp:1:
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h: In function ‘uint32_t cuda::ptx::__4::__as_ptr_smem(const void*)’:
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h:40:44: error: ‘__cvta_generic_to_shared’ was not declared in this scope
       40 |   return static_cast<_CUDA_VSTD::uint32_t>(__cvta_generic_to_shared(__ptr));
          |                                            ^~~~~~~~~~~~~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h: In function ‘uint64_t cuda::ptx::__4::__as_ptr_gmem(const void*)’:
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h:60:44: error: ‘__cvta_generic_to_global’ was not declared in this scope
       60 |   return static_cast<_CUDA_VSTD::uint64_t>(__cvta_generic_to_global(__ptr));
          |                                            ^~~~~~~~~~~~~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_smem(size_t)’:
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h:73:33: error: there are no arguments to ‘__cvta_shared_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_shared_to_generic’ must be available [-fpermissive]
       73 |   return reinterpret_cast<_Tp*>(__cvta_shared_to_generic(__ptr));
          |                                 ^~~~~~~~~~~~~~~~~~~~~~~~
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h:73:33: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h: In function ‘_Tp* cuda::ptx::__4::__from_ptr_gmem(size_t)’:
    /home/leof/dev/cupy_cuda122/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx/ptx_helper_functions.h:94:33: error: there are no arguments to ‘__cvta_global_to_generic’ that depend on a template parameter, so a declaration of ‘__cvta_global_to_generic’ must be available [-fpermissive]
       94 |   return reinterpret_cast<_Tp*>(__cvta_global_to_generic(__ptr));
          |                                 ^~~~~~~~~~~~~~~~~~~~~~~~
    In file included from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/util.h:48,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/cuda/detail/malloc_and_free.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/adl/malloc_and_free.h:50,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.inl:30,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/system/detail/generic/memory.h:77,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/detail/reference.h:36,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/memory.h:34,
                     from /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/thrust/thrust/device_ptr.h:33,
                     from /tmp/tmps692imk8/a.cpp:1:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh: In static member function ‘static typename AgentT::TempStorage& cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl<AgentT>::get_temp_storage(cub::CUB_200300___CUDA_ARCH_LIST___NS::NullType&, cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_t&)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh:160:63: error: ‘blockIdx’ was not declared in this scope
      160 |       static_cast<char*>(vsmem.gmem_ptr) + (vsmem_per_block * blockIdx.x));
          |                                                               ^~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh: In static member function ‘static bool cub::CUB_200300___CUDA_ARCH_LIST___NS::detail::vsmem_helper_impl<AgentT>::discard_temp_storage(typename AgentT::TempStorage&)’:
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh:201:38: error: ‘threadIdx’ was not declared in this scope
      201 |     const std::size_t linear_tid   = threadIdx.x;
          |                                      ^~~~~~~~~
    /home/leof/dev/cupy_cuda122/cupy/_core/include/cupy/_cccl/cub/cub/util_device.cuh:202:50: error: ‘blockDim’ was not declared in this scope
      202 |     const std::size_t block_stride = line_size * blockDim.x;
          |                                                  ^~~~~~~~
    command '/usr/bin/gcc' failed with exit code 1

How to Reproduce

Two ways:

  1. Build CuPy from source, or use the reproducer below.
    1. Clone CuPy: git clone --recursive https://github.com/cupy/cupy.git; cd cupy
    2. Update the CCCL submodule to latest: cd third_party/cccl/; git checkout main; git pull origin main; cd ../..
    3. Build CuPy: pip install --no-deps --no-build-isolation -v -e .
  2. Compile the standalone reproducer using g++ (which is essentially what CuPy's build system does)
// g++ -std=c++11 -I ./third_party/cccl/thrust/ -I ./third_party/cccl/libcudacxx/include/ -I ./third_party/cccl/cub/ -I $CUDA_PATH/include test.cpp -o test
#include "thrust/device_ptr.h"
#include "thrust/sequence.h"
#include "thrust/sort.h"

int main() {
    return 0;
}

Expected behavior

Module configuration works without error (no extra lines after -------- Configuring Module: thrust --------).

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@gevtushenko
Copy link
Collaborator

@leofang I'm afraid this is not a supported use case. The default thrust device system has always been CUDA. If you want to get this working, you need to specify -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP or equivalent non-CUDA alternative (OMP, TBB).

@github-project-automation github-project-automation bot moved this from Todo to Done in CCCL Feb 13, 2024
@jrhemstad
Copy link
Collaborator

@leofang I'm a little confused.

I wouldn't have expected thrust/device_ptr.h to have ever been able to compile with only a host compiler without having specified -DTHRUST_DEVICE_SYSTEM=... as Georgii said above.

@leofang
Copy link
Member Author

leofang commented Feb 13, 2024

@gevtushenko So should I take it as a happy coincidence that it just happened to work in the past few years? Also, is there any Thrust header that's guaranteed safe to be compiled by a host compiler (for the purpose of helping an autotool-like build system, where the host compiler checks if the header is present in the search path via compiling a snippet)? Just wanna know the expectation to plan ahead.

@jrhemstad jrhemstad reopened this Feb 13, 2024
@github-project-automation github-project-automation bot moved this from Done to Needs Triage in CCCL Feb 13, 2024
@leofang
Copy link
Member Author

leofang commented Feb 13, 2024

(To be clear, it's not a blocker to CuPy, we live at head so it's a fairly simple fix on our side if it is deemed as NAB. I just want to 1. raise for awareness, and 2. confirm the team's expectation so that I can work on the fix to CuPy.)

@jrhemstad
Copy link
Collaborator

Also, is there any Thrust header that's guaranteed safe to be compiled by a host compiler (for the purpose of helping an autotool-like build system, where the host compiler checks if the header is present in the search path via compiling a snippet)? Just wanna know the expectation to plan ahead.

This is a very good question that I don't think has a satisfactory answer today. The safest and most restrictive answer is that no Thrust header can be included in a host-only TU unless the device system is set to CPP/OMP/TBB.

However, there are definitely Thrust headers today that can be included/used in a host-only TU. I don't believe this was done intentionally, but just a coincidence of the implementation.

That said, as a user, I'd reasonably expect things like thrust/tuple.h to work no matter what the device system is configured as.

So clearly there is a gap here that needs a closer look.

@jrhemstad
Copy link
Collaborator

Closing this in favor of #1374

Thanks @leofang for identifying this gap in our documentation!

@github-project-automation github-project-automation bot moved this from Needs Triage to Done in CCCL Feb 13, 2024
@leofang
Copy link
Member Author

leofang commented Feb 13, 2024

Thanks, @gevtushenko @jrhemstad for quick turnaround!

@leofang
Copy link
Member Author

leofang commented Feb 13, 2024

That said, as a user, I'd reasonably expect things like thrust/tuple.h to work no matter what the device system is configured as.

Maybe thrust/version.h is another good target?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Archived in project
Development

No branches or pull requests

3 participants