
Update Thrust to CCCL #206

Open · wants to merge 3 commits into base: develop
Conversation

@bernhardmgruber commented on Jun 26, 2024

The Thrust library no longer exists as a standalone project; it has been absorbed into CCCL. I therefore propose updating the Thrust-based version of BabelStream to account for this. Specifically, this PR:

  • will download CCCL from GitHub via FetchContent if instructed to do so by the user. Otherwise, it tries to find CCCL via CMake's find_package. Newer CUDA toolkits ship CCCL, and you can point CMake at it with the already available SDK_DIR. If no version of CCCL is found, CMake will fall back to the legacy Thrust and CUB targets.
  • recent versions of Thrust no longer allow including <thrust/device_vector.h> in code that is not compiled by nvcc. I therefore pulled the corresponding data members into the source file using the PIMPL pattern.
  • recent versions of Thrust (since CUDA 11.3) allow passing a pack of iterators to make_zip_iterator, which simplifies the benchmark implementation. As a result, CUDA versions older than 11.3 can no longer compile the benchmark.
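To illustrate the PIMPL workaround in the second bullet: the header only forward-declares an Impl struct, so no Thrust header is needed outside the .cu file. This is a minimal sketch with hypothetical names, using std::vector as a stand-in for thrust::device_vector; it is not the actual BabelStream code:

```cpp
#include <memory>
#include <vector>

// --- header (compilable by a host-only compiler) ---
class ThrustStream {
public:
  explicit ThrustStream(int n);
  ~ThrustStream();            // must be defined where Impl is complete
  void copy();                // b = a
  double sumB() const;        // helper for this sketch only
private:
  struct Impl;                // holds the (device) vectors
  std::unique_ptr<Impl> impl;
};

// --- implementation file (the .cu file in the real code) ---
struct ThrustStream::Impl {
  std::vector<double> a, b;   // thrust::device_vector<double> in the real code
};

ThrustStream::ThrustStream(int n)
    : impl(new Impl{std::vector<double>(n, 1.0), std::vector<double>(n, 0.0)}) {}
ThrustStream::~ThrustStream() = default;

void ThrustStream::copy() { impl->b = impl->a; }

double ThrustStream::sumB() const {
  double s = 0.0;
  for (double x : impl->b) s += x;
  return s;
}
```

The key detail is that the destructor is declared in the header but defined in the implementation file, where Impl is a complete type, so std::unique_ptr can delete it.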

I also want to emphasize that using a recent version of CCCL is crucial, since CCCL continually improves the performance of Thrust algorithms. However, for comparability of benchmarks, it makes sense to default to the CCCL version shipped with the CUDA SDK rather than introduce another variable.
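The make_zip_iterator change in the last bullet looks roughly like this (an illustrative fragment, not the exact PR diff; a, b, and c are assumed to be device vectors):

```cpp
// CUDA < 11.3: iterators had to be wrapped in a thrust::tuple first.
auto old_begin = thrust::make_zip_iterator(
    thrust::make_tuple(a.begin(), b.begin(), c.begin()));

// CUDA >= 11.3: the iterators can be passed directly as a pack.
auto new_begin = thrust::make_zip_iterator(a.begin(), b.begin(), c.begin());
```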

I tried configuring and building thrust-stream on my Ubuntu 24.04 (default nvcc 12.5, default g++ 13.2) from a <git_clone>/build directory:

cmake .. -DMODEL=thrust -DCMAKE_CUDA_COMPILER=nvcc -DCUDA_ARCH=86 -DFETCH_CCCL=ON && make && ./thrust-stream
cmake .. -DMODEL=thrust -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.5/bin/nvcc -DCUDA_ARCH=86 -DSDK_DIR=/usr/local/cuda-12.5/lib64/cmake && make && ./thrust-stream
cmake .. -DMODEL=thrust -DCMAKE_CUDA_COMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/bin/nvcc -DCUDA_ARCH=86 -DSDK_DIR=/opt/nvidia/hpc_sdk/Linux_x86_64/24.5/cuda/12.4/lib64/cmake && make && ./thrust-stream
cmake .. -DMODEL=thrust -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc -DCUDA_ARCH=86 -DSDK_DIR=/usr/local/cuda-11.4/include -DCMAKE_CUDA_HOST_COMPILER=g++-10 && make && ./thrust-stream

So fetching CCCL seems to work, using CCCL from a CUDA toolkit works, using CCCL from an HPC SDK works, and using legacy Thrust and CUB from an old CUDA SDK works.

I did not test whether rocThrust works.

Comment on lines -78 to -80
thrust_create_target(Thrust${BACKEND}
HOST CPP
DEVICE ${BACKEND})

These lines switch between the different Thrust backends, and this feature is lost in the new version. By default, CCCL uses the CUDA backend for Thrust, but it looks like BabelStream may want to use TBB or OMP.

To keep this functionality, force-set the cache variable CCCL_THRUST_DEVICE_SYSTEM to the desired backend before any find_package/CPM calls.

Force-setting that variable tells the CCCL CMake package to use the desired device backend for the CCCL::Thrust target that gets linked into CCCL::CCCL. See this for details. This is a convenience for the common case where only one backend is used per configure. If multiple Thrust targets with different backends are needed in the same build, let me know; there's a different approach that can be used.

Also, paths based on CMAKE_SOURCE_DIR are fragile. If BabelStream is included in another project (via add_subdirectory, CPM, etc.), that variable points to the top-level project's source dir, not BabelStream's. Use the project-specific variable BabelStream_SOURCE_DIR instead.

The include call should wrap its argument in quotes. Quoting paths is best practice in CMake to support spaces, etc.

Finally, rather than manually checking find_package and falling back to CPMAddPackage, it looks like CPMFindPackage will do this for us:

The function CPMFindPackage will try to find a local dependency via CMake's find_package and fallback to CPMAddPackage, if the dependency is not found.

All together, you're looking at something like:

set(MIN_CCCL_VERSION 2.4.0)
# Tell CCCL's package to configure Thrust using the desired backend:
set(CCCL_THRUST_DEVICE_SYSTEM ${BACKEND} CACHE STRING "" FORCE)

# CPMFindPackage will:
# 1. Attempt to locate a local installation of CCCL.
#    Set CCCL_DIR to the directory containing `cccl-config.cmake` to force a
#    specific installation to be used.
# 2. If no local package is found, the requested version will be downloaded
#    from GitHub and configured using CPMAddPackage.
include("${BabelStream_SOURCE_DIR}/src/thrust/CPM.cmake")
CPMFindPackage(
  NAME CCCL
  GITHUB_REPOSITORY nvidia/cccl
  GIT_TAG v${MIN_CCCL_VERSION}
)
register_link_library(CCCL::CCCL)

@bernhardmgruber (Author)

Thank you for the review! I now set CCCL_THRUST_DEVICE_SYSTEM before finding and fetching CCCL.

As for CPM, I decided to revert to FetchContent to keep the CMake setup simpler.
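A FetchContent-based setup along those lines might look like this; a sketch under assumptions (the version tag and target name are illustrative), not the exact code in the PR:

```cmake
include(FetchContent)
FetchContent_Declare(
  CCCL
  GIT_REPOSITORY https://github.com/NVIDIA/cccl.git
  GIT_TAG        v2.4.0            # assumed minimum version, as in the earlier snippet
)
FetchContent_MakeAvailable(CCCL)

# CCCL's CMake package exports CCCL::CCCL, which bundles Thrust, CUB, and libcudacxx.
register_link_library(CCCL::CCCL)
```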

Let me know if anything is still missing, thx!

@tomdeakin tomdeakin changed the base branch from main to develop June 26, 2024 19:51
Recent changes in Thrust, for the CUDA backend, no longer allow including `<thrust/device_vector.h>` into a translation unit that is not compiled by a CUDA compiler. As a workaround, the vectors containing the data are moved from the header into the benchmark implementation file.
@bernhardmgruber force-pushed the thrust_fixes branch 2 times, most recently from 05afc2e to eafbf41 on June 26, 2024 20:18
@bernhardmgruber (Author)

@gonzalobg I would appreciate your input on this PR. Do these changes break any of your workflows?

@bernhardmgruber bernhardmgruber marked this pull request as ready for review July 1, 2024 16:07
@bernhardmgruber (Author)

Ping. @tomdeakin, this PR is really important to enable users to use CCCL instead of the legacy Thrust library.

@tomdeakin (Contributor)

We are planning a rethink of the CUDA and Thrust models generally with @gonzalobg. To manage expectations on timescales: we hope to release v6 in time for Supercomputing, but that depends on how much progress we make on the other changes for v6.

@bernhardmgruber (Author)

Thank you for the answer! I knew @gonzalobg was somehow involved, but I didn't know the plan. In any case, I hope v6 will support Thrust from CCCL, however that is done.
