Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update fmt (to 11.0.2) and spdlog (to 1.14.1) #689

Open
wants to merge 7 commits into
base: branch-24.10
Choose a base branch
from

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Sep 11, 2024

Description

Contributes to rapidsai/build-planning#56

Replaces #592

Now that most of conda-forge has been updated to fmt >=11.0.1,<12.0a0 and spdlog>=1.14.1,<1.15 (rapidsai/build-planning#56 (comment)), we're attempting to upgrade RAPIDS to similar versions of those libraries... so that the next release of RAPIDS will be installable alongside newer versions of its dependencies and complementary packages on conda-forge.

Notes for Reviewers

This will also have the added side-benefit of (at least for now) shrinking the size and install time of some RAPIDS conda packages... because now builds will get spdlog from those conda-forge packages, instead of downloading it via CPM and then vendoring it.

How I tested this

Opened PRs in some other RAPIDS repos using rapids-cmake from here and rmm from rapidsai/rmm#1678.

See the list at rapidsai/build-planning#56

Also tested in unified devcontainers, with all RAPIDS repos at (in devcontainers manifest.yaml) checked out to the branches from those PRs (pointed at my fork of rapids-cmake here), like this:

uninstall-all -j
clean-all -j
build-all -j

./rmm/ci/run_pytests.sh
./cudf/ci/run_cudf_pytests.sh
./cuml/ci/run_cuml_singlegpu_pytests.sh

pushd ./cuspatial/python/cuspatial/cuspatial
python -m pytest \
  --cache-clear \
  --numprocesses=8 \
  --dist=worksteal \
  tests

Saw everything build successfully and all the tests pass (with CUDA 12.2, on x86_64).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The cmake-format.json is up to date with these changes.
  • I have added new files under rapids-cmake/
    • I have added include guards (include_guard(GLOBAL))
    • I have added the associated docs/ rst file and update the api.rst

@jameslamb jameslamb added breaking Introduces a breaking change improvement Improves an existing functionality 2 - In Progress Currenty a work in progress labels Sep 11, 2024
@jameslamb jameslamb changed the title WIP: Bump fmt (to 11.0.2) and spdlog (to 1.14.1) WIP: update fmt (to 11.0.2) and spdlog (to 1.14.1) Sep 11, 2024
}
+#if defined(__NVCC__)
+#pragma nv_diag_default 128
+#endif
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch works around an issue we observed in raft (rapidsai/raft#2433 (comment)) and in the cudf pip devcontainers build.

  FAILED: CMakeFiles/raft_objs.dir/src/matrix/detail/select_k_half_uint32_t.cu.o 
  /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCUTLASS_NAMESPACE=raft_cutlass -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_COMPILED -DRAFT_EXPLICIT_INSTANTIATE_ONLY -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -I/home/coder/raft/cpp/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/hnswlib-src -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/rmm-src/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/cccl-src/thrust/thrust/cmake/../.. -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/cccl-src/cub/cub/cmake/../.. -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/fmt-src/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/spdlog-src/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/nvtx3-src/c/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/cuco-src/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/nvidiacutlass-src/include -I/home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/nvidiacutlass-build/include -I/usr/local/cuda/include -isystem /usr/local/cuda/targets/x86_64-linux/include -t=1 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -Xcompiler=-Wno-deprecated-declarations -DRAFT_HIDE_DEPRECATION_WARNINGS -Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations -Werror=all-warnings --expt-extended-lambda --expt-relaxed-constexpr -DCUDA_API_PER_THREAD_DEFAULT_STREAM -Xfatbin=-compress-all -Xcompiler=-fopenmp -MD -MT CMakeFiles/raft_objs.dir/src/matrix/detail/select_k_half_uint32_t.cu.o -MF CMakeFiles/raft_objs.dir/src/matrix/detail/select_k_half_uint32_t.cu.o.d -x cu -c /home/coder/raft/cpp/src/matrix/detail/select_k_half_uint32_t.cu -o CMakeFiles/raft_objs.dir/src/matrix/detail/select_k_half_uint32_t.cu.o
  /home/coder/raft/cpp/build/pip/cuda-12.5/release/_deps/fmt-src/include/fmt/base.h(471): error #128-D: loop is not reachable
      for (; n != 0; ++s1, ++s2, --n) {
      ^
            detected during:
              instantiation of "auto fmt::v11::detail::compare(const Char *, const Char *, std::size_t)->int [with Char=char]" at line 583
              instantiation of "auto fmt::v11::basic_string_view<Char>::compare(fmt::v11::basic_string_view<Char>) const->int [with Char=char]" at line 591
              instantiation of class "fmt::v11::basic_string_view<Char> [with Char=char]" at line 1929
              instantiation of "auto fmt::v11::basic_format_args<Context>::get_id(fmt::v11::basic_string_view<Char>) const->int [with Context=fmt::v11::context, Char=char]" at line 1969
  
  Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
  
  1 error detected in the compilation of "/home/coder/raft/cpp/src/matrix/detail/select_k_half_uint32_t.cu".
  ninja: build stopped: subcommand failed.

I'm not sure what the root cause is, but thought the cost "a bit of unreachable code is compiled in" was worth it in exchange for getting this upgrade done.

Otherwise, I think we'd have to do one of these other higher-effort things:

  • put similar guards around every #include that can lead to #include <fmt/base.h> (recursively) across RAPIDS
  • find the true root cause of the warning

NOTE: taking this patch means that fmt will still be vendored in some RAPIDS conda packages though (rapidsai/rmm#1528).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need a strategy to upstream or fix this, so that we aren't carrying the patch indefinitely. Let's compose an issue to fmt that demonstrates this more minimally, and then figure out what to do for a fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember which nvcc versions were affected (just 12.5? also 11.8?) but that would be relevant to include in a bug report.

I assume that this is a fmt bug and not an nvcc bug, but if you think it's an nvcc bug, let's discuss and file that internally as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'd love to get rid of the patch.

I don't know the range of nvcc versions or whether the issue is in fmt or nvcc. Will put up a tracking issue here in rapids-cmake with steps to reproduce this so we can investigate later.

Comment on lines +68 to +69
"git_url": "https://github.com/jameslamb/rmm.git",
"git_tag": "fmt-and-spdlog"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"git_url": "https://github.com/jameslamb/rmm.git",
"git_tag": "fmt-and-spdlog"
"git_url": "https://github.com/rapidsai/rmm.git",
"git_tag": "branch-${version}"

TODO: revert this back right before merging

@jameslamb jameslamb removed the 2 - In Progress Currenty a work in progress label Sep 19, 2024
@jameslamb jameslamb changed the title WIP: update fmt (to 11.0.2) and spdlog (to 1.14.1), remove patches WIP: update fmt (to 11.0.2) and spdlog (to 1.14.1) Sep 19, 2024
@jameslamb jameslamb added 3 - Ready for Review Ready for review by team 5 - DO NOT MERGE Hold off on merging; see PR for details labels Sep 19, 2024
@jameslamb jameslamb changed the title WIP: update fmt (to 11.0.2) and spdlog (to 1.14.1) update fmt (to 11.0.2) and spdlog (to 1.14.1) Sep 19, 2024
@jameslamb jameslamb marked this pull request as ready for review September 19, 2024 16:51
@jameslamb jameslamb requested a review from a team as a code owner September 19, 2024 16:51
@jameslamb jameslamb requested a review from a team as a code owner September 19, 2024 16:51
@jameslamb jameslamb requested review from AyodeAwe, bdice and KyleFromNVIDIA and removed request for a team and AyodeAwe September 19, 2024 16:51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this so it has the word "nvcc" in the filename. If that requires too much CI rerunning, don't bother.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's not worth the CI re-running. The patch has #if defined(__NVCC__) around its changes, so hopefully that's enough information for anyone looking at it that this is specific to nvcc.

}
+#if defined(__NVCC__)
+#pragma nv_diag_default 128
+#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember which nvcc versions were affected (just 12.5? also 11.8?) but that would be relevant to include in a bug report.

I assume that this is a fmt bug and not an nvcc bug, but if you think it's an nvcc bug, let's discuss and file that internally as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 5 - DO NOT MERGE Hold off on merging; see PR for details breaking Introduces a breaking change improvement Improves an existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants