Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor CUDA versions in dependencies.yaml #14733

Merged
merged 6 commits into from
Jan 11, 2024

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Jan 10, 2024

Description

Follow-up to #14644.

Contributes to rapidsai/build-planning#7.

Similar to rapidsai/rmm#1422, this proposes splitting the cuda-version dependency in dependencies.yaml out to its own thing, separate from the bits of the CUDA Toolkit cudf needs.

Some other simplifications:

  • removes the notebook-specific stuff added in Split cuda versions for notebook testing #14722 (which I think were added specifically because cuda-version and CTK stuff was coupled)
  • consolidates two sections with selectors only based on CUDA {major}.{minor}

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@jameslamb jameslamb requested a review from a team as a code owner January 10, 2024 16:39
dependencies.yaml Show resolved Hide resolved
dependencies.yaml Outdated Show resolved Hide resolved
@vyasr vyasr added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 11, 2024
specific:
- output_types: conda
matrices:
- matrix:
cuda: "12.*"
cuda: "11.2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop 11.2 from this file? We still support 11.2 via CEC, but we don't need to be able to build envs with it any more do we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not really in scope for this PR. Something we can come back to later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are supporting it, wouldn't we want a way to construct an environment with this CUDA version for testing, debugging, etc.?

Copy link
Contributor

@bdice bdice Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11.2 has been dropped from our CI matrix and Docker images. I think it’s OK to remove this but I agree it’s probably better to keep the scope narrow here. At some point we will drop CUDA 11 entirely and clean up a lot of stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something we can come back to later.

I think it should come later, yeah.

@jameslamb
Copy link
Member Author

docs builds are failing like this:

/__w/cudf/cudf/cpp/doxygen/main_page.md:1: warning: multiple use of section label 'md_main_page' while adding anchor, (first occurrence: index.html)
...
reading sources... [  2%] user_guide/10min
/__w/cudf/cudf/docs/cudf/source/user_guide/10min.ipynb: Executing notebook using local CWD [mystnb]
../../thread/thread_load.cuh(36): warning: cuda.h: [jitify] File not found
../../thread/thread_store.cuh(36): warning: cuda.h: [jitify] File not found
2024-01-11 00:24:38,405 - distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/opt/conda/envs/docs/lib/python3.10/site-packages/distributed/protocol/core.py", line 175, in loads
    return msgpack.loads(
  File "msgpack/_unpacker.pyx", line 194, in msgpack._cmsgpack.unpackb
  File "/opt/conda/envs/docs/lib/python3.10/site-packages/distributed/protocol/core.py", line 172, in _decode_default
    return pickle.loads(sub_header["pickled-obj"], buffers=sub_frames)
  File "/opt/conda/envs/docs/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 99, in loads
    return pickle.loads(x, buffers=buffers)
AttributeError: Can't get attribute 'func' on <module '__main__' (built-in)>
...
/__w/cudf/cudf/docs/cudf/source/user_guide/10min.ipynb: WARNING: Executing notebook failed: CellTimeoutError [mystnb.exec]
/__w/cudf/cudf/docs/cudf/source/user_guide/10min.ipynb: WARNING: Notebook exception traceback saved in: /__w/cudf/cudf/docs/cudf/build/dirhtml/reports/user_guide/10min.err.log [mystnb.exec]
...
build finished with problems, 2 warnings.
make: *** [Makefile:20: dirhtml] Error 1

(build link)

I re-checked the diff and don't see anything that's obviously related to these warnings and errors. I just tried pushing an empty commit to re-trigger CI. If that doesn't work I'll ask for some help.

@wence-
Copy link
Contributor

wence- commented Jan 11, 2024

/__w/cudf/cudf/docs/cudf/source/user_guide/10min.ipynb: WARNING: Executing notebook failed: CellTimeoutError [mystnb.exec]
/__w/cudf/cudf/docs/cudf/source/user_guide/10min.ipynb: WARNING: Notebook exception traceback saved in: > /__w/cudf/cudf/docs/cudf/build/dirhtml/reports/user_guide/10min.err.log [mystnb.exec]

These shouldn't timeout, I think. Do they run locally without problems?

@jakirkham
Copy link
Member

As noted offline by Bradley,

@rjzamora
Copy link
Member

Just a note that the CI failure is not RAPIDS related. Issue is here: dask/distributed#8454 (cause is dask/distributed#8443)

@ajschmidt8 ajschmidt8 merged commit 0d87bb7 into rapidsai:branch-24.02 Jan 11, 2024
66 of 67 checks passed
@jameslamb jameslamb deleted the rework-dependencies branch January 11, 2024 22:05
@jameslamb jameslamb mentioned this pull request Jan 11, 2024
3 tasks
@jakirkham
Copy link
Member

Thanks all! 🙏

rapids-bot bot pushed a commit that referenced this pull request Feb 10, 2024
* switches to CUDA 12.2.2 for building conda packages and wheels
* adds new tests running against CUDA 12.2.2

### Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2.2 across all of RAPIDS.

For more details see:

* rapidsai/build-planning#7
* rapidsai/shared-workflows#166
* adds some `dependencies.yaml` simplifications missed in #14733

Planning a second round of PRs to revert these references back to a proper `branch-24.{nn}` release branch of `shared-workflows` once rapidsai/shared-workflows#166 is merged.

*(created with `rapids-reviser`)*

Authors:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)
  - https://github.com/jakirkham

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Bradley Dice (https://github.com/bdice)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #14712
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants