[RELEASE] dask-cuda v25.02 #1438

Open
wants to merge 22 commits into
base: main

AyodeAwe
Contributor

❄️ Code freeze for branch-25.02 and v25.02 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.02 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.02 into main for the release

raydouglass and others added 22 commits November 15, 2024 09:26
Forward-merge branch-24.12 into branch-25.02
Forward-merge branch-24.12 into branch-25.02
By default, CI runs on draft PRs. This leads to many CI runs that may be unnecessary.

With this PR's change to `.github/copy-pr-bot.yaml`, an `/ok to test` comment from a trusted user is required to trigger CI on draft PRs. Non-draft PRs will run CI by default, assuming that all commits are signed by trusted users. Otherwise an `/ok to test` is required (as before) -- see the `copy-pr-bot` docs at https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/ for more information.

Part of rapidsai/build-planning#123.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1412
Forward-merge branch-24.12 into branch-25.02
Conda builds are failing due to missing `setuptools`; this change adds the missing dependency to fix the failure.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Bradley Dice (https://github.com/bdice)

URL: #1418
When PyNVML fails to identify CPU affinity appropriately, it may cause an error when launching Dask-CUDA. After extensive discussions in #1381, it seems appropriate to allow continuing if CPU affinity identification fails and to print a warning with a link to documentation instead. New documentation is also added to help with the first steps of troubleshooting.
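
The warn-and-continue behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Dask-CUDA implementation: `set_cpu_affinity_or_warn`, `get_cores`, and `set_affinity` are hypothetical names, and the real change lives in a Distributed worker plugin.

```python
import os
import warnings


def set_cpu_affinity_or_warn(get_cores, set_affinity=None):
    """Try to pin the current process to the cores reported by ``get_cores``.

    Both callables are hypothetical stand-ins, not Dask-CUDA APIs:
    ``get_cores`` plays the role of the PyNVML-based lookup, and
    ``set_affinity`` defaults to ``os.sched_setaffinity`` (Linux only).
    Returns True if affinity was set, False if a warning was issued instead.
    """
    if set_affinity is None:
        def set_affinity(cores):
            os.sched_setaffinity(0, cores)
    try:
        cores = list(get_cores())
        if not cores:
            raise ValueError("no CPU cores were reported")
        set_affinity(cores)
    except Exception as exc:
        # Degrade gracefully: warn and let startup continue unpinned.
        warnings.warn(
            f"Setting CPU affinity failed ({exc!r}); continuing without it. "
            "See the troubleshooting documentation.",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True
```

Because the lookup and the setter are passed in, a helper shaped like this can be exercised in isolation with `warnings.catch_warnings(record=True)`, without creating a cluster.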

Unfortunately, testing warnings in Distributed plugins seems very hard to do. I couldn't find a way to do that even with `distributed.utils_test.captured_logger`, which runs only after the cluster is created with a `LocalCluster` (or `LocalCUDACluster`). For the `dask cuda worker` CLI there's no way for us to mock the value passed to `CPUAffinity` to force a warning to be raised, so no tests are added at this time.

Closes #1381.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #1420
Do not skip `pynvml` if it's not importable, given that `pynvml` is a hard dependency.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - https://github.com/jakirkham
  - James Lamb (https://github.com/jameslamb)

URL: #1421
Bump `pynvml` from `11` to `12`. This version of `pynvml` also now depends on `nvidia-ml-py` for core functionality.

Authors:
  - https://github.com/jakirkham
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1419
Adding this now that wheels are available

- **deps(kvikio): add kvikio to CUDA version matrices**
- **test(wheels): enable wheel tests in CI**

Resolves #1344

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)

URL: #1416
Removes testing/handling for "legacy" Dask cuDF (i.e. `DASK_DATAFRAME__QUERY_PLANNING=False`).

This PR also adds support for the `"explicit-comms"` config with query-planning enabled (we used to raise an error telling the user to disable query planning).

This should be merged **before** rapidsai/cudf#17558 (otherwise Dask-CUDA CI will break).
This PR is marked as "breaking", because it technically breaks the `"explicit-comms"` config with the "legacy" version of Dask cuDF (which we are about to remove in 25.02 anyway).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - James Lamb (https://github.com/jameslamb)
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1417
Follow up to #1417

Cleans up some imports (some of which don't work for `dask>2024.12.1`).

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1424
Numba 0.61.0 was just released with a couple of breaking changes. This PR is required to unblock CI.

xref: rapidsai/cudf#17777

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1426
Pull in build dependencies from `pyproject.toml` into Conda's `meta.yaml`.

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1425
`shellcheck` is a fast, static analysis tool for shell scripts. It's good at flagging up unused variables, unintentional glob expansions, and other potential execution and security headaches that arise from the wonders of `bash` (and other shlangs).

This PR adds a `pre-commit` hook to run `shellcheck` on all of the `sh-lang` files in the `ci/` directory, and the changes requested by `shellcheck` to make the existing files pass the check.
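
What such a hook does can be approximated by hand: collect the shell scripts under a directory and pass them to the `shellcheck` executable. The sketch below is an assumption-laden illustration, not the hook configuration from this PR; `find_shell_scripts` and `run_shellcheck` are hypothetical helpers, and matching on the `.sh` extension is a simplification.

```python
import shutil
import subprocess
from pathlib import Path


def find_shell_scripts(root):
    """Return all ``*.sh`` files under ``root``, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.sh") if p.is_file())


def run_shellcheck(paths):
    """Run ``shellcheck`` on ``paths`` and return its exit code.

    Returns None when there is nothing to check or when the
    ``shellcheck`` executable is not installed.
    """
    exe = shutil.which("shellcheck")
    if exe is None or not paths:
        return None
    # A non-zero exit code means shellcheck reported findings.
    return subprocess.run([exe, *map(str, paths)], check=False).returncode


if __name__ == "__main__":
    print(run_shellcheck(find_shell_scripts("ci")))
```

Running the check through `pre-commit` instead of a script like this gives contributors the same result locally and in CI, with the tool version pinned in one place.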
                                                                                                              
xref: rapidsai/build-planning#135

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1427
A new configuration option for the UCX comms module was introduced in rapidsai/rapids-dask-dependency#80. It is designed to help with timeouts in larger clusters, and sometimes even in small ones, depending on the architecture. This change documents that new configuration.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #1428
Contributes to rapidsai/build-planning#142

`ucx-proc` is no longer necessary, for the reasons described in that issue. This proposes dropping the dependency on it here.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #1429
This PR uses CUDA 12.8.0 to build and test.

xref: rapidsai/build-planning#139

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1432
This PR points the shared workflow branches back to the default 25.02 branches.

xref: rapidsai/build-planning#139

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1436
@AyodeAwe requested review from a team as code owners January 31, 2025 21:40
@AyodeAwe requested review from jameslamb and removed request for a team January 31, 2025 21:40
@github-actions bot added the python, conda, and ci labels Jan 31, 2025