[RELEASE] dask-cuda v22.12 #1059

GPUtester · 2022-12-01T20:25:14Z

❄️ Code freeze for `branch-22.12` and v22.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-22.12 until release (merging of this PR).

What is the purpose of this PR?

Update documentation
Allow testing for the new release
Enable a means to merge branch-22.12 into main for the release

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

[REVIEW] Unpin `dask` and `distributed` for development

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

Closes #978 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) URL: #1010

Closes #1015 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1016

Early attempt at resolving #989 <img width="1374" alt="Screen Shot 2022-10-04 at 6 37 20 PM" src="https://user-images.githubusercontent.com/1403768/193943590-5ac4e20a-4dbc-46ec-aefa-3f74ce7c6d17.png"> In an ideal world, we could call `dask.config.config` or `dask.config.get(...)` and that will have been updated with live values. Unfortunately, that is not the case. Config options are set in all manner of places: cli, env, yaml, kwargs, etc and we are not consistent about storing them. For example, in this PR `self.device_memory_limit = device_memory_limit` is added to `device_host_file.py` just so `client.run` could be accessed from within the worker. This change employs a hodge-podge of techniques to get at options we _think_ may be potentially useful for debugging but in no way is this exhaustive. Authors: - Benjamin Zaitlen (https://github.com/quasiben) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1006

This is already supported by `memory_limit`/`device_memory_limit`, and this has been raised in #1020 during discussions on how to make Dask Profiles usable in Dask-CUDA. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1021

@gjoseph92

To add some metrics on distributed overheads to our benchmark suite, I propose using https://github.com/gjoseph92/dask-noop/ to rewrite the compute graphs into no-ops. Now we just measure the scheduler and worker-to-worker performance on zero-sized work (or something close thereto). Interestingly the cupy-based transpose runs take _longer_ if I noopify (cc @gjoseph92 any insight here?), but I haven't done anything systematic yet. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #994

This PR removes the stale issue labeler workflow Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1024

Even though pytest-async is installed, it has not executed async tests that aren't wrapped in gen_cluster. Using gen_cluster is the proper way for Dask testing, but in few exceptions we may not be able to use it, such as when cleanup must be avoided. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #1026

In shuffle, use `Client.submit()` to control where tasks are executed and release temporary dataframes ASAP. #### Context In the final step in explicit-comms shuffle, we call `getitem()` to extract the final dataframe partitions from the result of the local shuffle. In some cases, these `getitem()` tasks would not run on the worker that ran the local shuffle, which would result in extra communication and spilling. We now use `submit(..., worker=...)` to make sure that the worker running the local shuffle also runs the `getitem()` task afterwards. Is it possible to do this without the use of `submit()` to avoid the overhead of creating a `Future` for each dataframe partition? Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Lawrence Mitchell (https://github.com/wence-) URL: #1025

With the merge of #994 I accidentally changed the recorded time in the merge benchmark for the explicit-comms case. Since explicit-comms is eager, we must time the entire operation, not just `wait(result.persist())`. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #1028

By installing Dask nightly as last step in CI, we prevent any other steps from inadvertently downgrading it before running tests. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mark Sadang (https://github.com/msadang) URL: #1029

Support the new `Buffer` in rapidsai/cudf#12009 Fixes #1032 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1033

For automated cleanup when the cluster exits, the on-disk spilling directory needs to live inside the relevant worker's local_directory. Since we do not have a handle on the worker when constructing the keyword arguments to DeviceHostFile or ProxifyHostFile, instead take advantage of dask/distributed#7153 and request that we are called with the worker_local_directory as an argument. Closes #1018. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1023

Should apply custom __new__ if Python < 3.9. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1036

Work around the issue reported in #1040 . This probably needs to be fixed elsewhere, so this patch temporarily unblocks CI. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - https://github.com/jakirkham - AJ Schmidt (https://github.com/ajschmidt8) URL: #1041

Addresses errors reported recently in #583 (comment) . Depends on dask/distributed#6720 . Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #950

All requirements towards `pytest-asyncio` should have been removed, dropping all remains. Closes #1044 Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Ray Douglass (https://github.com/raydouglass) - https://github.com/jakirkham URL: #1045

* Upgrade to versioneer 0.28 & run formatting tools on those files * Move most of `setup.py` to `pyproject.toml` * Move all of `setup.cfg` to `pyproject.toml` * Move `pytest.ini` to `pyproject.toml` * Move `.coveragerc` to `pyproject.toml` * Move Flake8 settings to `.flake8` Authors: - https://github.com/jakirkham Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) URL: #1035

Support of the [new built-in spilling in cuDF](rapidsai/cudf#12106) so that `device_memory_limit` and `memory_limit` ignores cuDF's device buffers. This is only implemented for `DeviceHostFile`. Since jit-unspill also targets cuDF and libraries such as cupy isn't supported, I don't think it is important to support cuDF's built-in spilling in `ProxifyHostFile`. For now, `DeviceHostFile` simply ignores cuDF's device buffers and let cuDF handle the spilling. This means that `DeviceHostFile` might estimate the device and host memory usage incorrectly ([or more incorrectly than usually](dask/distributed#4568 (comment))). Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #984

Introduce staging in explicit-comms. The idea is to "stage" the keys of the input on the workers so that a later explicit-comms task can access **and** free the data associated with the keys. Notice, explicit-comms and this new staging approach is still experimental. If or when it gets to a state where it provides a significant performance improvements over a range of workflows, the plan is to tighten up the API. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1030

This PR addresses a breaking change that was made in upstream distributed: dask/distributed#5669 Error due to above change: ``` > self.memory_limit = parse_memory_limit( memory_limit=memory_limit, nthreads=1, total_cores=n_workers ) E TypeError: parse_memory_limit() missing 1 required keyword-only argument: 'logger' ``` Introduced a version-based function call because we haven't yet finalized the `dask` pinning for `22.12`, though as of now `2022.11.1` seems to be the most likely release pinning. Since the dask dev `2022.11.1` packages now contain this breaking change, this check will also help unblock any `22.12` CI jobs that are fetching `2022.11.1` dask dev packages. Once `dask` is pinned to a specific version, this version-based check can be removed. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Peter Andreas Entschev (https://github.com/pentschev) URL: #1055

It often happens that `pre-commit` finds improperly formatted files that are not in the `dask_cuda` directory, therefore it is sensible to check the entire repo. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) - Jake Awe (https://github.com/AyodeAwe) URL: #1053

This PR pins `dask` and `distributed` to `2022.11.1` for `22.12` release. xref: rapidsai/cudf#12165 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) - Peter Andreas Entschev (https://github.com/pentschev) URL: #1046

codecov-commenter · 2022-12-01T20:52:41Z

Codecov Report

Base: 0.00% // Head: 0.00% // No change to project coverage 👍

Coverage data is based on head (ee40483) compared to base (60ea5be).
Patch coverage: 0.00% of modified lines in pull request are covered.

❗ Current head ee40483 differs from pull request most recent head 8c87288. Consider uploading reports for the commit 8c87288 to get more accurate results

Additional details and impacted files

@@          Coverage Diff           @@
##            main   #1059    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         24      26     +2     
  Lines       3184    3439   +255     
======================================
- Misses      3184    3439   +255

Impacted Files	Coverage Δ
dask_cuda/_version.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/common.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/local_cudf_groupby.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/local_cudf_merge.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/local_cudf_shuffle.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/local_cupy.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/local_cupy_map_overlap.py	`0.00% <0.00%> (ø)`
dask_cuda/benchmarks/utils.py	`0.00% <0.00%> (ø)`
dask_cuda/cli/dask_config.py	`0.00% <0.00%> (ø)`
dask_cuda/cli/dask_cuda_worker.py	`0.00% <ø> (ø)`
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

raydouglass and others added 30 commits September 23, 2022 11:38

DOC

3a5bdd1

Merge pull request #996 from rapidsai/branch-22.10

f2667a5

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

Merge pull request #999 from rapidsai/branch-22.10

60ca8f2

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

Merge pull request #1002 from rapidsai/branch-22.10

e85bffc

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

Merge pull request #1004 from rapidsai/branch-22.10

13ce8b9

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

unpin dask and distributed for development

38ac04f

Merge pull request #1005 from galipremsagar/unpin_dask

8883dc8

[REVIEW] Unpin `dask` and `distributed` for development

Merge pull request #1007 from rapidsai/branch-22.10

b8e79e3

[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]

Merge 22.08 into 22.10 (#1010)

23460f6

Closes #978 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) URL: #1010

Merge 22.10 into 22.12 (#1016)

76453ab

Closes #1015 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1016

Remove stale labeler (#1024)

99894b1

This PR removes the stale issue labeler workflow Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1024

Support the new Buffer in cudf (#1033)

4de1d84

Support the new `Buffer` in rapidsai/cudf#12009 Fixes #1032 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1033

Fix version constraint (#1036)

2c99f5a

Should apply custom __new__ if Python < 3.9. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1036

Fixes for handling MIG devices (#950)

b3ed902

Addresses errors reported recently in #583 (comment) . Depends on dask/distributed#6720 . Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #950

GPUtester requested review from a team as code owners December 1, 2022 20:25

github-actions bot added conda conda issue gpuCI gpuCI issue python python code needed labels Dec 1, 2022

update changelog

8c87288

raydouglass merged commit bc7ec70 into main Dec 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RELEASE] dask-cuda v22.12 #1059

[RELEASE] dask-cuda v22.12 #1059

GPUtester commented Dec 1, 2022

codecov-commenter commented Dec 1, 2022 •

edited

Loading

[RELEASE] dask-cuda v22.12 #1059

[RELEASE] dask-cuda v22.12 #1059

Conversation

GPUtester commented Dec 1, 2022

❄️ Code freeze for branch-22.12 and v22.12 release

What does this mean?

What is the purpose of this PR?

codecov-commenter commented Dec 1, 2022 • edited Loading

Codecov Report

❄️ Code freeze for `branch-22.12` and v22.12 release

codecov-commenter commented Dec 1, 2022 •

edited

Loading