Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELEASE] dask-cuda v22.12 #1059

Merged
merged 31 commits into from
Dec 8, 2022
Merged

[RELEASE] dask-cuda v22.12 #1059

merged 31 commits into from
Dec 8, 2022

Conversation

GPUtester
Copy link
Contributor

❄️ Code freeze for branch-22.12 and v22.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-22.12 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-22.12 into main for the release

raydouglass and others added 30 commits September 23, 2022 11:38
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[REVIEW] Unpin `dask` and `distributed` for development
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Closes #978

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #1010
Closes #1015

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1016
Early attempt at resolving #989

<img width="1374" alt="Screen Shot 2022-10-04 at 6 37 20 PM" src="https://user-images.githubusercontent.com/1403768/193943590-5ac4e20a-4dbc-46ec-aefa-3f74ce7c6d17.png">

In an ideal world, we could call `dask.config.config` or `dask.config.get(...)` and that will have been updated with live values.   Unfortunately, that is not the case.  Config options are set in all manner of places: cli, env, yaml, kwargs, etc and we are not consistent about storing them.  For example, in this PR `self.device_memory_limit = device_memory_limit` is added to `device_host_file.py` just so `client.run` could be accessed from within the worker.

This change employs a hodge-podge of techniques to get at options we _think_ may be potentially useful for debugging but in no way is this exhaustive.

Authors:
  - Benjamin Zaitlen (https://github.com/quasiben)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1006
This is already supported by `memory_limit`/`device_memory_limit`, and this has been raised in #1020 during discussions on how to make Dask Profiles usable in Dask-CUDA.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1021
To add some metrics on distributed overheads to our benchmark suite, I propose using https://github.com/gjoseph92/dask-noop/ to rewrite the compute graphs into no-ops. Now we just measure the scheduler and worker-to-worker performance on zero-sized work (or something close thereto).

Interestingly the cupy-based transpose runs take _longer_ if I noopify (cc @gjoseph92 any insight here?), but I haven't done anything systematic yet.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #994
This PR removes the stale issue labeler workflow

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1024
Even though pytest-async is installed, it has not executed async tests that aren't wrapped in gen_cluster. Using gen_cluster is the proper way for Dask testing, but in few exceptions we may not be able to use it, such as when cleanup must be avoided.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1026
In shuffle, use `Client.submit()` to control where tasks are executed and release temporary dataframes ASAP.

#### Context
In the final step in explicit-comms shuffle, we call `getitem()` to extract the final dataframe partitions from the result of the local shuffle. In some cases, these `getitem()` tasks would not run on the worker that ran the local shuffle, which would result in extra communication and spilling.
We now use `submit(..., worker=...)` to make sure that the worker running the local shuffle also runs the `getitem()` task afterwards.

Is it possible to do this without the use of `submit()` to avoid the overhead of creating a `Future` for each dataframe partition?

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1025
With the merge of #994 I accidentally changed the recorded time in the merge benchmark for the explicit-comms case. Since explicit-comms is eager, we must time the entire operation, not just `wait(result.persist())`.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1028
By installing Dask nightly as last step in CI, we prevent any other steps from inadvertently downgrading it before running tests.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mark Sadang (https://github.com/msadang)

URL: #1029
Support the new `Buffer` in rapidsai/cudf#12009

Fixes  #1032

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1033
For automated cleanup when the cluster exits, the on-disk spilling directory needs to live inside the relevant worker's local_directory. Since we do not have a handle on the worker when constructing the keyword arguments to DeviceHostFile or ProxifyHostFile, instead take advantage of dask/distributed#7153 and request that we are called with the worker_local_directory as an argument. Closes #1018.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1023
Should apply custom __new__ if Python < 3.9.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1036
Work around the issue reported in #1040 . This probably needs to be fixed elsewhere, so this patch temporarily unblocks CI.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - https://github.com/jakirkham
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1041
Addresses errors reported recently in #583 (comment) . Depends on dask/distributed#6720 .

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #950
All requirements towards `pytest-asyncio` should have been removed, dropping all remains.

Closes #1044

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - https://github.com/jakirkham

URL: #1045
* Upgrade to versioneer 0.28 & run formatting tools on those files
* Move most of `setup.py` to `pyproject.toml`
* Move all of `setup.cfg` to `pyproject.toml`
* Move `pytest.ini` to `pyproject.toml`
* Move `.coveragerc` to `pyproject.toml`
* Move Flake8 settings to `.flake8`

Authors:
  - https://github.com/jakirkham

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1035
Support of the [new built-in spilling in cuDF](rapidsai/cudf#12106) so that `device_memory_limit` and `memory_limit` ignores cuDF's device buffers.

This is only implemented for `DeviceHostFile`. Since jit-unspill also targets cuDF and libraries such as cupy isn't supported, I don't think it is important to support cuDF's built-in spilling in `ProxifyHostFile`.

For now, `DeviceHostFile` simply ignores cuDF's device buffers and let cuDF handle the spilling. This means that `DeviceHostFile` might estimate the device and host memory usage incorrectly ([or more incorrectly than usually](dask/distributed#4568 (comment))).

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #984
Introduce staging in explicit-comms. The idea is to "stage" the keys of the input on the workers so that a later explicit-comms task can access **and** free the data associated with the keys. 

Notice, explicit-comms and this new staging approach is still experimental. If or when it gets to a state where it provides a significant performance improvements over a range of workflows, the plan is to tighten up the API.

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #1030
This PR addresses a breaking change that was made in upstream distributed: dask/distributed#5669

Error due to above change:
```
>       self.memory_limit = parse_memory_limit(
            memory_limit=memory_limit, nthreads=1, total_cores=n_workers
        )
E       TypeError: parse_memory_limit() missing 1 required keyword-only argument: 'logger'

```
Introduced a version-based function call because we haven't yet finalized the `dask` pinning for `22.12`, though as of now `2022.11.1` seems to be the most likely release pinning. Since the dask dev `2022.11.1` packages now contain this breaking change, this check will also help unblock any `22.12` CI jobs that are fetching `2022.11.1` dask dev packages.

Once `dask` is pinned to a specific version, this version-based check can be removed.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1055
It often happens that `pre-commit` finds improperly formatted files that are not in the `dask_cuda` directory, therefore it is sensible to check the entire repo.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Lawrence Mitchell (https://github.com/wence-)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1053
This PR pins `dask` and `distributed` to `2022.11.1` for `22.12` release.

xref: rapidsai/cudf#12165

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1046
@GPUtester GPUtester requested review from a team as code owners December 1, 2022 20:25
@github-actions github-actions bot added conda conda issue gpuCI gpuCI issue python python code needed labels Dec 1, 2022
@codecov-commenter
Copy link

codecov-commenter commented Dec 1, 2022

Codecov Report

Base: 0.00% // Head: 0.00% // No change to project coverage 👍

Coverage data is based on head (ee40483) compared to base (60ea5be).
Patch coverage: 0.00% of modified lines in pull request are covered.

❗ Current head ee40483 differs from pull request most recent head 8c87288. Consider uploading reports for the commit 8c87288 to get more accurate results

Additional details and impacted files
@@          Coverage Diff           @@
##            main   #1059    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         24      26     +2     
  Lines       3184    3439   +255     
======================================
- Misses      3184    3439   +255     
Impacted Files Coverage Δ
dask_cuda/_version.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/common.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_groupby.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_merge.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_shuffle.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy_map_overlap.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/utils.py 0.00% <0.00%> (ø)
dask_cuda/cli/dask_config.py 0.00% <0.00%> (ø)
dask_cuda/cli/dask_cuda_worker.py 0.00% <ø> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@raydouglass raydouglass merged commit bc7ec70 into main Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conda conda issue gpuCI gpuCI issue python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants