[RELEASE] dask-cuda v22.12 #1059
Conversation
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
[REVIEW] Unpin `dask` and `distributed` for development
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Closes #978 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) URL: #1010
Closes #1015 Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1016
Early attempt at resolving #989. In an ideal world, we could call `dask.config.config` or `dask.config.get(...)` and that would have been updated with live values. Unfortunately, that is not the case. Config options are set in all manner of places: CLI, environment variables, YAML, kwargs, etc., and we are not consistent about storing them. For example, in this PR `self.device_memory_limit = device_memory_limit` is added to `device_host_file.py` just so the value could be accessed via `client.run` from within the worker. This change employs a hodge-podge of techniques to get at options we _think_ may be useful for debugging, but it is in no way exhaustive. Authors: - Benjamin Zaitlen (https://github.com/quasiben) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1006
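As a rough illustration of the `client.run` technique mentioned above, the sketch below gathers a few worker-side values; the attributes and config keys shown are only examples, not the exact set collected by the PR, and a GPU is assumed for `LocalCUDACluster`.

```python
# Illustrative sketch only: collect a handful of worker-side settings via
# Client.run. The specific attributes/config keys are examples, not the
# exact set gathered by the PR.
import dask.config
from dask_cuda import LocalCUDACluster
from distributed import Client


def _worker_debug_info(dask_worker):
    # Runs inside each worker process; mixes live worker state with
    # dask.config values, since not every option is written back into
    # the config after the worker starts.
    return {
        "nthreads": dask_worker.nthreads,
        "device_memory_limit": getattr(dask_worker.data, "device_memory_limit", None),
        "memory_target": dask.config.get("distributed.worker.memory.target", None),
    }


if __name__ == "__main__":
    with LocalCUDACluster(n_workers=1) as cluster, Client(cluster) as client:
        print(client.run(_worker_debug_info))
```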
This is already supported by `memory_limit`/`device_memory_limit`, and this has been raised in #1020 during discussions on how to make Dask Profiles usable in Dask-CUDA. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1021
To add some metrics on distributed overheads to our benchmark suite, I propose using https://github.com/gjoseph92/dask-noop/ to rewrite the compute graphs into no-ops. Now we just measure the scheduler and worker-to-worker performance on zero-sized work (or something close thereto). Interestingly the cupy-based transpose runs take _longer_ if I noopify (cc @gjoseph92 any insight here?), but I haven't done anything systematic yet. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #994
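A minimal sketch of the idea, assuming the dask-noop package exposes an `as_noop` helper that rewrites a collection's graph into no-op tasks (treat that name as an assumption); the benchmark in the PR uses CuPy-backed arrays, but a NumPy-backed array keeps the sketch runnable anywhere.

```python
# Sketch under the assumption that dask-noop provides `as_noop`; it times the
# real graph against a no-op rewrite of the same graph, so the difference is
# roughly the scheduler/worker overhead rather than the numeric work.
import time

import dask.array as da
from dask_noop import as_noop  # assumed API from gjoseph92/dask-noop

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
op = (x + x.T).sum()

for label, graph in [("real", op), ("noop", as_noop(op))]:
    start = time.perf_counter()
    graph.compute()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
```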
This PR removes the stale issue labeler workflow Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: #1024
Even though pytest-asyncio is installed, it has not been executing async tests that aren't wrapped in `gen_cluster`. Using `gen_cluster` is the proper way to write Dask tests, but in a few exceptional cases we may not be able to use it, such as when cleanup must be avoided. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #1026
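For readers unfamiliar with the pattern, here is a minimal `gen_cluster`-style test of the kind referred to above; it is a generic example, not one of the tests touched by this PR.

```python
# Minimal gen_cluster example: the decorator starts a scheduler and two
# workers, hands the async test an (asynchronous) client plus the server
# objects, and tears everything down afterwards.
from distributed.utils_test import gen_cluster


@gen_cluster(client=True)
async def test_square(c, s, a, b):
    future = c.submit(lambda x: x * x, 3)
    assert await future == 9
```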
In shuffle, use `Client.submit()` to control where tasks are executed and release temporary dataframes ASAP. Context: in the final step of the explicit-comms shuffle, we call `getitem()` to extract the final dataframe partitions from the result of the local shuffle. In some cases, these `getitem()` tasks would not run on the worker that ran the local shuffle, which would result in extra communication and spilling. We now use `submit(..., worker=...)` to make sure that the worker running the local shuffle also runs the `getitem()` task afterwards. Is it possible to do this without the use of `submit()`, to avoid the overhead of creating a `Future` for each dataframe partition? Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Lawrence Mitchell (https://github.com/wence-) URL: #1025
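The placement trick reads roughly like the hedged sketch below; `local_shuffle` and `getitem` are illustrative stand-ins, not Dask-CUDA's actual shuffle helpers.

```python
# Hedged sketch: pin a follow-up task to the worker that produced the
# intermediate result via Client.submit(..., workers=[...]), so extracting
# the partition causes no extra transfer or spilling.
from distributed import Client, get_worker


def local_shuffle(seed):
    # Stand-in for the per-worker shuffle step; records where it ran.
    return {"partition": seed * 10, "worker": get_worker().address}


def getitem(result, key):
    return result[key]


if __name__ == "__main__":
    client = Client()  # starts a local cluster for illustration
    shuffled = client.submit(local_shuffle, 1)
    addr = shuffled.result()["worker"]
    # Run getitem on the same worker that holds the shuffle result.
    part = client.submit(getitem, shuffled, "partition", workers=[addr])
    print(part.result())
```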
With the merge of #994 I accidentally changed the recorded time in the merge benchmark for the explicit-comms case. Since explicit-comms is eager, we must time the entire operation, not just `wait(result.persist())`. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #1028
By installing Dask nightly as last step in CI, we prevent any other steps from inadvertently downgrading it before running tests. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mark Sadang (https://github.com/msadang) URL: #1029
Support the new `Buffer` in rapidsai/cudf#12009 Fixes #1032 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1033
For automated cleanup when the cluster exits, the on-disk spilling directory needs to live inside the relevant worker's `local_directory`. Since we do not have a handle on the worker when constructing the keyword arguments to `DeviceHostFile` or `ProxifyHostFile`, we instead take advantage of dask/distributed#7153 and request to be called with the `worker_local_directory` as an argument. Closes #1018. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1023
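The contract can be pictured with the toy class below: a mapping whose constructor takes `worker_local_directory`, so the worker can hand over its own `local_directory`. This only illustrates the shape of the change, not Dask-CUDA's actual `DeviceHostFile`/`ProxifyHostFile` code.

```python
# Toy illustration of the keyword contract described above: the worker (via
# dask/distributed#7153) can supply worker_local_directory, and the mapping
# places its spill directory inside it so cleanup happens with the worker.
import os


class TinySpillStore(dict):
    def __init__(self, *, worker_local_directory, device_memory_limit=None):
        super().__init__()
        # Spill files live under the worker's local_directory and are
        # removed automatically when the cluster shuts down.
        self.spill_directory = os.path.join(worker_local_directory, "spill")
        os.makedirs(self.spill_directory, exist_ok=True)
        self.device_memory_limit = device_memory_limit
```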
Should apply custom __new__ if Python < 3.9. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #1036
Work around the issue reported in #1040 . This probably needs to be fixed elsewhere, so this patch temporarily unblocks CI. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - https://github.com/jakirkham - AJ Schmidt (https://github.com/ajschmidt8) URL: #1041
Addresses errors reported recently in #583 (comment). Depends on dask/distributed#6720. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #950
All requirements on `pytest-asyncio` should already have been removed; this drops the remaining references. Closes #1044 Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Ray Douglass (https://github.com/raydouglass) - https://github.com/jakirkham URL: #1045
* Upgrade to versioneer 0.28 & run formatting tools on those files
* Move most of `setup.py` to `pyproject.toml`
* Move all of `setup.cfg` to `pyproject.toml`
* Move `pytest.ini` to `pyproject.toml`
* Move `.coveragerc` to `pyproject.toml`
* Move Flake8 settings to `.flake8`

Authors: - https://github.com/jakirkham Approvers: - Peter Andreas Entschev (https://github.com/pentschev) - Ray Douglass (https://github.com/raydouglass) URL: #1035
Support for the [new built-in spilling in cuDF](rapidsai/cudf#12106), so that `device_memory_limit` and `memory_limit` ignore cuDF's device buffers. This is only implemented for `DeviceHostFile`. Since jit-unspill also targets cuDF, and libraries such as cupy aren't supported, I don't think it is important to support cuDF's built-in spilling in `ProxifyHostFile`. For now, `DeviceHostFile` simply ignores cuDF's device buffers and lets cuDF handle the spilling. This means that `DeviceHostFile` might estimate the device and host memory usage incorrectly ([or more incorrectly than usual](dask/distributed#4568 (comment))). Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Peter Andreas Entschev (https://github.com/pentschev) URL: #984
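A hedged sketch of using the two together, assuming cuDF >= 22.12 enables its built-in spilling through the `CUDF_SPILL` environment variable (or `cudf.set_option("spill", True)`); the exact option names and the limits below are assumptions and illustrative values, not settings taken from this PR.

```python
# Assumed configuration: CUDF_SPILL enables cuDF's built-in spilling in the
# worker processes, while DeviceHostFile keeps enforcing the usual
# device/host limits for everything that is not a cuDF device buffer.
import os

from dask_cuda import LocalCUDACluster
from distributed import Client

os.environ["CUDF_SPILL"] = "on"  # inherited by the spawned worker processes

if __name__ == "__main__":
    cluster = LocalCUDACluster(device_memory_limit="10GB", memory_limit="32GB")
    client = Client(cluster)
```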
Introduce staging in explicit-comms. The idea is to "stage" the keys of the input on the workers so that a later explicit-comms task can access **and** free the data associated with the keys. Note that explicit-comms and this new staging approach are still experimental. If or when it gets to a state where it provides significant performance improvements across a range of workflows, the plan is to tighten up the API. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1030
This PR addresses a breaking change that was made in upstream distributed: dask/distributed#5669. Error due to the above change:

```
> self.memory_limit = parse_memory_limit(
      memory_limit=memory_limit, nthreads=1, total_cores=n_workers
  )
E TypeError: parse_memory_limit() missing 1 required keyword-only argument: 'logger'
```

Introduced a version-based function call because we haven't yet finalized the `dask` pinning for `22.12`, though as of now `2022.11.1` seems to be the most likely release pinning. Since the dask dev `2022.11.1` packages now contain this breaking change, this check will also help unblock any `22.12` CI jobs that are fetching `2022.11.1` dask dev packages. Once `dask` is pinned to a specific version, this version-based check can be removed. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Peter Andreas Entschev (https://github.com/pentschev) URL: #1055
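The version-gated call looks roughly like the sketch below; the import path, version threshold, and argument values are assumptions for illustration rather than the exact Dask-CUDA code.

```python
# Rough sketch of a version-gated call: only pass the required keyword-only
# `logger` argument when the installed distributed has the breaking change.
# Import path and threshold are assumptions, not verified against dask-cuda.
import logging

import distributed
from distributed.worker_memory import parse_memory_limit
from packaging.version import parse as parse_version

logger = logging.getLogger(__name__)

if parse_version(distributed.__version__) >= parse_version("2022.11.1"):
    memory_limit = parse_memory_limit(
        memory_limit="auto", nthreads=1, total_cores=8, logger=logger
    )
else:
    memory_limit = parse_memory_limit(memory_limit="auto", nthreads=1, total_cores=8)
```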
It often happens that `pre-commit` finds improperly formatted files that are not in the `dask_cuda` directory, therefore it is sensible to check the entire repo. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) - Jake Awe (https://github.com/AyodeAwe) URL: #1053
This PR pins `dask` and `distributed` to `2022.11.1` for `22.12` release. xref: rapidsai/cudf#12165 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Ray Douglass (https://github.com/raydouglass) - Peter Andreas Entschev (https://github.com/pentschev) URL: #1046
Codecov Report: Base: 0.00% // Head: 0.00% // No change to project coverage 👍

Coverage diff against `main` (#1059):

|          | main  | #1059 | +/-  |
|----------|-------|-------|------|
| Coverage | 0.00% | 0.00% |      |
| Files    | 24    | 26    | +2   |
| Lines    | 3184  | 3439  | +255 |
| Misses   | 3184  | 3439  | +255 |
❄️ Code freeze for `branch-22.12` and v22.12 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-22.12` until release (merging of this PR).

What is the purpose of this PR?
It merges `branch-22.12` into `main` for the release.