Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise shuffle deprecation to align with dask/dask #14762

Merged
merged 3 commits into from
Jan 17, 2024

Conversation

rjzamora
Copy link
Member

@rjzamora rjzamora commented Jan 16, 2024

Description

It seems that #14708 was a bit too aggressive in updating the shuffle kwarg to shuffle_method, and gpu-CI is currently failing in dask/dask.

This PR aligns dask-cudf with dask.dataframe by warning the user when the shuffle argument is used (rather than raising an error). I definitely don't think it makes sense for RAPIDS to be more strict than dask/dask about this argument.

This also fixes a bug in groupby.aggregate() (where we were still using shuffle=)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora added bug Something isn't working 2 - In Progress Currently a work in progress dask Dask issue non-breaking Non-breaking change labels Jan 16, 2024
@rjzamora rjzamora self-assigned this Jan 16, 2024
@rjzamora rjzamora requested a review from a team as a code owner January 16, 2024 16:59
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 16, 2024
@rjzamora
Copy link
Member Author

cc @pentschev (Thanks for tackling the original fix! Let me know if you have any thoughts on this small update)

@rjzamora rjzamora added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 16, 2024
@rjzamora
Copy link
Member Author

CI failures seem unrelated

@jameslamb
Copy link
Member

I think this would probably be solved by just re-running the failed jobs:

RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/rapidsai/noarch/repodata.json.zst]
    Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds

Looks like temporary outage in services from anaconda.org.

raft saw something similar around the same time: https://github.com/rapidsai/raft/actions/runs/7546779797/job/20549033091?pr=2096

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock so we can fix CI

@bdice
Copy link
Contributor

bdice commented Jan 17, 2024

/merge

@rapids-bot rapids-bot bot merged commit 2bead95 into rapidsai:branch-24.02 Jan 17, 2024
68 checks passed
@jakirkham
Copy link
Member

Thanks all! 🙏

Please feel free to submit follow up work if needed. Just want to unblock other PRs

@rjzamora rjzamora deleted the revise-shuffle-deprecation branch January 17, 2024 02:57
@pentschev
Copy link
Member

Sorry I missed this yesterday. Changes look fine to me nevertheless, thanks @rjzamora !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working dask Dask issue non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants