Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use task-based rechunking to prechunk along partial boundaries #8831

Merged
merged 12 commits into from
Aug 20, 2024

Conversation

hendrikmakait
Copy link
Member

This PR faciliates P2P rechunking by offloading pre-chunking along partial boundaries to task-based rechunking. This also functions as a predecessor to pre-concatenation of small chunks which can also be offloaded to P2P rechunking.

  • Tests added / passed
  • Passes pre-commit run --all-files

@hendrikmakait hendrikmakait requested a review from fjetter as a code owner August 13, 2024 20:26
@hendrikmakait hendrikmakait marked this pull request as draft August 13, 2024 20:26
Copy link
Contributor

github-actions bot commented Aug 13, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    25 files  ±  0      25 suites  ±0   10h 20m 20s ⏱️ + 8m 25s
 4 112 tests + 11   3 990 ✅ +  7    113 💤 ±0  9 ❌ +4 
47 511 runs  +121  45 371 ✅ +117  2 131 💤 ±0  9 ❌ +4 

For more details on these failures, see this check.

Results for commit 245f345. ± Comparison against base commit fe79a36.

♻️ This comment has been updated with latest results.

@hendrikmakait hendrikmakait force-pushed the use-rechunking-to-prechunk branch from d1a204c to c23fd8f Compare August 15, 2024 08:45
@@ -64,7 +64,7 @@ repos:
- tornado
- pyarrow
- urllib3
- git+https://github.com/dask/dask
- git+https://github.com/hendrikmakait/dask@prechunk-p2p-rechunk
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Revert before merging

@hendrikmakait hendrikmakait marked this pull request as ready for review August 20, 2024 08:50
Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 2 nits, the logic seems to be fine that you added, I still don't quite understand the logic in _slice_new_chunks_into_partials but that't out of scope here

@@ -431,52 +432,108 @@ def _construct_graph(self) -> _T_LowLevelGraph:
return dsk


def _prechunk_for_partials(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name implies to me that we apply the rechunking here, could you rename to something that tells us that we are only calculating the chunks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

[((2, 2),), ((2, 2),), ((2, 2),)],
[((2, 2),), ((4,),), ((2, 2),)],
[((2, 2),), ((1, 1, 1, 1),), ((2, 2),)],
[((2, 2, 2),), ((1, 2, 2, 1),), ((1, 1, 1, 1, 2),)],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the last 2 in expected a good idea?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because the 1 in new at the end means that the input chunk only creates 2 output chunks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this a little more, and you're right! There was a small bug here. I've adjusted the algorithm accordingly.

@phofl
Copy link
Collaborator

phofl commented Aug 20, 2024

looks like there is an env mismatch of some kind because of tags

@hendrikmakait hendrikmakait merged commit e9d8233 into dask:main Aug 20, 2024
21 of 29 checks passed
@hendrikmakait hendrikmakait deleted the use-rechunking-to-prechunk branch August 20, 2024 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants