Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for choosing default rechunking method #8843

Merged
merged 3 commits into from
Aug 28, 2024

Conversation

hendrikmakait
Copy link
Member

Sibling to dask/dask#11337

  • Tests added / passed
  • Passes pre-commit run --all-files

@hendrikmakait hendrikmakait requested a review from fjetter as a code owner August 26, 2024 12:38
Copy link
Contributor

github-actions bot commented Aug 26, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    25 files  ± 0      25 suites  ±0   10h 19m 18s ⏱️ + 7m 24s
 4 122 tests + 4   4 002 ✅  -  2    111 💤 ±0  9 ❌ +6 
47 615 runs  +44  45 499 ✅ +38  2 107 💤 ±0  9 ❌ +6 

For more details on these failures, see this check.

Results for commit 88bf685. ± Comparison against base commit ea7d35c.

♻️ This comment has been updated with latest results.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hendrikmakait. Just a couple of minor comments but overall LGTM

)

assert x2.chunks == new
assert np.all(await c.compute(x2) == a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use da.assert_eq here instead? I assume there's some reason -- just curious

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, probably something to do with the client being async

],
)
@gen_cluster(client=True)
async def test_rechunk_heuristic(c, s, a, b, new, expected_algorithm):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised we're testing this here instead of over in dask/dask, where the definition of the heuristic lives. I don't mean this to be a blocking comment -- I'm glad there's test coverage for this somewhere : )

I'm a little out of date on how we're choosing where rechunking-related tests go. Why did you put this test here instead of in dask/dask#11337?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing tests (that I'm aware of) around rechunk configuration live in test_rechunk.py. I don't think there's a perfect place to put these as these two components are generally intertwined and P2P requires a distributed cluster. I could see two things happening:

  • Move the logic around choosing between P2P and tasks on a distributed cluster over into distributed. This way, it lives in the same repo as the tests.
  • Add a test to dask that simply tests that we use tasks-based rechunking when not using a distributed cluster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Agreed that this is sort of a gray area where there's not a clear and obvious place for these to live. What you have here is fine and we can always move things around later if we want.

@jrbourbeau jrbourbeau added the skip-caching Apply to PRs to disable CI environment caching label Aug 27, 2024
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just bumped CI now that dask/dask#11337 has been merged

@phofl phofl merged commit 4aeed40 into dask:main Aug 28, 2024
22 of 32 checks passed
@phofl
Copy link
Collaborator

phofl commented Aug 28, 2024

thx

@hendrikmakait hendrikmakait deleted the auto-rechunk-heuristic branch August 28, 2024 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip-caching Apply to PRs to disable CI environment caching
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants