-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concatenate small input chunks before P2P rechunking #8832
Concatenate small input chunks before P2P rechunking #8832
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 25 files ± 0 25 suites ±0 10h 9m 26s ⏱️ - 10m 16s For more details on these failures, see this check. Results for commit 5487c23. ± Comparison against base commit e9d8233. This pull request removes 8 and adds 14 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
f709b3c
to
6f7a107
Compare
A/B tests performed on coiled/benchmarks#1532 show a significant improvement in runtime for tests with small input chunks: As expected, we also see a significant increase in memory usage for those tests: The memory usage is still within very safe bounds. Interestingly, we also see an increase in |
Could you quantify this a little? |
Average memory increases by 10%-30% of the original value. |
@phofl: As discussed offline, I've extended the documentation. LMK if anything is still unclear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few very small comment, but lgtm now!
Thanks for the comments, that was very helpful
( | ||
((2, 2), (1, 1, 1, 1), (1, 1, 1, 1)), | ||
((1, 1, 1, 1), (2, 2), (2, 2)), | ||
((2, 2), (2, 2), (2, 2)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one worries me a little bit. the max input chunk is 2, max output chunk is 4 but the algorithm concatenates in a way that we end with 8, which is not great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the block size limit the upper bound here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, https://github.com/dask/distributed/pull/8832/files#diff-0b80e83452ff3472b265026d4516846014500b991e12f3de4a41b39a990afbc6R494 is the limit here, so in this case it's 8 because array.chunk-size
is 16 B
.
# by trying dimensions in decreasing value / weight order. | ||
def key(k: int) -> float: | ||
gse = graph_size_effect[k] | ||
bse = block_size_effect[k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a sentences what these 2 variables represent when you define them above? Took me a bit to figure this out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Blocked by (and includes) #8831pre-commit run --all-files