feat: shuffle segments during distributed pruning #8793

Merged
merged 3 commits into from
Nov 15, 2022

Conversation

dantengsky
Member

@dantengsky dantengsky commented Nov 14, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Shuffle segments before distributed pruning, to mitigate potential data skew.

  • reorder the segments by hash(segment_path) % number_of_nodes

so that segments are shuffled, and the same segment is likely (but not guaranteed) to be sent to the same query node, assuming stable membership of the query cluster.
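The reordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the hash function (FNV-1a here), the helper names `fnv1a_64`, `bucket_of` and `shuffle_segments`, and the representation of a segment as a plain path string are all hypothetical. It covers only the ordering step; how the ordered list is then split across query nodes is not shown.

```rust
/// FNV-1a 64-bit hash. A stand-in chosen because it is deterministic across
/// processes; the hash actually used by the PR is not stated in the description.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Bucket that a segment path maps to for a cluster of `num_nodes` nodes,
/// i.e. hash(segment_path) % number_of_nodes.
fn bucket_of(segment_path: &str, num_nodes: usize) -> usize {
    assert!(num_nodes > 0, "cluster must have at least one node");
    (fnv1a_64(segment_path.as_bytes()) % num_nodes as u64) as usize
}

/// Reorder segments by their bucket. `sort_by_key` is a stable sort, so
/// segments that land in the same bucket keep their original relative order.
fn shuffle_segments(mut segment_paths: Vec<String>, num_nodes: usize) -> Vec<String> {
    segment_paths.sort_by_key(|path| bucket_of(path, num_nodes));
    segment_paths
}
```

Because the bucket depends only on the path and the node count, calling `shuffle_segments` twice with the same inputs yields the same order. If the number of nodes changes, the buckets change too, which is why the description only claims that placement is likely, not guaranteed, to be stable.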

Closes #issue

@vercel

vercel bot commented Nov 14, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
databend ✅ Ready (Inspect) Visit Preview Nov 15, 2022 at 1:09PM (UTC)

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Nov 14, 2022
@dantengsky dantengsky force-pushed the feat-shuffle-segments branch from e10cd64 to 427d8c0 Compare November 14, 2022 18:52
@sundy-li
Member

If the number of segments is too low (fewer than the number of query nodes), the redistribute_source_fragment may be inaccurate.

FuseLazyPartInfo seems too lazy to schedule.

3 participants