feat: make table.read_partitions distributed #7805

Closed
Tracked by #7823
BohuTANG opened this issue Sep 22, 2022 · 3 comments
Labels
C-improvement Category: improvement

Comments

@BohuTANG
Member

BohuTANG commented Sep 22, 2022

Summary

table.read_partitions can perform many IO operations, for example when applying the min-max index filter or the bloom filter index filter.
If a table has many partitions, read_partitions becomes very slow.

To make it distributed, we can:

  1. Have read_partitions return segments instead of partitions when the number of segments exceeds 1000
  2. Distribute the partitions across the cluster
  3. In read2, check whether each file is a segment file or a partition file
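
The three steps above could be sketched roughly as follows. This is only an illustration, not Databend's actual code: ReadUnit, SEGMENT_THRESHOLD, distribute, resolve, the prune callback, and the round-robin placement are all hypothetical names and choices. It also assumes that a segment can be expanded into its partitions by a pruning step that runs on whichever node receives it.

```rust
// Hypothetical sketch: none of these names are Databend's actual API.

/// Threshold from the proposal: above this many segments, skip local pruning.
const SEGMENT_THRESHOLD: usize = 1000;

/// A unit of work handed to a node: either an unpruned segment or a
/// partition that has already passed index pruning.
#[derive(Debug, Clone, PartialEq)]
enum ReadUnit {
    Segment(String),
    Partition(String),
}

/// Step 1: return segments when there are too many to prune on one node,
/// otherwise prune locally and return partitions.
fn read_partitions<F>(segments: &[String], prune: F) -> Vec<ReadUnit>
where
    F: Fn(&str) -> Vec<String>,
{
    if segments.len() > SEGMENT_THRESHOLD {
        segments.iter().cloned().map(ReadUnit::Segment).collect()
    } else {
        segments
            .iter()
            .flat_map(|s| prune(s.as_str()))
            .map(ReadUnit::Partition)
            .collect()
    }
}

/// Step 2: spread the units over `nodes` executors round-robin
/// (an assumed placement policy).
fn distribute(units: Vec<ReadUnit>, nodes: usize) -> Vec<Vec<ReadUnit>> {
    assert!(nodes > 0, "cluster must have at least one node");
    let mut buckets = vec![Vec::new(); nodes];
    for (i, unit) in units.into_iter().enumerate() {
        buckets[i % nodes].push(unit);
    }
    buckets
}

/// Step 3: on each node, check the kind of every unit and expand
/// segments into partitions before reading.
fn resolve<F>(units: Vec<ReadUnit>, prune: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    units
        .into_iter()
        .flat_map(|u| match u {
            ReadUnit::Segment(s) => prune(s.as_str()),
            ReadUnit::Partition(p) => vec![p],
        })
        .collect()
}
```

With this shape, the expensive index IO inside prune runs on the coordinating node only for small tables. For large tables it moves to the nodes that receive the segments.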
@BohuTANG BohuTANG added the C-improvement Category: improvement label Sep 22, 2022
@BohuTANG
Member Author

cc @dantengsky

@Xuanwo
Member

Xuanwo commented Sep 23, 2022

I expect to decouple ReadDataSourcePlan from the Table API in #7816.

Please let me know if there is anything I can help with. @zhang2014

@BohuTANG
Member Author

BohuTANG commented Oct 8, 2022

Implemented in #7867. cc @zhang2014

@BohuTANG BohuTANG closed this as completed Oct 8, 2022
3 participants