Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scale task writers based on throughput for partitioned tables with skewness #13379

Closed
gaurav8297 opened this issue Jul 27, 2022 · 1 comment
Closed
Assignees

Comments

@gaurav8297
Copy link
Member

gaurav8297 commented Jul 27, 2022

Problem

  • Improve the performance of partitioned writes specifically in case writers/partitions are skewed.

  • Right now, prefer-partitioning only works if you have statistics and the number of partitions is greater than preferred-write-partitioning-min-number-of-partitions (default to 50). However, we know that stats are not always guaranteed to be present in which case partitioned writes will go through from an inefficient route. But with scaling, we could enable prefer-partitioning for any number of partitions thus we don't have to rely on statistics.

Approach (local scaling): (Addressed in #14718)

We will introduce a new local exchanger (ScalePartitionLocalExchanger) that will split the page into different partitions and assign those pages to their respective table-writer operators. Additionally, it will keep track of the partition level physical written bytes coming from the table writer operator and use that to scale the parallelism for a particular skewed partition in a round-robin fashion.

Approach (global scaling):

For prefer partitioning, the distribution across workers happens through the PartitionedOutputOperator which is hard to scale. So, we still haven't figured out what do to with that.

cc @trinodb/maintainers

@findepi
Copy link
Member

findepi commented Sep 9, 2022

@gaurav8297 can you please make the issue description more self-contained.

Also, what's the relation between this one and #14042 ?

@gaurav8297 gaurav8297 changed the title Scale task writers based on throughput for partitioned tables Scale task writers based on throughput for partitioned tables with skewness Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants