Scale task writers based on throughput for partitioned tables with skewness #13379

gaurav8297 · 2022-07-27T22:12:36Z

Problem

Improve the performance of partitioned writes, specifically when writers or partitions are skewed.
- Known issue: Scale writers when write is partitioned #10791
- Relevant slack thread: https://trinodb.slack.com/archives/CFLB9AMBN/p1642961339264200
Right now, prefer-partitioning only takes effect when statistics are available and the number of partitions is greater than preferred-write-partitioning-min-number-of-partitions (default: 50). However, statistics are not always present, and without them partitioned writes take a less efficient route. With writer scaling, we could enable prefer-partitioning for any number of partitions and no longer depend on statistics.
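A minimal sketch of the planner decision described above, assuming the statistics-derived partition estimate can be modeled as an `OptionalLong`. The class, the method, and the `writerScalingAvailable` flag (which stands in for the proposed behavior) are hypothetical illustrations, not Trino's actual planner code.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the prefer-partitioning decision; not Trino's planner code.
final class PreferredPartitioningCheck {
    // Default of preferred-write-partitioning-min-number-of-partitions
    static final int DEFAULT_MIN_PARTITIONS = 50;

    static boolean usePreferredPartitioning(
            OptionalLong estimatedPartitionCount,
            int minPartitions,
            boolean writerScalingAvailable) {
        // Proposed behavior (assumption): if skewed partitions can be scaled out,
        // prefer-partitioning needs neither statistics nor a minimum partition count.
        if (writerScalingAvailable) {
            return true;
        }
        // Current behavior: no statistics means no estimate, so fall back
        // to the non-preferred write route.
        if (estimatedPartitionCount.isEmpty()) {
            return false;
        }
        // Only partition when the estimate is greater than the configured minimum.
        return estimatedPartitionCount.getAsLong() > minPartitions;
    }
}
```

The point of the sketch is that the first branch removes the dependency on statistics entirely, which is what makes scaling attractive here.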

Approach (local scaling): (Addressed in #14718)

We will introduce a new local exchanger (ScalePartitionLocalExchanger) that splits each page into partitions and routes the resulting pages to their respective table writer operators. It will also track the physical bytes written per partition, as reported by the table writer operators, and use that to increase the parallelism of a skewed partition by assigning additional writers to it in a round-robin fashion.
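The scaling loop described above can be modeled roughly as follows. This is a simplified, hypothetical sketch and not the real ScalePartitionLocalExchanger: the class name, the static `partition % writerCount` initial assignment, and the "add one writer per threshold crossed" policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of local writer scaling for skewed partitions.
// Not Trino's ScalePartitionLocalExchanger; the threshold policy is an assumption.
final class SkewAwarePartitionRouter {
    private final int writerCount;
    private final long scaleThresholdBytes;
    // For each partition: the writers currently allowed to receive its pages
    private final List<List<Integer>> writersPerPartition = new ArrayList<>();
    // Per-partition round-robin cursor over its assigned writers
    private final int[] nextWriterIndex;
    // Physical bytes reported by the writers per partition since its last scale-up
    private final long[] bytesSinceLastScale;
    // Global round-robin cursor used to pick the next writer to add
    private int nextWriterToAssign;

    SkewAwarePartitionRouter(int partitionCount, int writerCount, long scaleThresholdBytes) {
        this.writerCount = writerCount;
        this.scaleThresholdBytes = scaleThresholdBytes;
        this.nextWriterIndex = new int[partitionCount];
        this.bytesSinceLastScale = new long[partitionCount];
        for (int p = 0; p < partitionCount; p++) {
            List<Integer> writers = new ArrayList<>();
            writers.add(p % writerCount); // initial one-writer-per-partition assignment
            writersPerPartition.add(writers);
        }
        this.nextWriterToAssign = partitionCount % writerCount;
    }

    // Picks the writer for the next page of a partition, round-robin over its writers.
    int route(int partition) {
        List<Integer> writers = writersPerPartition.get(partition);
        int writer = writers.get(nextWriterIndex[partition] % writers.size());
        nextWriterIndex[partition]++;
        return writer;
    }

    // Records physical written bytes; adds a writer once the partition crosses the threshold.
    void recordWrittenBytes(int partition, long bytes) {
        bytesSinceLastScale[partition] += bytes;
        List<Integer> writers = writersPerPartition.get(partition);
        if (bytesSinceLastScale[partition] >= scaleThresholdBytes && writers.size() < writerCount) {
            // Find the next writer (round-robin) that does not already serve this partition
            int candidate = nextWriterToAssign;
            while (writers.contains(candidate)) {
                candidate = (candidate + 1) % writerCount;
            }
            writers.add(candidate);
            nextWriterToAssign = (candidate + 1) % writerCount;
            bytesSinceLastScale[partition] = 0;
        }
    }

    int writerCountFor(int partition) {
        return writersPerPartition.get(partition).size();
    }
}
```

In this model only the hot partition gains writers, while cold partitions keep their single writer. A likely trade-off is that a scaled-out partition is written by several writers at once and may therefore produce more, smaller files.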

Approach (global scaling):

For prefer-partitioning, the distribution across workers happens through the PartitionedOutputOperator, which is hard to scale. We still haven't figured out what to do about that.

cc @trinodb/maintainers


findepi · 2022-09-09T08:00:22Z

@gaurav8297 can you please make the issue description more self-contained.

Also, what's the relation between this one and #14042?

sopel39 assigned gaurav8297 Sep 9, 2022

gaurav8297 mentioned this issue Sep 26, 2022

Local scale writers for partitioned data #14140

Closed

gaurav8297 changed the title ~~Scale task writers based on throughput for partitioned tables~~ Scale task writers based on throughput for partitioned tables with skewness Sep 28, 2022

gaurav8297 mentioned this issue Nov 2, 2022

Mitigate Writer skewness when writing partitioned data with preferred partitioning enabled #14718

Merged

gaurav8297 closed this as completed Nov 21, 2022
