
improve partition assignment balance (#93) #97

Merged Feb 10, 2021 · 3 commits

Conversation

@bobh66 (Contributor) commented Feb 9, 2021


Description

Modifies the partition assignment logic to produce a more balanced distribution of partitions across clients. Specifically:

  • when table_standby_replicas is set to 0 and the number of partitions is greater than or equal to the number of clients, the partitioner calculates a "minimum" capacity for each client on re-assignment: the smallest number of partitions each client should hold in a balanced distribution. Partitions beyond that minimum are removed from existing clients, ensuring there are enough unassigned partitions available to assign to every client.

  • the partitioner sorts the "candidate" clients by the number of partitions (active or standby) each one currently has, then round-robins over that set, so clients with fewer partitions receive new partitions before clients with more, as sketched below. This only works reliably when table_standby_replicas is 0, because the logic that "promotes" standby partition assignments to active pre-empts the balancing logic.
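The description above corresponds roughly to the following flow. This is only an illustrative sketch of the balancing idea, not the PR's actual assignor code: the `rebalance` function, the `assignments` mapping, and the client/partition representations are hypothetical stand-ins.

```python
from collections import deque
from typing import Dict, Set


def rebalance(assignments: Dict[str, Set[int]],
              partitions: Set[int]) -> Dict[str, Set[int]]:
    """Strip clients down to the "minimum" capacity, then round-robin
    the freed partitions over the clients holding the fewest."""
    minimum = len(partitions) // len(assignments)

    # Step 1: remove partitions from clients holding more than the minimum,
    # so there are enough unassigned partitions to give every client work.
    unassigned = set(partitions)
    for owned in assignments.values():
        owned &= partitions          # drop partitions that no longer exist
        while len(owned) > minimum:
            owned.pop()              # arbitrary choice of which excess partition to shed
        unassigned -= owned

    # Step 2: sort candidate clients by how many partitions they already
    # hold, then round-robin the unassigned partitions over them so the
    # emptiest clients are filled first.
    candidates = deque(sorted(assignments, key=lambda c: len(assignments[c])))
    for partition in sorted(unassigned):
        client = candidates.popleft()
        assignments[client].add(partition)
        candidates.append(client)
    return assignments
```

For example, with 4 partitions all owned by one of 4 clients, step 1 strips the loaded client down to one partition and step 2 hands one freed partition to each empty client.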

@jkgenser

@bobh66: Super excited for this. I came across this recently when scaling from 1 to 8 workers for an 8-partition topic. It's unfortunate that workers 5, 6, and I think 7 do not get a partition; only once 8/8 workers are up does the load split to 1 partition per worker. This will enable more granular scaling with no standby replicas. Separately, it looks like in the next release we'll be able to use one shared RocksDB file for multiple workers, which reduces the need for standby replicas in this configuration.

Of course, people with standby replicas in a sharded environment might want to use them.

@bobh66 (Contributor, Author) commented Feb 10, 2021

@jkgenser Note that the changes will only remove partitions from existing pods when table_standby_replicas is 0, and the default is 1, so you have to "activate" this change by setting it to 0. If you are using tables then I would not set table_standby_replicas to 0 unless you don't need the table data to persist across a restart or be accessed by multiple pods.
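For reference, a minimal sketch of opting in on the app definition (app id and broker URL are placeholders):

```python
import faust

# Placeholder app id and broker URL; the relevant part is setting
# table_standby_replicas to 0 (the default is 1), which enables the
# rebalancing behaviour described in this PR.
app = faust.App(
    'example-app',
    broker='kafka://localhost:9092',
    table_standby_replicas=0,
)
```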

There is a lot more complexity involved when trying to combine the need for balance with the need for "sticky" and "standby" partitions, so this is the first step to handle the simple case where no standby partitions are involved.

@jkgenser commented Feb 10, 2021

Makes sense. I'm planning on having multiple workers share the same DATA_DIR by bind-mounting the same directory into all workers, even though only one worker will own a given RocksDB partition at a time. I think this would be the primary case for using your new assignment logic.
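A minimal sketch of that setup on the app side, assuming a bind-mounted /shared/faust-data directory (the path, app id, and broker URL are placeholders):

```python
import faust

# Every worker points datadir at the same bind-mounted directory; only the
# worker that currently owns a table partition writes that partition's
# RocksDB files, so the directory can be shared in this scheme.
app = faust.App(
    'example-app',
    broker='kafka://localhost:9092',
    store='rocksdb://',
    datadir='/shared/faust-data',
    table_standby_replicas=0,
)
```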
