Partitions get revoked and assigned multiple times on other consumer on shut down of one consumer for Cooperative sticky assignment strategy #3891

pratikthakkar24 · 2022-06-29T14:51:15Z

Description

I have a single topic with 3 partitions and 3 consumers.
I register my rebalance callback with "rebalance_cb" in all the consumers, and all consumers use subscribe(), so I am not manually assigning any partitions to the consumers.
I am using cooperative-sticky as the partition assignment strategy.

Now let's assume that, on startup of the producer (a single producer produces messages to all the partitions) and the consumers,
Kafka assigns the partitions as follows:
Partition 0 --> Consumer 1
Partition 1 --> Consumer 2
Partition 2 --> Consumer 3

Now when I shut down Consumer 1, partition 0 should, according to the cooperative-sticky behavior, be assigned to one of the other two consumers. That is what happens in my case.
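To make the expected outcome concrete, here is a toy model in C of a "sticky" reassignment: only the partitions owned by the departing consumer move, and every other partition keeps its owner. The function name `sticky_reassign` and the least-loaded tie-breaking rule are assumptions made for illustration; this is not the assignor that librdkafka or the broker actually runs.

```c
#include <assert.h>

#define NUM_PARTITIONS 3

/* Toy model only (hypothetical helper, not librdkafka code).
 * Partitions owned by `leaving` are handed to the remaining consumer that
 * currently owns the fewest partitions. Partitions owned by anyone else are
 * left untouched, which is the "sticky" property. */
static void sticky_reassign(int owner[NUM_PARTITIONS], int leaving,
                            const int remaining[], int num_remaining) {
    assert(num_remaining > 0);

    for (int p = 0; p < NUM_PARTITIONS; p++) {
        if (owner[p] != leaving)
            continue; /* not affected: no revocation for this partition */

        int best = remaining[0];
        int best_load = NUM_PARTITIONS + 1;

        /* Pick the remaining consumer with the smallest current load;
         * ties go to the consumer listed first. */
        for (int i = 0; i < num_remaining; i++) {
            int load = 0;
            for (int q = 0; q < NUM_PARTITIONS; q++)
                if (owner[q] == remaining[i])
                    load++;
            if (load < best_load) {
                best_load = load;
                best = remaining[i];
            }
        }
        owner[p] = best;
    }
}
```

Starting from owners {1, 2, 3} for partitions 0, 1 and 2, calling `sticky_reassign(owner, 1, remaining, 2)` with `remaining = {2, 3}` moves only partition 0; partitions 1 and 2 are never revoked in this model.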

Let's say Partition 0 gets assigned to Consumer 2.
But after that, both partitions (0 and 1) get revoked from Consumer 2 and are then assigned to Consumer 2 again.
This cycle of revoking and assigning both partitions repeats approximately 7 to 8 times, after which no further revocations or assignments happen.

While this cycle of revocation and assignment of partitions 0 and 1 on Consumer 2 is ongoing, that consumer continues to consume messages. Some of the messages consumed during the cycle are received as duplicates by that consumer.
I suspect this is because committing those messages to the broker failed while the partitions were being revoked and reassigned. (I commit manually after each message.)
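The suspected link between failed commits and duplicates can be sketched with a small toy calculation in C. It assumes that, after a partition is assigned again, fetching resumes from the last offset that was successfully committed; the function name and the offset convention ("next offset to read") are illustrative choices, not librdkafka API.

```c
#include <assert.h>

/* Toy model only (hypothetical helper, not librdkafka code).
 * `committed` is the last offset successfully committed to the broker and
 * `position` is how far the consumer had actually read when the partition
 * was revoked. Both are expressed as "next offset to read". */
static long redelivered_after_reassign(long committed, long position) {
    assert(committed >= 0 && position >= committed);

    /* Under the stated assumption, fetching restarts at `committed`, so
     * every offset in [committed, position) is delivered a second time. */
    return position - committed;
}
```

For example, if the last successful commit was at offset 5 and the consumer had read up to offset 8 when the revocation happened, `redelivered_after_reassign(5, 8)` gives 3 redelivered messages (offsets 5, 6 and 7).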

My question is whether this repeated revocation and assignment of both partitions on Consumer 2 (after Consumer 1 shuts down) is normal.

I cannot reproduce the behavior every time I shut down Consumer 1; sometimes the repeated revocation and assignment happens and sometimes it doesn't.

Kindly shed some light.

Checklist

librdkafka version (release number or git tag): 1.9.0.1
Apache Kafka version: 2.8 (the value was entered on the client configuration line)
librdkafka client configuration: not provided
Operating system: Windows Server 2012 R2
Provide logs (with debug=.. as necessary) from librdkafka
Provide broker log excerpts
Critical issue

mensfeld · 2022-09-02T08:45:17Z

I observe the same behaviour with v1.9.0. Partitions that should stay assigned are being revoked when one consumer closes.

mensfeld · 2022-10-07T16:10:59Z

@pratikthakkar24 I was able to figure out the reason on my side and was able to reproduce it.

Maybe it's going to be helpful to you:

I was seeing this because, during the rebalance but prior to running the rebalance callback itself, I was calling rd_kafka_commit. It still affected the rebalance even though it was called with the async flag. Once I removed it completely, cooperative-sticky worked as expected.

ref https://github.com/karafka/karafka/pull/1050/files#diff-a9d9ccd2fc27b8adc58d028232236b82563c12e7b46ded7eca808ea9a1ec7baeL124

mironovdm · 2022-12-25T00:02:18Z

The problem is described in #4059.
In short: manual offset commits during a rebalance lead to a group generation ID error and to a rebalance of the consumer that sent the commit. Use auto commit instead.
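The mechanism summarised above can be sketched as a toy model in C. It assumes that the coordinator bumps a group generation number on every rebalance and rejects any commit that carries an older generation, after which the sender has to rejoin. All type and function names below are hypothetical; the result code merely echoes Kafka's ILLEGAL_GENERATION error, and none of this is librdkafka's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch (not librdkafka types). */
typedef enum { COMMIT_OK, COMMIT_ILLEGAL_GENERATION } commit_result_t;

typedef struct {
    int generation; /* current group generation at the coordinator */
} coordinator_t;

typedef struct {
    int generation; /* generation this member last joined with */
    int rejoins;    /* how many times this member was forced to rejoin */
} member_t;

/* A member leaving or joining starts a rebalance, which bumps the generation. */
static void coordinator_rebalance(coordinator_t *c) {
    c->generation++;
}

/* The coordinator accepts a commit only if it carries the current generation. */
static commit_result_t coordinator_commit(const coordinator_t *c,
                                          int member_generation) {
    return member_generation == c->generation ? COMMIT_OK
                                              : COMMIT_ILLEGAL_GENERATION;
}

/* Manual commit sent by a member. A rejected commit forces the member to
 * rejoin, modelled here simply as adopting the coordinator's generation. */
static commit_result_t member_commit(member_t *m, const coordinator_t *c) {
    assert(m != NULL && c != NULL);

    commit_result_t result = coordinator_commit(c, m->generation);
    if (result == COMMIT_ILLEGAL_GENERATION) {
        m->rejoins++;
        m->generation = c->generation;
    }
    return result;
}
```

In this model a commit sent after `coordinator_rebalance()` but before the member has picked up the new generation is rejected and counted as a rejoin, while commits sent before the rebalance or after the rejoin succeed. A real group has more moving parts, so treat this only as a picture of why a commit in the middle of a rebalance can disturb the sender.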

nhaq-confluent · 2024-02-27T13:20:33Z

Closing; based on the previous comment, the issue is captured in #4059.

If this is indeed different, please reopen or make a new issue.

pratikthakkar24 mentioned this issue on Jul 4, 2022: Issues with "partition.assignment.strategy=cooperative-sticky" #3306 (closed)

mensfeld mentioned this issue on Oct 3, 2022: Support the incremental partition assign and unassign APIs for the cooperative rebalancing protocol when a rebalance listener is set karafka/rdkafka-ruby#221 (merged)
