Partitions get revoked and assigned multiple times on another consumer when one consumer shuts down, with the cooperative-sticky assignment strategy #3891

Open
pratikthakkar24 opened this issue Jun 29, 2022 · 4 comments

@pratikthakkar24
pratikthakkar24 commented Jun 29, 2022

Description

I have a single topic with 3 partitions and 3 consumers.
I register my rebalance callback through "rebalance_cb" in all the consumers, and all of them use subscribe(), so I am not manually assigning any partitions to the consumers.
I am using cooperative-sticky as the partition assignment strategy.
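
For readers who want to picture the setup described above, here is a minimal sketch of the client properties involved, written as a plain Python dict. The property names `partition.assignment.strategy`, `enable.auto.commit`, `group.id` and `bootstrap.servers` are standard librdkafka configuration keys; the broker address and group name are placeholders, and setting `enable.auto.commit` to `false` is an assumption based on the manual per-message commits mentioned later in this report.

```python
# Sketch of the consumer configuration described in the report.
# Placeholder values are marked; they are not taken from the original post.
consumer_config = {
    "bootstrap.servers": "localhost:9092",   # placeholder broker address
    "group.id": "example-group",             # placeholder; all 3 consumers share it
    "partition.assignment.strategy": "cooperative-sticky",
    "enable.auto.commit": "false",           # assumption: offsets are committed manually
}

# Render in the key=value form the issue template asks for.
for key, value in sorted(consumer_config.items()):
    print(f"{key}={value}")
```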

Now let's assume that, on startup of the producer (a single producer writes to all the partitions) and the consumers, Kafka has assigned the partitions as follows:
Partition 0 --> Consumer 1
Partition 1 --> Consumer 2
Partition 2 --> Consumer 3

Now when I shut down Consumer 1, according to cooperative-sticky behavior, partition 0 should ideally be assigned to one of the other two consumers. That is what happens in my case.
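
The expected outcome can be sketched with a toy model. This is not the assignor that librdkafka or the broker actually runs; `reassign_after_leave` is a hypothetical helper that keeps every surviving member's partitions and hands the orphaned ones to the least-loaded survivor (ties broken by name). The point is only that, under a cooperative protocol, no surviving consumer should lose a partition it already owns.

```python
from typing import Dict, List


def reassign_after_leave(assignment: Dict[str, List[int]],
                         leaving: str) -> Dict[str, List[int]]:
    """Toy model: give the leaving member's partitions to the survivors."""
    # Survivors keep everything they already own (the "sticky" part).
    result = {m: list(p) for m, p in assignment.items() if m != leaving}
    if not result:
        raise ValueError("no members left in the group")
    for partition in assignment.get(leaving, []):
        # Pick the member with the fewest partitions; break ties by name.
        target = min(result, key=lambda m: (len(result[m]), m))
        result[target].append(partition)
    return result


before = {"consumer-1": [0], "consumer-2": [1], "consumer-3": [2]}
after = reassign_after_leave(before, "consumer-1")

# Partitions each survivor lost in the move; in this model the list is
# empty for every survivor, since only the orphaned partition moves.
revoked = {m: sorted(set(before[m]) - set(after[m])) for m in after}
print(after)
print(revoked)
```

In this model Consumer 2 ends up with partitions 1 and 0 and nothing is revoked from it, which is the behavior the report expects rather than the repeated revoke-and-assign cycle described next.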

Let's say Partition 0 gets assigned to Consumer 2.
But after it is assigned, both partitions (0 and 1) get revoked from Consumer 2 and are then assigned to Consumer 2 again.
This revoke-and-assign cycle for both partitions repeats approximately 7 to 8 times, after which no further revocations or assignments happen.

While this revocation and assignment of partitions 0 and 1 on Consumer 2 is ongoing, that consumer keeps consuming messages. Some of the messages consumed during this process are received a second time by that consumer as duplicates.
I suppose this is because committing those messages to the broker failed while the partitions were being revoked and reassigned. (I commit manually after each message.)
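
The hypothesis above can be illustrated with a small, self-contained simulation. Everything here is a toy: `ToyPartition` and `consume` are hypothetical names, and the rule that every commit sent mid-rebalance is rejected is an assumption of the model, not a statement about what the broker does in every case. It only shows why a failed commit followed by a reassignment replays messages.

```python
from typing import List


class ToyPartition:
    """Toy stand-in for one partition: messages plus a committed offset."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed = 0  # offset a newly assigned consumer resumes from

    def commit(self, next_offset: int, rebalancing: bool) -> bool:
        # Assumption of this model: commits sent mid-rebalance are rejected.
        if rebalancing:
            return False
        self.committed = next_offset
        return True


def consume(partition: ToyPartition, count: int, rebalancing: bool) -> List[str]:
    """Process up to `count` messages from the committed offset,
    attempting a commit after each one."""
    seen = []
    position = partition.committed
    for _ in range(count):
        if position >= len(partition.messages):
            break
        seen.append(partition.messages[position])
        position += 1
        partition.commit(position, rebalancing)
    return seen


p0 = ToyPartition(["m0", "m1", "m2", "m3", "m4"])
first = consume(p0, 2, rebalancing=False)   # commits succeed
second = consume(p0, 2, rebalancing=True)   # processed, but commits fail
third = consume(p0, 3, rebalancing=False)   # reassigned: resume from last commit

delivered = first + second + third
duplicates = sorted({m for m in delivered if delivered.count(m) > 1})
print(duplicates)
```

The two messages processed while commits were failing are delivered again after the reassignment, which matches the duplicates described in the report.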

My question is whether this repeated revocation and assignment of both partitions on Consumer 2 (after Consumer 1 shuts down) is normal.

I cannot reproduce the behavior every time I shut down Consumer 1: sometimes the repeated revocation and assignment happens and sometimes it doesn't.

Kindly shed some light on this.

Checklist

Please provide the following information:

  • [1.9.0.1] librdkafka version (release number or git tag)
  • [ ] Apache Kafka version
  • [2.8] librdkafka client configuration
  • [Windows Server 2012 R2] Operating system
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@mensfeld
mensfeld commented Sep 2, 2022

I observe the same behaviour with v1.9.0. Partitions that should stay assigned are revoked when one consumer closes.

@mensfeld
mensfeld commented Oct 7, 2022

@pratikthakkar24 I was able to figure out the reason on my side and was able to reproduce it.

Maybe it's going to be helpful to you:

I was seeing this because, during the rebalance but before running the rebalance callback itself, I was calling rd_kafka_commit. It still affected the process even though it was called with the async flag. Once I removed the call completely, cooperative-sticky worked as expected.

ref https://github.com/karafka/karafka/pull/1050/files#diff-a9d9ccd2fc27b8adc58d028232236b82563c12e7b46ded7eca808ea9a1ec7baeL124

@mironovdm

The problem is described in #4059.
In short: manual offset commits during a rebalance lead to a group generation id error and a further rebalance of the consumer that sent the commit. Use auto commit instead.
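
A toy model may help picture the mechanism summarized above. Kafka's group protocol tags each round of membership with a generation id, and the coordinator rejects offset commits that carry a stale one. The classes below (`ToyCoordinator`, `ToyMember`, `IllegalGeneration`) are hypothetical and heavily simplified; in particular, a real rejoin is itself a rebalance, which this sketch only counts rather than simulates.

```python
class IllegalGeneration(Exception):
    """Raised by the toy coordinator for a commit with a stale generation."""


class ToyCoordinator:
    def __init__(self) -> None:
        self.generation = 1
        self.offsets = {}

    def start_rebalance(self) -> None:
        # A membership change (e.g. a consumer leaving) bumps the generation.
        self.generation += 1

    def commit(self, member_generation: int, partition: int, offset: int) -> None:
        if member_generation != self.generation:
            raise IllegalGeneration(
                f"commit from generation {member_generation}, "
                f"group is at {self.generation}")
        self.offsets[partition] = offset


class ToyMember:
    def __init__(self, coordinator: ToyCoordinator) -> None:
        self.coordinator = coordinator
        self.generation = coordinator.generation
        self.rejoins = 0

    def commit(self, partition: int, offset: int) -> bool:
        try:
            self.coordinator.commit(self.generation, partition, offset)
            return True
        except IllegalGeneration:
            # The commit is lost and the member has to rejoin the group.
            self.generation = self.coordinator.generation
            self.rejoins += 1
            return False


coordinator = ToyCoordinator()
consumer2 = ToyMember(coordinator)

ok_before = consumer2.commit(1, 10)   # same generation: accepted
coordinator.start_rebalance()         # e.g. Consumer 1 shuts down
ok_during = consumer2.commit(1, 11)   # stale generation: rejected, rejoin
ok_after = consumer2.commit(1, 11)    # after rejoining: accepted
print(ok_before, ok_during, ok_after, consumer2.rejoins)
```

In this sketch the rejected commit both loses the offset and forces an extra rejoin, which is consistent with the repeated revocations and the duplicates reported at the top of the thread.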

@nhaq-confluent

Closing: based on the previous comment, the issue is captured in #4059.

If this is indeed different, please reopen or make a new issue.
