Endless loop while initializing a consumer group #1381
Comments
Hi guys, any update about this issue? |
I see you are connecting to "localhost". If that is port forwarding, be careful: it can break the way the Kafka protocol works. If it is not port forwarding, I guess this is a simple Kafka cluster for test purposes? Edit: if I am wrong, please post the broker config. |
Hi, yes. It's not a production-ready cluster. It's a test cluster with two nodes on my local machine.
Nevertheless, I think the client should respect the retry count and return an error instead of looping endlessly. In my case the cause is a corrupted __consumer_offsets topic. In production such an error would rarely happen, but it is still possible, right? I am also not sure whether other errors can cause this problem. It would be bad if such an edge case were triggered and the service got stuck and, worse, monitoring on the errors channel could not observe it.
On 18. Aug 2019, at 20:31, Francois Poinsot ***@***.***> wrote:
I see you are connecting to "localhost".
Is your Kafka cluster a single-node cluster running on your local machine?
Or is it some port forwarding to a remote cluster?
If it is port forwarding, be careful: it can break the way the Kafka protocol works.
All your brokers have an "advertised host" that is transmitted to clients that need to communicate with these brokers.
This "advertised host" needs to be reachable from your client.
You may have only forwarded "localhost". And if you are running an actual cluster of brokers, I am fairly sure that your brokers are not set up with advertised.host.name=localhost
But if it is not port forwarding, I guess what we have here is a simple Kafka cluster for test purposes?
Then you might as well restart the cluster from scratch (clean up the data).
If you had any corruption in your __consumer_offsets topic, it will be resolved this way.
This kind of corruption should not happen once you are working with a production-ready setup for your Kafka cluster.
|
This PR introduced that code path, #1231 it also mentions problem related to Sadly, there are no unit tests for this. |
I could see the problem here: if there is an error in 1), 2) never decreases the retries, hence the endless loop |
on top of that, it doesn't matter if the retries are not decreased, as it seems |
Let's ping the original PR author to see if they have some idea about what's going on. |
@d1egoaz good eye! I think you're right, this looks like a bug. Agree the line you pointed out is likely to be problematic, and at a glance it does seem like we should be making the recursive call with Unfortunately it could be a week or two before I can have a shot at this, but on the surface it seems like an easy-ish change if somebody else wants to try before then. Thanks again for tracking this down (and sorry for the trouble!) |
|
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. |
Hi, I believe the bug is still present in master: https://github.com/Shopify/sarama/blob/4ee86d9c4d49e2d434afa98c39f7db9276ff7d17/consumer_group.go#L198 The retries count is not decreased there. |
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. |
I have the same issue. The consumer group client stays in the connection loop when there are insufficient grants.
Versions
Sarama Version: 1.22.1 (ea9ab1c)
Kafka Version: 2.1.1
Go Version: go1.11.2
Configuration
What configuration values are you using for Sarama and Kafka?
for Sarama I simply set this config
Logs
Problem Description
I am trying to initialize a consumer group with a simple setup taken from the example code. After calling ConsumerGroup.Consume, I notice that the Setup function does not seem to be triggered at all. I traced the issue and found that the code is stuck in the retryNewSession func, more precisely at https://github.com/Shopify/sarama/blob/ea9ab1c316850bee881a07bb2555ee8a685cd4b6/consumer_group.go#L189

From the log, it looks like my test Kafka cluster has a corrupt __consumer_offsets topic.

From the code, it looks like if RefreshCoordinator always returns an error, it will keep retrying regardless of the retry settings, and the call stack of retryNewSession will get deeper and deeper.

More importantly, I can't find a way to catch this error, so the caller of Consume will never notice it, which makes monitoring for such errors impossible.

Could you check whether this is expected behavior? Thank you.