Producer.Retry.BackoffFunc is not applied when broker is down #1719

faec · 2020-06-10T16:48:45Z

Versions

Sarama	Kafka	Go
1.24.1	2.1	1.13.10

(I believe the bug is still present at head but these are the versions I used to diagnose it.)

Configuration

The Kafka configuration is a default 3-node system, set up as described in https://kafka.apache.org/quickstart but omitting steps 3-5 and the topic replication setup.

The relevant Sarama configuration is that Producer.Retry.BackoffFunc is set to a callback that returns a delay growing exponentially with the number of retries.
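For illustration, a callback of this kind might look like the sketch below. It has the same shape as Sarama's BackoffFunc, func(retries, maxRetries int) time.Duration, but the name expBackoff, the 100ms base delay, and the 10s cap are assumptions made for the example, not the values used in our setup.

```go
package main

import (
	"fmt"
	"time"
)

// expBackoff matches the signature of Sarama's Producer.Retry.BackoffFunc.
// The base delay and cap are illustrative values, not taken from the issue.
func expBackoff(retries, maxRetries int) time.Duration {
	const (
		base     = 100 * time.Millisecond
		maxDelay = 10 * time.Second
	)
	delay := base
	// Double the delay once per retry, stopping once the cap is reached.
	for i := 0; i < retries && delay < maxDelay; i++ {
		delay *= 2
	}
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay
}

func main() {
	// With Sarama this would be wired up roughly as:
	//   cfg := sarama.NewConfig()
	//   cfg.Producer.Retry.BackoffFunc = expBackoff
	for retries := 0; retries < 4; retries++ {
		fmt.Println(retries, expBackoff(retries, 5))
	}
}
```

The delay only grows if the retries argument grows between calls, which is the assumption the rest of this report depends on.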

This bug arises from elastic/beats#19015, which may give relevant context.

Problem Description

If we shut down the lead broker, we observe Sarama calling the BackoffFunc callback with retries set to zero every time. As a result it always uses the initial retry period and never backs off further, potentially causing overloads.

There is some question whether the two types of error (a missing broker vs. a failure response from a live broker) should have the same backoff behavior, but for our use case we definitely need to be able to specify backoff when the broker is down. The code appears intended to allow this, so I suspect an oversight: the current error-handling architecture increments retry counts when a response is received, so in the special case where there is no response the count never advances.

I have a candidate fix that works in our tests. It simply adds a separate local variable in partitionProducer.dispatch to track broker-refresh retries separately from retries of a particular message.
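The idea can be shown with a toy model. This is not Sarama's code and not the actual patch: brokerDownDelays, doubling, and both counters are hypothetical names, and the loop only mimics a dispatcher that repeatedly fails to reach the lead broker.

```go
package main

import (
	"fmt"
	"time"
)

// doubling is a stand-in backoff callback: 100ms, doubled once per retry.
func doubling(retries, maxRetries int) time.Duration {
	d := 100 * time.Millisecond
	for i := 0; i < retries; i++ {
		d *= 2
	}
	return d
}

// brokerDownDelays models a dispatch loop while the lead broker is unreachable.
// msgRetries stays at 0 because no response ever arrives to advance it.
// When trackRefreshes is true, a separate local counter of failed broker
// refreshes is passed to the callback instead.
func brokerDownDelays(attempts int, trackRefreshes bool,
	backoff func(retries, maxRetries int) time.Duration) []time.Duration {
	const maxRetries = 10 // illustrative value
	msgRetries := 0       // per-message count; never advanced in this scenario
	refreshRetries := 0   // advanced on every failed broker refresh
	delays := make([]time.Duration, 0, attempts)
	for i := 0; i < attempts; i++ {
		retries := msgRetries
		if trackRefreshes {
			retries = refreshRetries
		}
		delays = append(delays, backoff(retries, maxRetries))
		refreshRetries++
	}
	return delays
}

func main() {
	fmt.Println("reported behavior:", brokerDownDelays(4, false, doubling))
	fmt.Println("separate counter: ", brokerDownDelays(4, true, doubling))
}
```

In this model the first call yields the same delay on every attempt, while the second yields a growing sequence. When the separate counter should be reset is left out of the sketch.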


ghost · 2021-03-16T23:14:38Z

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

This was referenced Jun 10, 2020

Properly report retry counts when refreshing the lead broker #1720

Closed

Re-vendor sarama from the current elastic fork elastic/beats#19226

Closed

d1egoaz added the bug :-( label Jun 19, 2020

ghost added the stale Issues and pull requests without any recent activity label Mar 16, 2021

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 24, 2023
