Kafka output backoff does not seem to grow as expected #19015
Comments
Pinging @elastic/integrations (Team:Integrations)
Here's multiple lines of the relevant output:
Notice how
The updated version of the sarama fork has been released in 7.9. Can we close this?
Version: master/8.0.0.
In #17808 we implemented a stateless exponential-backoff-with-jitter for the Kafka output. We even added a unit test we believed would test the growth of the backoff duration over time and that test has been passing!
However, when we add some debugging statements to the backoff function, the actual backoff duration doesn't grow as expected, that is, exponentially with some jitter, up to a max value.
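For reference, here is a minimal sketch of what "exponentially, with some jitter, up to a max value" could look like for a stateless backoff. This is not the code from #17808: the names backoffConfig, baseBackoff and jitteredBackoff are hypothetical, and the jitter range [base/2, base] is an assumption made for illustration. It also assumes Init and Max are positive.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffConfig is a hypothetical stand-in for the output's backoff settings
// (backoff.init and backoff.max in the Beat's configuration).
type backoffConfig struct {
	Init time.Duration
	Max  time.Duration
}

// baseBackoff returns Init * 2^retries, capped at Max. It keeps no state:
// the result depends only on the retry count passed in.
func baseBackoff(cfg backoffConfig, retries int) time.Duration {
	dur := cfg.Init
	for i := 0; i < retries && dur < cfg.Max; i++ {
		dur *= 2
	}
	if dur > cfg.Max {
		dur = cfg.Max
	}
	return dur
}

// jitteredBackoff picks a duration uniformly from [base/2, base].
// The jitter range is an assumption of this sketch.
func jitteredBackoff(cfg backoffConfig, retries int) time.Duration {
	base := baseBackoff(cfg, retries)
	half := int64(base / 2)
	return time.Duration(half + rand.Int63n(half+1))
}

func main() {
	cfg := backoffConfig{Init: time.Second, Max: time.Minute}
	for retries := 0; retries < 8; retries++ {
		fmt.Printf("retries=%d base=%v backoff=%v\n",
			retries, baseBackoff(cfg, retries), jitteredBackoff(cfg, retries))
	}
}
```

Because the function is stateless, growth depends entirely on the caller passing an increasing retry count. If the count stays at 0, every result falls in [Init/2, Init].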
Steps to reproduce
To test this PR, one needs to cause the partition leader node to go away, which should trigger the backoff. One also needs to subsequently bring the partition leader node back online to ensure that the backoff duration has been reset.
Start a 3-node Kafka cluster. Follow the instructions through step 6 on https://kafka.apache.org/quickstart but skip steps 3, 4, and 5.
Configure filebeat.yml (could be any Beat, but Filebeat is easy for testing) like so:
Start writing log lines to the test log file.
Modify the source of the backoff function to emit the computed backoff duration right before it is returned, like so:
Build Filebeat with the modified source.
Start Filebeat.
Check that the foobar topic has been created in Kafka.
This topic is expected to have only 1 partition, partition 0. Note this partition's leader node ID.
Now kill the Kafka node corresponding to the partition leader's node ID.
Re-run the command in step 5 and verify that partition 0 has no leader.
Check the Filebeat log. You should now see multiple entries like this:
Note that the backoff duration reported in these entries seems to hover around the backoff.init value but never grows towards the backoff.max value.
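One way to make "hovers around backoff.init" concrete is to compare each logged duration with the range an exponential backoff would be expected to produce for that attempt. The sketch below is only an illustration under stated assumptions: the helpers expectedRange and firstStalled are hypothetical, the n-th logged entry is assumed to correspond to retry n, the jitter is assumed to fall in [base/2, base], and the sample durations in main are made up rather than taken from a real Filebeat log.

```go
package main

import (
	"fmt"
	"time"
)

// expectedRange returns the interval a jittered backoff should fall in for a
// given retry count, assuming base = init * 2^retries capped at max and
// jitter drawn from [base/2, base].
func expectedRange(init, max time.Duration, retries int) (lo, hi time.Duration) {
	base := init
	for i := 0; i < retries && base < max; i++ {
		base *= 2
	}
	if base > max {
		base = max
	}
	return base / 2, base
}

// firstStalled returns the index of the first observed duration that is
// below the expected lower bound for its position, or -1 if none is.
func firstStalled(init, max time.Duration, observed ...time.Duration) int {
	for i, d := range observed {
		lo, _ := expectedRange(init, max, i)
		if d < lo {
			return i
		}
	}
	return -1
}

func main() {
	// Made-up durations for illustration only; not taken from a real log.
	idx := firstStalled(time.Second, time.Minute,
		800*time.Millisecond, 600*time.Millisecond, 900*time.Millisecond)
	fmt.Println("first entry below the expected range:", idx)
}
```

With backoff.init at 1s, a second retry would be expected to wait at least 1s under these assumptions, so a sequence that stays below 1s is flagged at its second entry.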