Implement backoff for Kafka output #16777

Closed
ycombinator opened this issue Mar 3, 2020 · 5 comments · Fixed by #17808

@ycombinator
Contributor

Describe the enhancement:

Currently the Kafka output does not support any sort of backoff when publishing events if the Kafka broker has temporarily gone away. We should add support for this, similar to what the Redis and Elasticsearch outputs already do.

Describe a specific use case for the enhancement or feature:

To prevent the Kafka output from aggressively retrying to publish events to a Kafka broker that might have temporarily gone away.

@ycombinator
Contributor Author

ycombinator commented Mar 3, 2020

Potentially useful for implementation:

@ycombinator
Contributor Author

@faec @urso See the link in the previous comment for backoff/retry options for the producer in Sarama.

@urso

urso commented Mar 10, 2020

BackoffFunc is "new". We should give it a try. From experience, the 'Backoff' setting did not always work correctly, which led to us using 100% CPU (depending on where an error occurred and whether the internal write buffer was full). Not sure if this has been improved.

We try to move most 'retry' handling to libbeat, because sarama does not support infinite retry. This might impact the BackoffFunc I presume. Plus, with exponential backoff we will need a way to reset the wait state upon success.

@ZHumphries

Just experienced this issue. Someone changed Kafka to use authentication, and all Winlogbeat instances on every server spiked to 100% CPU once the buffer was full. Fortunately the Winlogbeat rollout wasn't complete, so VMware didn't grind completely to a halt.

@toby-sutor

It's worth mentioning that a temporary workaround is to add the Kafka IPs to the local /etc/hosts file of the Beats nodes to relieve the DNS servers until this has been implemented.
