Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

outputs.graphite: Retry sending metrics immediately after reconnect #3680

Merged
merged 1 commit into from
Jan 17, 2018

Conversation

piotr1212
Copy link
Contributor

If writing to Graphite would fail the plugin would reconnect, but not retry to send the metrics until the next interval. In a situation when the connection would break before the next interval the metrics would never reach the Graphite server. Obviously this is a network issue which but Telegraf should handle these kind of situations better.

This patch retries to send the metrics one time immediately after reconnecting.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated. <- not needed
  • Has appropriate unit tests.

If writing to Graphite would fail the plugin would reconnect, but not retry
to send the metrics until the next interval. In a situation when the connection
would break before the next interval the metrics would never reach the Graphite
server. Obviously this is a network issue which but Telegraf should handle these
kind of situations better.

This patch retries to send the metrics one time immediately after reconnecting.
@danielnelson
Copy link
Contributor

Can you check if your disconnects are detected by the code in checkEOF? I think it might make more sense to reconnect before Write, so that we don't need to double send.

@danielnelson danielnelson added this to the 1.6.0 milestone Jan 16, 2018
@danielnelson danielnelson added the fix pr to fix corresponding bug label Jan 16, 2018
@piotr1212
Copy link
Contributor Author

Yes it does detect them, but I wasn't sure if I have to check all connections and then reconnect the broken ones, reconnect all connections or be happy with at least one connection. I suspect I would have to refactor connect() then as well, as it now reconnects all connections. Or add some way to mark a connection as broken so it won't try to send on it. I thought this would be safe as well, there might be situations which are not detected by checkEOF. I'll think a bit about it, if you have any suggestions let me know.

And this doesn't double send, the second send is only called when the first failed.

@danielnelson
Copy link
Contributor

Okay, I think we will want to refactor this code in the future so that we can reconnect individually to each server. This way the logic can basically be:

for each server
  if not connected
    connect
  write

This is still an improvement, so I'm going to merge.

@danielnelson danielnelson modified the milestones: 1.6.0, 1.5.2 Jan 17, 2018
@danielnelson danielnelson merged commit f374a29 into influxdata:master Jan 17, 2018
danielnelson pushed a commit that referenced this pull request Jan 17, 2018
mkboudreau added a commit to mkboudreau/telegraf that referenced this pull request Jan 18, 2018
* master:
  Update changelog
  Reconnect before sending graphite metrics if disconnected (influxdata#3680)
  Update changelog
  Add support for using globs in devices list of diskio input plugin (influxdata#3687)
  Use go-redis for the redis input (influxdata#3661)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants