outputs.graphite: Retry sending metrics immediately after reconnect #3680

piotr1212 · 2018-01-16T15:48:26Z

If writing to Graphite failed, the plugin would reconnect but would not retry sending the metrics until the next interval. If the connection broke again before the next interval, the metrics would never reach the Graphite server. Obviously this is a network issue, but Telegraf should handle these kinds of situations better.

This patch retries sending the metrics once, immediately after reconnecting.
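A minimal sketch of the retry-once behaviour described above, not the actual diff. The `sender` interface, `writeWithRetry`, and `flakySender` are hypothetical names used only for illustration; the real plugin manages a list of TCP connections.

```go
package main

import (
	"errors"
	"fmt"
)

// sender is a hypothetical stand-in for the plugin's connection handling.
type sender interface {
	Send(batch []byte) error
	Reconnect() error
}

// writeWithRetry sends the batch. If the send fails, it reconnects and
// retries exactly once, so a successful first send is never repeated.
func writeWithRetry(s sender, batch []byte) error {
	err := s.Send(batch)
	if err == nil {
		return nil
	}
	if rerr := s.Reconnect(); rerr != nil {
		return fmt.Errorf("reconnect after failed send (%v): %w", err, rerr)
	}
	return s.Send(batch)
}

// flakySender simulates a connection that stays broken until Reconnect is called.
type flakySender struct {
	broken   bool
	attempts int
	sent     [][]byte
}

func (f *flakySender) Send(batch []byte) error {
	f.attempts++
	if f.broken {
		return errors.New("broken pipe")
	}
	f.sent = append(f.sent, batch)
	return nil
}

func (f *flakySender) Reconnect() error {
	f.broken = false
	return nil
}

func main() {
	s := &flakySender{broken: true}
	err := writeWithRetry(s, []byte("cpu.usage 0.5 1516117706\n"))
	// The first attempt fails, the retry after Reconnect delivers the batch.
	fmt.Println(err, s.attempts, len(s.sent))
}
```

In this sketch the second `Send` runs only on the failure path, so a healthy connection sees one send per interval.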

Required for all PRs:

Signed CLA.
Associated README.md updated. <- not needed
Has appropriate unit tests.

danielnelson · 2018-01-16T21:35:07Z

Can you check if your disconnects are detected by the code in checkEOF? I think it might make more sense to reconnect before Write, so that we don't need to double send.

piotr1212 · 2018-01-17T16:15:05Z

Yes, it does detect them, but I wasn't sure whether I should check all connections and reconnect only the broken ones, reconnect all connections, or be happy with at least one working connection. I suspect I would then have to refactor connect() as well, since it currently reconnects all connections, or add some way to mark a connection as broken so the plugin won't try to send on it. I thought this approach would be safe as well, since there might be situations that checkEOF does not detect. I'll think about it a bit; if you have any suggestions, let me know.

And this doesn't double send: the second send only happens when the first one failed.

danielnelson · 2018-01-17T23:25:30Z

Okay, I think we will want to refactor this code in the future so that we can reconnect individually to each server. This way the logic can basically be:

for each server
  if not connected
    connect
  write

This is still an improvement, so I'm going to merge.

(cherry picked from commit f374a29)

* master:
  Update changelog
  Reconnect before sending graphite metrics if disconnected (influxdata#3680)
  Update changelog
  Add support for using globs in devices list of diskio input plugin (influxdata#3687)
  Use go-redis for the redis input (influxdata#3661)

danielnelson added this to the 1.6.0 milestone Jan 16, 2018

danielnelson added the "fix" label (pr to fix corresponding bug) Jan 16, 2018

danielnelson modified the milestones: 1.6.0, 1.5.2 Jan 17, 2018

danielnelson merged commit f374a29 into influxdata:master Jan 17, 2018

danielnelson pushed a commit that referenced this pull request Jan 17, 2018

Reconnect before sending graphite metrics if disconnected (#3680)

8b566b2

(cherry picked from commit f374a29)

maxunt pushed a commit that referenced this pull request Jun 26, 2018

Reconnect before sending graphite metrics if disconnected (#3680)

5f8d908
