Not able to send metrics to Datadog #6093

harshitdx29 · 2019-07-09T07:38:40Z

Relevant telegraf.conf:

# Configuration for DataDog API to send metrics to.
[[outputs.datadog]]
  ## Datadog API key
  apikey = "**************************************"

  ## Connection timeout.
  timeout = "5s"

Expected behavior:

I expect my metrics to be sent to Datadog.

Actual behavior:

I get a timeout error. However, when I hit the Datadog API directly from the same host it works, so it is certainly not a connectivity issue.
The request I am sending is:

# Timestamp used in the payload below (assumed to be the current epoch time)
currenttime=$(date +%s)
curl -X POST -H "Content-type: application/json" \
-d "{ \"series\" :
[{\"metric\":\"test.metric\",
\"points\":[[$currenttime, 20]],
\"type\":\"rate\",
\"interval\": 20,
\"host\":\"test.example.com\",
\"tags\":[\"environment:test\"]}
]
}" \
'https://api.datadoghq.com/api/v1/series?api_key=<YOUR_API_KEY>'

Additional info:

2019-07-09T07:37:15Z E! Error writing to output [datadog]: error POSTing metrics, Post https://app.datadoghq.com/api/v1/series?api_key=****************: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

danielnelson · 2019-07-09T17:13:01Z

@harshitdx29 I checked that this plugin is working for me, and I also took a quick look over its code, but I don't see any obvious cause other than a timeout/networking issue. Any chance you are using a proxy?

harshitdx29 · 2019-07-10T04:54:24Z

It can't be a networking issue because I am able to hit the Datadog URL directly using curl.

Also, we are not using any proxy for this.

danielnelson · 2019-07-10T19:50:03Z

Any chance we just need to increase the timeout option? Telegraf is likely sending a much larger payload than the curl command.
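For reference, a minimal sketch of raising the output timeout in telegraf.conf. The `timeout` option is the one shown in the original configuration; the "60s" value and the `<YOUR_API_KEY>` placeholder are illustrative assumptions, not recommendations.

```toml
[[outputs.datadog]]
  ## Datadog API key (placeholder)
  apikey = "<YOUR_API_KEY>"

  ## Connection timeout; "60s" is an illustrative value only.
  timeout = "60s"
```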

Otherwise, first try running Telegraf in the foreground (not as a service) from the same shell as the curl command. If that still doesn't help, could you try from a different computer, preferably at a different location?

harshitdx29 · 2019-07-31T07:01:33Z

I increased the timeout to 60s, but with no success. I also reconfirmed with our devops team that we are not using any proxy.

harshitdx29 · 2019-07-31T10:09:33Z

I am now getting the following in the Telegraf logs:

2019-07-31T10:09:02Z E! Error writing to output [datadog]: received bad status code, 413

danielnelson · 2019-07-31T16:48:35Z

This means Datadog rejected the payload because it was too large. What is your agent metric_batch_size? Try reducing it and see if it helps.
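As a rough sketch, the batch size is set in the `[agent]` section of telegraf.conf. The value 500 here is only an illustrative assumption; a suitable size depends on how large the individual metrics are.

```toml
[agent]
  ## Maximum number of metrics sent to an output in a single write.
  ## The default is 1000; 500 is an illustrative smaller value.
  metric_batch_size = 500
```

Smaller batches mean smaller request bodies per POST, which is what a 413 response asks for.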

harshitdx29 · 2019-07-31T16:53:05Z

It was the default of 1000. I reduced it to 500. I also noticed another problem in all of our services. For an initial period after starting the Telegraf container, I get the following error:

Error writing to output [datadog]: error POSTing metrics, Post https://app.datadoghq.com/api/v1/series?api_key=****************: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

After that, it establishes the connection.

harshitdx29 · 2019-07-31T16:54:44Z

I also started getting a lot of errors:

2019-07-31T16:54:20Z W! Skipping a scheduled flush because there is already a flush ongoing.
2019-07-31T16:54:20Z E! Error in plugin [inputs.system]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.disk]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.diskio]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.net]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.netstat]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.statsd]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.mem]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.processes]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.kernel]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.docker]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.swap]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.conntrack]: took longer to collect than collection interval (10s)
2019-07-31T16:54:20Z E! Error in plugin [inputs.cpu]: took longer to collect than collection interval (10s)

danielnelson · 2019-07-31T16:59:17Z

The first error is a timeout: Datadog took too long to respond (60s?). The second is a common error in older Telegraf versions when the output takes a long time to write; it will probably go away if you update.

harshitdx29 · 2019-07-31T17:04:00Z

For the first one, why does it take so long to respond when using Telegraf? If I hit the Datadog APIs directly, it responds quickly.

For the second one, can you suggest how to update the version? I am using Docker to run the Telegraf container. Also, it started appearing only after I reduced metric_batch_size to 500. It was not appearing earlier.

harshitdx29 · 2019-07-31T17:12:50Z

Reducing metrics batch size worked. Thanks :)

danielnelson · 2019-07-31T18:38:05Z

Great, I'm going to close this issue then. I think the full batches were just too large for Datadog to process quickly, or they were taking too long to upload. If you want further visibility into this, you could enable the internal plugin, which outputs metrics about how long it takes to write each batch.
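A minimal sketch of enabling that plugin in telegraf.conf, assuming the stock `inputs.internal` plugin. The `collect_memstats` setting is optional and shown only for completeness.

```toml
# Sketch only: enables Telegraf's self-monitoring input.
[[inputs.internal]]
  ## Also collect Go runtime memory statistics (optional).
  collect_memstats = true
```

The per-output write statistics it emits should then appear alongside the other metrics, so they can be graphed in the same place.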

harshitdx29 · 2019-08-01T03:53:02Z

I am getting this now :(

2019-08-01T03:47:10Z E! [agent] Error writing to output [datadog]: unable to marshal TimeSeries, json: unsupported value: NaN

danielnelson · 2019-08-01T06:15:20Z

I imagine this should be an easy fix. Could you do me a favor and open a new issue for it? That way, when we reference the bug in the changelog, it will be clear what the issue was, and it won't be muddied by the previous conversation.

harshitdx29 · 2019-08-01T06:21:05Z

Done: #6191

danielnelson added the "bug" (unexpected problem or unintended behavior) and "need more info" labels on Jul 9, 2019

danielnelson closed this as completed Jul 31, 2019
