Add support for StatsD style aggregator #39

pauldix · 2015-07-02T22:17:30Z

We should support the StatsD protocol and aggregation. However, unlike StatsD, the metric names should follow the conventions of the key section of the InfluxDB line protocol.

The StatsD values should be output as a single field called value. This should be able to flush to any of the output sinks like what is mentioned in #35.

This means that a single Telegraf instance could serve as a StatsD aggregator that works with the InfluxDB schema design of measurements and tags.


nstott · 2015-07-11T14:31:31Z

Looking at the statsd spec from here:

https://github.com/b/statsd_spec

@pauldix are you thinking of a line format something like this?

cpu_load_short,host=server01,region=us-west:2.34|g
cpu_load_short,host=server01,region=us-west:3.42|g
errors,host=server01,region=us-west:1|c

where the server adds the timestamp either when it receives the message, or perhaps in the case of counters, adding the timestamp when it flushes to a sink might be more appropriate
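To make the proposed format concrete, here is a minimal Python sketch of how such a line could be split into its parts. The function name `parse_influx_statsd` is hypothetical and this is not Telegraf's parser; it assumes tag values contain no `:`, `,`, or `|` characters, and it leaves timestamping to the receiver, as suggested above.

```python
def parse_influx_statsd(line):
    """Split 'name,tag=val,...:value|type' into (name, tags, value, type).

    Hypothetical helper for illustration only; no escaping, sampling
    rates, or multi-metric packets are handled.
    """
    key, sep, rest = line.strip().rpartition(":")
    if not sep:
        raise ValueError("missing ':' between key and value: %r" % line)
    value, sep, metric_type = rest.partition("|")
    if not sep:
        raise ValueError("missing '|' before metric type: %r" % line)
    # The key section follows the line protocol: measurement, then tags.
    name, *tag_pairs = key.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    return name, tags, float(value), metric_type


if __name__ == "__main__":
    print(parse_influx_statsd("cpu_load_short,host=server01,region=us-west:2.34|g"))
    print(parse_influx_statsd("errors,host=server01,region=us-west:1|c"))
```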

liyichao · 2015-07-16T08:12:38Z

It would be good if Telegraf could add the hostname as a tag instead of the application sending it, because the application may run in a container.

pauldix · 2015-07-16T16:21:00Z

@nstott yeah, that's exactly what I was thinking. Telegraf should specify timestamps when it flushes. In general when writing to InfluxDB it's best to specify timestamps. That way if there is a partial write in a cluster, you can just write again and it's idempotent.

@liyichao the issue is that you'd have one telegraf server collecting all the metrics for all of your hosts (like what you do with StatsD). Essentially one of your telegraf installs would become your statsd server.

nstott · 2015-07-16T17:30:47Z

I'll see if i can knock something out in the next few days for this

alvaromorales · 2015-08-05T22:59:31Z

+1

skyrocknroll · 2015-08-13T03:44:40Z

This would be an awesome feature to have 👍

zp-markusp · 2015-08-13T05:58:17Z

+1

rvrignaud · 2015-09-11T12:54:24Z

+1

caquino · 2015-09-20T10:53:48Z

+1, having a replacement for StatsD/datadog-agent-statsd will make the migration from other services way easier.

ranjib · 2015-09-21T16:16:13Z

@pauldix is anyone actively working on this? If not, I can take a stab at it. This will be a really useful feature. I'm currently running an additional statsd agent (statsdaemon) alongside telegraf for this.
@sparrc comments?

sparrc · 2015-09-21T16:40:56Z

@ranjib I am hoping to work on this today

pauldix · 2015-09-21T18:49:44Z

With the 0.9.5 release coming we'll have support for many fields and we'll stop pushing people to only have a single field per measurement. We should support writing data to multiple fields. I'm thinking that we can support the StatsD protocol like I mentioned above, but we should also make it possible to write values into different fields. I'm thinking it should look exactly like the line protocol.

skyrocknroll · 2015-09-21T19:31:49Z

+1 @pauldix #39 (comment)

skyrocknroll · 2015-10-05T08:26:31Z

Is somebody working on this? Is there any ETA or target release?

ranjib · 2015-10-05T08:29:03Z

@skyrocknroll #237

sparrc · 2015-10-05T17:25:43Z

It's something I'm working on right now. At the moment I have counters, gauges, and sets working. I still have a ways to go with timers, as they're a bit more complicated.

I'm hoping to have timers working by the end of the week, life permitting ;-)

skyrocknroll · 2015-10-05T17:32:53Z

Thank you @ranjib

@sparrc
Thank you for your kind update. Right now, just to maintain a count, we are inserting a lot of records. Once statsd support for InfluxDB is there, our number of records will drop to about 1/1000th :) and performance will improve a lot.

Eagerly waiting for the release :)

sparrc · 2015-10-05T17:52:38Z

@skyrocknroll Since InfluxDB is a bit more powerful than Graphite, the default behavior is going to be a little different than a typical statsd server.

to give you a little preview, counters would look something like this:

Metrics sent:

$ echo "deploys.test.myservice:1|c" | nc -C -w 1 -u localhost 8125
[10s later...]
$ echo "deploys.test.myservice:1|c" | nc -C -w 1 -u localhost 8125

Telegraf debug output:

> [] statsd_deploys_test_myservice_counter value=1
2015/10/05 11:49:25 Cranking default (10s) interval, gathered 1 metrics from 1 plugins in 142.169µs
> [] statsd_deploys_test_myservice_counter value=2
2015/10/05 11:49:35 Cranking default (10s) interval, gathered 1 metrics from 1 plugins in 99.549µs
> [] statsd_deploys_test_myservice_counter value=2
2015/10/05 11:49:45 Cranking default (10s) interval, gathered 1 metrics from 1 plugins in 59.998µs

As you can see, counters will be maintained and reported at each collection interval, and they will not be cleared by default.

Since I've never used statsd in production, I'd love to hear what you (and anyone else in this thread) thinks of that behavior.

Thanks a bunch!

skyrocknroll · 2015-10-05T18:11:42Z

@sparrc wherever I have used them, counters are always associated with time, like requests per second or some action per second. So it would be better if we cleared the counter values after each flush. For gauges, maintaining values across flushes does make sense.

So the default behavior would be:

counter -> reset to 0 after each flush.
gauge -> maintain the value between flushes.
But making everything configurable would be awesome :)
For more details:
https://github.com/etsy/statsd/blob/master/exampleConfig.js
https://github.com/etsy/statsd/blob/master/docs/metric_types.md

sparrc · 2015-10-05T18:36:30Z

My problem with resetting the counter is this: InfluxDB provides you with the ability to calculate rates of change on counters that are always increasing (like this: SELECT non_negative_derivative(value, 1s) FROM statsd_deploys_test_myservice_counter)

If the counter reset, this obviously wouldn't work, and calculating rates of change on the counter requires knowledge of the flushing interval. This also means that the flushing interval can never be changed once the data starts being collected. With an ever-increasing counter, you are able to change the collection interval completely arbitrarily, because you simply have timestamps associated with different points in the counters' upward trajectory.

To me this makes more sense because it is also generally how OS-level counters work, ie: network bytes & packets received and sent, CPU ticks, etc.

Let me know what you think. The general idea here is that working with InfluxDB is less limited than working with Graphite, since its query language is more fully featured. StatsD was a protocol built with Graphite in mind, and I'd like our implementation to support InfluxDB better.
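The argument about ever-increasing counters can be illustrated with a small Python sketch. It approximates the idea behind a non-negative derivative over timestamped samples; the function name `non_negative_rates` and the sample data are made up for illustration, and it is not a reimplementation of the InfluxQL function.

```python
def non_negative_rates(samples, unit=1.0):
    """Per-`unit`-seconds rate of change between consecutive samples.

    `samples` is a list of (timestamp_seconds, counter_value) pairs sorted
    by time. Negative deltas (a counter reset) are skipped, which is the
    rough idea behind a non-negative derivative.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0 or t1 == t0:
            continue  # counter was reset, or duplicate timestamp
        rates.append((t1, delta / (t1 - t0) * unit))
    return rates


if __name__ == "__main__":
    # The spacing between samples changes from 10s to 30s partway through,
    # yet the rate is still well defined because every point has a timestamp.
    samples = [(0, 1), (10, 2), (20, 2), (50, 8)]
    for timestamp, rate in non_negative_rates(samples):
        print(timestamp, rate)
```

Because each point carries its own timestamp, the caller never needs to know the flush interval, which is the property described above.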

skyrocknroll · 2015-10-06T04:25:20Z

@sparrc I agree with you. One more question: how are we planning to write data using this?
Do we point an InfluxDB client at the Telegraf statsd endpoint, or should we use a separate influx-statsd client that supports tags and fields along with the measurement?

sparrc · 2015-10-06T04:41:46Z

It will be a "plugin" on one of your telegraf instances. That telegraf instance will open up a port and listen for UDP packets, where you can send your normal statsd-style packets. On the regular telegraf interval, the statsd server will be flushed and all data will be sent to InfluxDB.
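As a rough sketch of the aggregate-then-flush behavior described in this thread, the Python class below accumulates counters, gauges, and sets in memory and emits them on each flush. The class name `StatsdAggregator` and the tuple layout of the flushed points are assumptions for illustration; UDP handling, timers, sample rates, and gauge deltas are left out, and nothing is cleared between flushes (which matches the counter preview above and is simply assumed for sets).

```python
class StatsdAggregator:
    """In-memory aggregation of plain statsd lines (illustrative only)."""

    def __init__(self):
        self.counters = {}  # bucket -> running total, never reset
        self.gauges = {}    # bucket -> last value seen
        self.sets = {}      # bucket -> set of unique members

    def ingest(self, packet):
        """Fold one 'bucket:value|type' line into the current state."""
        bucket, _, rest = packet.strip().partition(":")
        value, _, metric_type = rest.partition("|")
        if metric_type == "c":
            self.counters[bucket] = self.counters.get(bucket, 0.0) + float(value)
        elif metric_type == "g":
            self.gauges[bucket] = float(value)
        elif metric_type == "s":
            self.sets.setdefault(bucket, set()).add(value)
        else:
            raise ValueError("unsupported metric type: %r" % metric_type)

    def flush(self, timestamp):
        """Return (bucket, kind, value, timestamp) points; state is kept."""
        points = []
        for bucket, total in self.counters.items():
            points.append((bucket, "counter", total, timestamp))
        for bucket, value in self.gauges.items():
            points.append((bucket, "gauge", value, timestamp))
        for bucket, members in self.sets.items():
            points.append((bucket, "set", float(len(members)), timestamp))
        return points


if __name__ == "__main__":
    agg = StatsdAggregator()
    agg.ingest("deploys.test.myservice:1|c")
    print(agg.flush(timestamp=10))
    agg.ingest("deploys.test.myservice:1|c")
    print(agg.flush(timestamp=20))
    print(agg.flush(timestamp=30))
```

The timestamp is assigned at flush time rather than at receipt, following the earlier suggestion that Telegraf should specify timestamps when it flushes.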

skyrocknroll · 2015-10-06T05:16:30Z

@sparrc Will the line format support InfluxDB tags and fields? Right now we are not using any of the statsd InfluxDB writers because they don't understand InfluxDB tags and fields.

sparrc · 2015-10-06T05:22:41Z

yes, it will support a way to create a mapping of a statsd "bucket" to an influxdb measurement with tags: https://github.com/influxdb/telegraf/blob/statsd/plugins/statsd/README.md

zp-markusp · 2015-10-06T05:35:14Z

Why don't you take advantage of InfluxDB and use the line protocol syntax, so that you can define tags on the fly and don't have to rely on a hardcoded, dot-separated order?

Regards, Markus

skyrocknroll · 2015-10-06T08:42:33Z

@sparrc as @zp-markusp said, we were looking for exactly the same feature. We see InfluxDB tags and fields as an unbeatable feature. If we use the same line protocol, we get all the dynamism of tags and fields and also counters and gauges at the Telegraf level.

Or maybe we need both: plain statsd for the statsd protocol, and statsd features with the line protocol.

Plain statsd strips away all the awesomeness of tags and fields.

Datadog has both plain statsd and datadog-statsd, which supports tags.

justin8 · 2015-10-06T11:01:32Z

It would be very useful to support both. Being able to use it as a drop in replacement for things like datadog would be really useful, with the added benefit that you can alter your apps to utilize tags afterwards. It would make the barrier for entry incredibly low.

sparrc · 2015-10-06T17:04:29Z

Thanks everyone for the input, especially for the datadog-statsd link, that is very useful and it seems like they have created a good system for adding tags to statsd lines.

As I see it, there are two options we can support: datadog-statsd is closer to plain statsd and simply adds a list of tags after a |# character. influx-statsd would be similar to what @nstott wrote above. It is less similar to plain statsd but more similar to the InfluxDB line protocol.

I'm leaning towards only supporting datadog-statsd because then users can more easily migrate between influxdb and datadog, and it also allows people to use existing datadog statsd clients. If we create our own statsd protocol, we're contributing to this problem

@justin8 @skyrocknroll @zp-markusp @pauldix @nathanielc What would you prefer between these two tag formatting options? should we support both?

datadog-statsd

cpu.load.short:2.34|g|#host:server01,region:us-west

influx-statsd

cpu.load.short,host=server01,region=us-west:2.34|g
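For comparison, here is a minimal Python sketch of parsing the datadog-statsd variant, where tags follow a `|#` section. The function name `parse_datadog_statsd` is hypothetical; the sketch only handles the shape shown in the example above and ignores any other optional sections.

```python
def parse_datadog_statsd(line):
    """Split 'name:value|type|#k:v,k:v' into (name, tags, value, type).

    Illustrative only: sections other than the '#tags' one are ignored.
    """
    head, *sections = line.strip().split("|")
    name, sep, value = head.partition(":")
    if not sep or not sections:
        raise ValueError("expected 'name:value|type': %r" % line)
    metric_type = sections[0]
    tags = {}
    for section in sections[1:]:
        if section.startswith("#"):
            for tag in section[1:].split(","):
                key, _, val = tag.partition(":")
                tags[key] = val
    return name, tags, float(value), metric_type


if __name__ == "__main__":
    print(parse_datadog_statsd("cpu.load.short:2.34|g|#host:server01,region:us-west"))
```

Either syntax yields the same name, tags, value, and type; the difference is only where the tags sit on the wire, so supporting both would mostly be a matter of choosing the parser per line.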

skyrocknroll · 2015-10-06T17:40:04Z

@sparrc I would like to go with influx-statsd because it will give us consistency across the whole InfluxDB ecosystem. It looks very similar to the InfluxDB line protocol. Also, @pauldix in #39 (comment) mentioned supporting multiple values. If we are going to design influx-statsd, let's provide a way to support multiple field values as well.

Right now I don't see a strong need for supporting multiple field values, though others may weigh in on this.
If we do support multiple field values, I am thinking of something like this:

temperature,machine=unit42,type=assembly internal=32|g,external=100|c

zp-markusp · 2015-10-06T17:43:31Z

From a gut feeling perspective I would prefer influx-statsd as this could be implemented without changing the statsd library on the application side as it follows the pattern string{identifier}{value}{statsd type}. So just the identifier has to be exchanged.

skyrocknroll · 2015-10-06T17:46:31Z

@zp-markusp +1. One way is to parse the identifier on the Telegraf side: if it has tags, use it as measurement and tags; otherwise, use the whole identifier as the measurement in InfluxDB.

zp-markusp · 2015-10-06T17:51:48Z

For example the standard statsd output from logstash could be used.

nathanielc · 2015-10-06T17:58:56Z

I say influx-statsd since it's a subset of the statsd protocol, like @zp-markusp said. It won't require a new client.

I think you should also do something similar to the graphite plugin in InfluxDB that allows you to transform a metric name into a measurement, fields, and tags set. See https://github.com/influxdb/influxdb/tree/master/services/graphite#templates

This will work for users that already have lots of tag data in the metric name,
e.g. us-west.server01.cpu.short.load:2.34|g
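A rough Python sketch of the template idea follows. It is only loosely modeled on the linked graphite templates: the function name `apply_template`, the simplified template syntax, and joining measurement parts with `_` are all assumptions for illustration, not the behavior of the InfluxDB graphite service.

```python
def apply_template(template, bucket):
    """Map a dotted bucket name onto (measurement, tags) using a template.

    Each dot-separated template field names the tag that the bucket part in
    the same position becomes. 'measurement' takes one part, 'measurement*'
    takes all remaining parts, and an empty field skips a part.
    """
    parts = bucket.split(".")
    tags = {}
    measurement = []
    for index, field in enumerate(template.split(".")):
        if index >= len(parts):
            break
        if field == "measurement*":
            measurement.extend(parts[index:])
            break
        if field == "measurement":
            measurement.append(parts[index])
        elif field:
            tags[field] = parts[index]
    return "_".join(measurement), tags


if __name__ == "__main__":
    print(apply_template("region.host.measurement*", "us-west.server01.cpu.short.load"))
```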

sparrc · 2015-10-06T18:08:52Z

okay, good point @skyrocknroll about supporting multiple fields, how about this:

measurement[,tag1=key1,tag2=key2]:[field=]value[,field2=value2]|type

so an example would look like:

cpu.usage,host=server01,region=us-west:idle=10.0,user=50.0,system=40.0|g
=> statsd_cpu_usage_gauge,host='server01',region='us-west' idle=10,user=50,system=40

field names and tags are optional, so you could also just do this:

cpu.usage.idle:10.0|g
=> statsd_cpu_usage_idle_gauge value=10

@nathanielc thanks for pointing me to that, I did not realize that we already had a graphite template transformation set up. I was going to give telegraf a configuration table for transforming the statsd bucket into tags, like this: https://github.com/influxdb/telegraf/blob/statsd/plugins/statsd/README.md#statsd-bucket---influxdb-mapping, but I may want to borrow from the influxdb graphite template instead.

zp-markusp · 2015-10-06T18:22:20Z

@sparrc does it make sense to hard code the gauge as suffix to the name?
I would propose to either ignore it or add it as a tag (statsd-type=gauge)

nathanielc · 2015-10-06T18:24:42Z

I don't see a strong need to support multiple fields either. StatsD is an event counter, seems odd to want to send multiple fields for a single event. But as long as it is backwards compatible with the StatsD protocol (like your example) I don't see an issue supporting it.

sparrc · 2015-10-06T19:00:11Z

@zp-markusp I like that idea more too, I'll change the behavior to add a metric_type tag 👍

justin8 · 2015-10-06T20:44:47Z

Bit late to reply to this one now; but the way it seems to be heading sounds great! Backwards compatible with extra features/tags 👍

sparrc · 2015-10-15T21:12:18Z

This is now in master and is available by building from source. See the README here for documentation and usage details: https://github.com/influxdb/telegraf/tree/master/plugins/statsd

more feedback is much appreciated, thanks all

penguincp · 2017-03-11T05:51:40Z

According to #1876 (commented by sparrc on Oct 11, 2016), multiple field support (e.g. cpu.usage,host=server01,region=us-west:idle=10.0,user=50.0,system=40.0|g) was removed and will not be supported in the future, why?

danielnelson · 2017-03-13T18:40:16Z

@penguincp The statsd protocol is incompatible with multiple fields. We do support multiple tags, and you can use a stat for each field. If you would like to discuss this further, please open a new issue or ask a question at the InfluxData Community site.

sparrc added the enhancement label Aug 5, 2015

sparrc added the plugin request label Aug 24, 2015

sparrc closed this as completed in 6977119 Oct 15, 2015

gunnaraasen mentioned this issue Nov 5, 2015

StatsD output #348

Open

rijnhard mentioned this issue Nov 10, 2015

Support InfluxDB/Telegraf style tags DataDog/dogstatsd-ruby#24

Open

chrusty mentioned this issue Mar 5, 2016

GREEDY field templates for the graphite parser #789

Closed
