Telegraf should do some simple metric aggregation/rollup #380

ekini · 2015-11-19T12:44:33Z

Let's say I have 1k metrics per second, generated by one host, with the same tags, but different values.
I want to sum all values, aggregated by 1 minute.

I can send all of them to InfluxDB and do aggregation there. It works for a few hosts, but what if I have thousands of them? InfluxDB will just die.

I'm not speaking about complex functions, but some simple ones like sum(), count() and mean() would be nice to have.
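
To make the request concrete, here is a minimal sketch of a one-minute rollup for a single series (same name, same tags). It is written in Python purely for illustration, since Telegraf itself is written in Go. The function name `rollup`, the `(timestamp, value)` input shape, and the `window_s` parameter are assumptions made for this sketch and are not part of Telegraf.

```python
from collections import defaultdict


def rollup(points, window_s=60):
    """Group (unix_timestamp, value) pairs into fixed windows.

    Returns {window_start: {"sum": ..., "count": ..., "mean": ...}}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        start = ts - ts % window_s  # start of the window this point falls into
        sums[start] += value
        counts[start] += 1
    return {
        start: {
            "sum": sums[start],
            "count": counts[start],
            "mean": sums[start] / counts[start],
        }
        for start in sorted(sums)
    }


# Three points in the first minute, one in the second.
print(rollup([(0, 1.0), (30, 2.0), (59, 3.0), (60, 4.0)]))
```

Only one running sum and one counter are kept per window, so memory use does not grow with the number of raw points.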

sparrc · 2015-11-20T17:31:31Z

This is an interesting idea. Do you have any ideas for how these aggregation functions could be configured? It would probably need to be a separate [aggregations] section of the config, where you could define different aggregation functions, maybe like this:

[aggregations]
[[aggregations.sum]]
measurement = "cpu_usage_*"
interval = "60s"
...
[[aggregations.mean]]
...

This would then need to be processed after collection. It'll be a little tricky, I think, because these measurements will need to be gathered but then dropped before they get flushed on their own (they would still be flushed as part of the aggregate).

Another option could be putting the aggregate config as part of each plugin config, maybe something like this:

[cpu]
percpu = true
totalcpu = true
drop = ["cpu_time"]
[cpu.sum]
...
[cpu.mean]
...

sparrc · 2015-11-20T17:44:51Z

BTW, @ekini which plugin is generating that many metrics?

ekini · 2015-11-20T18:02:14Z

The one that parses logs :)

I've been thinking about it a bit, and I'm still not sure how to configure aggregation. But there should be some grouping, by time and tags.

sparrc · 2015-11-20T18:23:42Z

Seems like aggregations could be their own special type of plugin. They could live in their own directory and have an interface to make it easy for contributors.

Mechanically, I'm thinking they would need to be run by the flusher goroutine in agent.go, on the slice of points, before flush gets called.

Doing it this way would support the former of the two config options I listed above.

sparrc · 2015-11-20T18:58:04Z

Actually we can aggregate stats as they arrive here: https://github.com/influxdb/telegraf/blob/master/agent.go#L397-L399

That way we don't need to deal with dropping metrics that shouldn't be flushed on their own; we can just add the aggregated stats directly to the slice of points.
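
The arrival-time approach could look roughly like the sketch below. It is an illustration in Python, not Telegraf's actual Go code. The `Aggregator` interface, `SumAggregator`, `on_point`, `on_flush`, and the dict-based point shape are all hypothetical names chosen for this sketch. A point claimed by an aggregator never enters the batch, so nothing has to be dropped later; the aggregate is appended at flush time.

```python
from typing import Protocol


class Aggregator(Protocol):
    """Hypothetical plugin interface for aggregators."""

    def add(self, point: dict) -> bool:
        """Consume a point; return True if the raw point should not be flushed."""
        ...

    def flush(self) -> list:
        """Return aggregate points and reset internal state."""
        ...


class SumAggregator:
    def __init__(self, measurement, field):
        self.measurement = measurement
        self.field = field
        self.total = 0.0
        self.seen = False

    def add(self, point):
        if point["name"] != self.measurement or self.field not in point["fields"]:
            return False
        self.total += point["fields"][self.field]
        self.seen = True
        return True

    def flush(self):
        if not self.seen:
            return []
        out = [{"name": self.measurement + "_sum", "fields": {self.field: self.total}}]
        self.total, self.seen = 0.0, False
        return out


def on_point(point, batch, aggregators):
    """Called as each point arrives from an input."""
    consumed = [agg.add(point) for agg in aggregators]  # every aggregator sees the point
    if not any(consumed):
        batch.append(point)


def on_flush(batch, aggregators):
    """Called by the flusher: raw points plus whatever the aggregators emit."""
    out = batch + [p for agg in aggregators for p in agg.flush()]
    batch.clear()
    return out
```

Returning a boolean from `add` is one possible design choice; an alternative would be a per-aggregator option that decides whether raw points are passed through.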

erowan · 2016-06-07T11:20:13Z

I need to sum bytes + duration to aggregate netflow stats. Looking at your statsd plugin it doesn't appear to perform a sum. Can this be added similar to etsy/statsd?

sparrc · 2016-06-07T13:24:14Z

@erowan please open a separate feature request for the statsd input if you have one. Although I'm not 100% sure I understand what you mean. The statsd protocol sums only if you are sending counters, doesn't it? Or are you talking about performing a sum on histogram/timer metrics? Can you link to some documentation on that if it exists in the etsy implementation?

erowan · 2016-06-07T15:20:05Z

Hello @sparrc, it's documented here https://github.com/etsy/statsd/blob/master/docs/metric_types.md

But I think I am going to write (bytes*8)/duration = bps directly as a timing metric to the Telegraf statsd input for now.

sparrc · 2016-06-07T16:51:02Z

@erowan do you mean timing sums? https://github.com/etsy/statsd/blob/master/docs/metric_types.md#timing

can you open a separate feature request for that?

erowan · 2016-06-07T17:17:08Z

@sparrc Yes, that is what I was referring to. I am still pondering it. I'll gladly open one later if required.
Cheers.

alimousazy · 2016-06-08T09:03:29Z

@sparrc Can I work on aggregation? It is just a matter of moving code around, since I have a working version, but it currently lives inside one of the input plugins.

sparrc · 2016-06-08T09:19:51Z

You can open a PR, but I can't guarantee I'll accept it. This is a difficult problem, and many of the stats require storing large amounts of data to be completely accurate. If you can, please try to use the statsd running_stats code for these as well: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/statsd/running_stats.go

I'd prefer that over using an outside library.

Currently running_stats doesn't have a median or sum function, but that should be simple to add.
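
As an illustration of why a sum is cheap to add while a median is not, here is a minimal constant-memory running-statistics sketch in Python. It is not a port of running_stats.go; the class name `RunningStats` and its attributes are hypothetical. It uses Welford's algorithm for mean and variance and keeps the sum as one extra accumulator. An exact median, by contrast, generally needs the observed values to be retained (or an approximation), which matches the accuracy-versus-storage concern above.

```python
class RunningStats:
    """Count, sum, mean and variance in O(1) memory (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0  # running sum: a single extra accumulator
        self.mean = 0.0
        self._m2 = 0.0    # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(v)
print(stats.n, stats.total, stats.mean, stats.variance())
```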

alimousazy · 2016-06-21T18:22:00Z

Here is a PR which addresses the issue: #1364

jadbox · 2016-06-23T20:11:41Z

We're also looking for a way to do aggregation sums within Telegraf before the data is sent over to Influx, as our volume can be 100k(s) updates per second.

jadbox · 2016-06-23T22:05:35Z

An ideal solution for me is if the logparser plugin (#1320) supported aggregates in the way statsD works.

alimousazy · 2016-06-23T22:38:50Z

@jadbox If by aggregation you mean the sum of each field, this can be added easily to the histogram aggregation filter. I don't think it is right to have aggregation inside an input plugin, because it is then really hard to apply it to other input plugins.

jadbox · 2016-06-24T00:46:22Z

@alimousazy
In my other example, I had "joe" as a key, but my data has an arbitrary number of keys that I can't hard-code into the histogram query.

userID, timestamp, doesActionA, doesActionB
joe, 1466550440, 50, 20
joe, 1466550440, 10, 15
terry, 1466550440, 5, 30

and I want to aggregate in Telegraf before sending to Influx:

# aggregate into 1s blocks, and send each block to Influx
joe, 1466550440, 60, 35
terry, 1466550440, 5, 30

These are the aggregate 1s slices I need to send directly to Influx. I'm not seeing how histogram solves this. Can you explain it more? Note that I do not know the userID field values ahead of time; they are arbitrary data points.
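
The desired behavior, reduced to a sketch: group by the userID value and the 1s time block, and sum each field, without knowing the userID values in advance. This is illustrative Python, not an existing Telegraf feature; the function name `aggregate` and the tuple row layout are assumptions made for the example.

```python
from collections import defaultdict


def aggregate(rows, window_s=1):
    """Sum doesActionA/doesActionB per (userID, time block).

    rows: iterable of (user_id, unix_timestamp, does_action_a, does_action_b).
    Groups are created on first sight, so user IDs need not be known up front.
    """
    totals = defaultdict(lambda: [0, 0])
    for user, ts, a, b in rows:
        key = (user, ts - ts % window_s)  # align timestamp to its block start
        totals[key][0] += a
        totals[key][1] += b
    return [(user, start, a, b) for (user, start), (a, b) in totals.items()]


rows = [
    ("joe", 1466550440, 50, 20),
    ("joe", 1466550440, 10, 15),
    ("terry", 1466550440, 5, 30),
]
for row in aggregate(rows):
    print(row)
# ('joe', 1466550440, 60, 35)
# ('terry', 1466550440, 5, 30)
```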

alimousazy · 2016-06-24T02:19:30Z

@jadbox Could you please tell me whether Joe and Terry are tag names or metric names? If they are tag names, then aggregation will be per tag, so you will get two metrics with the same metric name but different tags, each aggregated per tag. That is already supported by the current implementation and is the result that you want. I will also add "_ALL" as a reserved metric name, which allows aggregating all metrics regardless of name, but that doesn't matter in your case. By the way, LogParser will emit all the metrics under one metric name but, I think, with different tags, so you will get the expected result.

sparrc · 2016-06-25T21:47:29Z

see influxdata/influxdb#6910

jadbox · 2016-06-26T15:14:05Z

@sparrc fyi, in my case I need aggregations before I send data to a DB. (400k/s writes)

@alimousazy Joe/Terry are tag names. The metric name would be a single static name as the data falls into a single category.

Okay, you're saying that this is supported with LogParser, but how do I tell LogParser to increment certain fields together by tag name, by 1 minute sliced batches? I don't see anything related to aggregations (either by tag or by time slice) in the docs:

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/logparser

sparrc · 2016-06-26T16:05:23Z

It is not supported by logparser. There is currently no support for this except by using the statsd input.

The solution for this will need to be generic and usable across all plugins, as well as supporting filtering of tag key/values, field names, and measurement names.

alimousazy · 2016-06-26T19:13:08Z

@jadbox You don't have to add anything to the logparser config. You just need to enable the histogram filter by adding this configuration (the filter can be enabled for any kind of plugin):

[[filter.histogram]]
  bucketsize = 20  
  flush_interval = "1m"
  [filter.histogram.metrics]
    (replace with your metric name) = [0.90]

Note: you can tune the aggregation interval by modifying flush_interval (I may rename flush_interval to an aggregation interval). If you don't need percentiles, just leave the array empty.

Note that this code is not merged yet, so you have to merge it yourself and build from source. Expect changes after code review.

Once you feel that the code solves your case, I will add sum.

jadbox · 2016-06-26T20:03:19Z

@alimousazy Okay, I think adding sum to histogram may work for me. I don't need the percentile, so I assume my config would look like this:

[[filter.histogram]]
  bucketsize = 20
  flush_interval = "1m"
  [filter.histogram.metrics]
    tracking_log = []

Might be useful to optionally specify to just export sum (when it has been added) instead of always including variance, mean, and count along with it. This may save a good chunk of performance when dealing with high volume of data. Of course, this breaks the notion of the filter plugin being a histogram versus just an aggregator.

alimousazy · 2016-06-26T22:34:26Z

@jadbox, I just added support for sum to the pull request.

Don't worry about performance: I'm using a special histogram implementation that is designed for streaming and a low memory footprint. Please let me know if you have any feedback.

I will spend tonight testing and solidifying the solution.

pauldix · 2016-06-27T07:22:51Z

I recently added an issue for InfluxDB to be able to do aggregations across many measurements. It would be good if the Telegraf method for doing this used a similar sort of structure and syntax. See influxdata/influxdb#6910

alimousazy · 2016-06-27T07:54:10Z

@pauldix I can map the syntax to something like this:

[[filter.histogram]]
  [rollup]
    name = "foo"
    measurements = ["foo", "bar"] # leave empty to match all metrics
    fields = [] # fields to aggregate (leave empty for all)
    functions = ["mean", "count", "max", "percentile(90) as perc_90", "percentile(99) as perc_99"]
    periods = "5m" # flushing interval
    drop_original = true # drop the original metric only if it contains all the aggregated fields

However, I feel that adding the fields condition has a big cost, since we are dealing with streaming data.

Are there any other ideas for filters that sit between input and output plugins? I have the following filters that I might implement in the future if the infrastructure gets merged:

1- Rename filter, for renaming tags or metrics (metric shaping).
2- Condition filter, to drop metrics that don't meet a condition, such as exceeding a specific value or having a specific tag (useful for alerting).
3- Sampling filter, which uses sampling tools and libraries to sample metrics and reduce bandwidth usage.
4- Bandwidth filter, which specifies the maximum number of metrics that may be emitted in a given period of time; the limit can be a number of metrics or a metric size in bytes.
5- Remote control filter, which uses an integrated messaging library like NanoMSG to accept commands from a centralized service; this can be used to enable or disable other filters on demand.

Any ideas on the syntax for these filters? (I might come up with other ideas in the future.)

alimousazy · 2016-08-05T19:28:58Z

I just added support for:
1- Rollup.
2- Functions to be applied (mean, sum, etc.).
3- Glob matching for both metric name and tag name.
4- Flag to not drop the original metric after aggregation.

Example:

rollup = [
  "(Name new) (Tag interface en*) (Functions mean 0.90)",
  "(Name cpu_value) (Measurements cpu) (Functions mean sum) (pass)",
]

For more information, check pull request #1364.

sparrc · 2016-08-29T12:09:02Z

Closing this in favor of #1662.

sparrc mentioned this issue Jun 2, 2016

Accumulator options? #1310

Closed

This was referenced Jun 8, 2016

Feature Request #1349

Closed

Logstash integration #328

Closed

jadbox mentioned this issue Jun 23, 2016

Add Histogram support aggregation #1364

Closed

3 tasks

sparrc mentioned this issue Jun 25, 2016

Telegraf should be able to write rollup data #1419

Closed

sparrc changed the title ~~Telegraf should do some simple metrics aggregation~~ Telegraf should do some simple metric aggregation/rollup Jun 25, 2016

sparrc mentioned this issue Jul 14, 2016

[Feature Request] Add sumarize data by interval in the log parser input plugin #1478

Closed

sparrc added the Difficulty/Large label Jul 20, 2016

This was referenced Aug 12, 2016

Provide docs on derivative items. #216

Closed

WIP, Downsampling metrics #1595

Closed

pauldix mentioned this issue Aug 23, 2016

"Histogram" statistics aggregator plugin #1662

Closed

sparrc closed this as completed Aug 29, 2016
