Aggregation functions should provide interpolation when aggregating multiple series #728
Comments
That's because aggregators in OpenTSDB are aggregators over multiple series that get combined into one. In Influx an aggregator is more traditional, that is it aggregates over multiple points in a single series. In the case of your query, each series gets aggregated individually and you get returned as many series as match that regex.
If you want empty intervals filled, you can use
What you're probably looking to do is to merge many time series into one and then aggregate over that. Issue #72 will give you the ability to do that. But that still won't help you if you have intervals with missing data. In that case you'd probably want something more like a
Influx isn't just limited to storing data at fixed intervals of time. As such, it has functions designed to calculate intervals over multiple points.
I'm closing this out for now, but feel free to keep commenting and I will open issues that map to feature requests that come out of the discussion. In the future, the proper place to initiate a discussion around feature development or general questions is the mailing list. Thanks
When do you plan to add these options to fill()?
Either as part of the 0.9.0 release or in a point release shortly afterwards.
As a general purpose time series database, I was surprised to see that the aggregation functions do not perform interpolation.
http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html
Given multiple series that report the same value over time at irregular intervals (as happens in the real world), the summation of such series with a query like:
select sum(value) from /disk.capacity.*/ group by time(5s)
produces a single value. I would expect it to first downsample each series matching /disk.capacity.*/ to 5s intervals and then, where necessary, interpolate each of those series so that the aggregation is correct.
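To make the expectation concrete, here is a minimal Python sketch of the downsample, interpolate, then sum steps described above. It is not Influx or OpenTSDB code: the function names, the choice of a per-bucket mean for downsampling, and linear interpolation for gaps are all assumptions made for illustration.

```python
def downsample(points, width):
    """Average one series' raw (timestamp, value) points per time bucket."""
    sums, counts = {}, {}
    for t, v in points:
        bucket = t - t % width
        sums[bucket] = sums.get(bucket, 0.0) + v
        counts[bucket] = counts.get(bucket, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def interpolate(buckets, grid):
    """Linearly interpolate missing buckets; None outside the known range."""
    known = sorted(buckets)
    out = {}
    for g in grid:
        if g in buckets:
            out[g] = buckets[g]
            continue
        before = [k for k in known if k < g]
        after = [k for k in known if k > g]
        if not before or not after:
            out[g] = None  # no neighbours on both sides, so nothing to interpolate
            continue
        lo, hi = before[-1], after[0]
        frac = (g - lo) / (hi - lo)
        out[g] = buckets[lo] + frac * (buckets[hi] - buckets[lo])
    return out


def sum_series(all_series, width, start, end):
    """Downsample and interpolate each series, then sum across series."""
    grid = list(range(start, end, width))
    filled = [interpolate(downsample(s, width), grid) for s in all_series]
    result = {}
    for g in grid:
        values = [f[g] for f in filled if f[g] is not None]
        result[g] = sum(values) if values else None
    return result


# Hypothetical data: series_b reports irregularly and has no point in [5, 10).
series_a = [(0, 100.0), (5, 100.0), (10, 100.0), (15, 100.0)]
series_b = [(1, 50.0), (12, 50.0), (16, 50.0)]
result = sum_series([series_a, series_b], width=5, start=0, end=20)
print(result)
```

With these two flat series the summed line stays flat, because the missing bucket in series_b is filled from its neighbours before the sum is taken.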
The current behavior only roughly works if the group by time interval matches the reporting interval of the original series. Even then it can produce a wavy summation line from otherwise flat time series, because the aggregation function has no idea how many values to expect in each sum and cannot compute values to fill in for the series that are missing a point in that interval.
The problem gets worse: for identical flat-line series, the sum returned by the query above scales with the group by time() interval. For example, group by time(10s) produces sums that are twice what is expected.
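The scaling effect can be reproduced with a small Python sketch. It assumes the sum simply adds every raw point that falls inside each time bucket across all matching series, which is my reading of the observed output and not a statement about how Influx is implemented; `naive_bucket_sum` and the sample data are hypothetical.

```python
from collections import defaultdict


def naive_bucket_sum(points, width):
    """Add up every raw point in each time bucket, with no per-series downsampling."""
    buckets = defaultdict(float)
    for t, v in points:
        buckets[t - t % width] += v
    return dict(sorted(buckets.items()))


# Two hypothetical flat series, each reporting 100 every 5 seconds.
series_a = [(t, 100.0) for t in range(0, 20, 5)]
series_b = [(t, 100.0) for t in range(0, 20, 5)]
merged = series_a + series_b

five = naive_bucket_sum(merged, 5)   # one point per series per bucket
ten = naive_bucket_sum(merged, 10)   # two points per series per bucket
print(five)
print(ten)
```

The expected total for two flat series at 100 is 200 in every interval. The 5s buckets give that, but the 10s buckets give 400 because each series contributes two raw points per bucket.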
Is this something that is on the roadmap to address?