Add "integral" function to InfluxQL #8194
Conversation
There's a small lingering issue that I'm encountering with the interpolation. Since integral acts so weird, there's a question I have: what happens when it is used with `fill()`? My original idea was just to read in points as a stream and perform interpolation like that. But what happens when a fill specification is included? Imagine you have data equally spaced every 10 seconds and you call integral and tell it to group every 20 seconds. The interpolation feature allows this to learn where the next point is to complete the line going into the next interval. But if the next point were to skip 1 minute into the future, should it perform an interpolation between those two points, or should it cut off the area calculation at the last time before that point? A specific example:
For the first bucket, 0s to 20s, I think this is pretty simple. You would find the area between 0s and 10s and then the area between 10s and 20s. But the point at 20s isn't technically in the first bucket; it's the first point of the next bucket. If the first point in the next bucket were at 21s and the fill is null, none, or some number, what should happen here? You can see this issue come up later in that series because 40s is missing and it's the beginning of an interval. @pauldix any thoughts on this? Personally, I don't think integral is complete without some form of interpolation handling the area between different intervals, so I would like to hash out how this should work.
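The bucketing being debated can be sketched as plain trapezoidal integration. This is a hypothetical illustration, not InfluxDB's implementation: it assumes evenly spaced sample data and that each segment crosses at most one bucket boundary, and interpolates a value at the boundary so the area splits cleanly between buckets.

```python
# Hypothetical sketch of the bucketing question: trapezoidal area under
# points sampled every 10s, grouped into 20s buckets, where the first point
# of the next bucket closes the line for the current one.
def trapezoid(t0, v0, t1, v1):
    """Area under the line segment from (t0, v0) to (t1, v1)."""
    return (v0 + v1) / 2 * (t1 - t0)

def integral_by_bucket(points, interval):
    """points: sorted (time_seconds, value) pairs; interval: bucket width.

    Assumes each segment spans at most one bucket boundary.
    """
    areas = {}
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        b0 = t0 // interval * interval
        b1 = t1 // interval * interval
        if b0 == b1:
            areas[b0] = areas.get(b0, 0.0) + trapezoid(t0, v0, t1, v1)
        else:
            # Interpolate the value at the bucket boundary and split the area.
            vb = v0 + (v1 - v0) * (b1 - t0) / (t1 - t0)
            areas[b0] = areas.get(b0, 0.0) + trapezoid(t0, v0, b1, vb)
            areas[b1] = areas.get(b1, 0.0) + trapezoid(b1, vb, t1, v1)
    return areas

pts = [(0, 1.0), (10, 3.0), (20, 5.0), (30, 1.0)]
print(integral_by_bucket(pts, 20))  # -> {0: 60.0, 20: 30.0}
```

The open question in the thread is exactly what this sketch glosses over: when the gap to the next point spans more than one bucket boundary, should the line still be interpolated across, or should the area be cut off?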
Note, my current favored plan for that is just to say that `fill()` with integral is an error.
+1 for making it an error.
The potential issue with an error, though, is: should this type of query be allowed?
Since we allow multiple aggregates to be queried, that error would also reject queries where `fill()` is valid for the other aggregates.
Hmmm yeah. Maybe with multiple aggregates it would just apply to the ones that work while leaving integral alone.
We don't seem to throw any kind of error when `FILL()` is used in a situation where it doesn't do anything, so I think we should just document it and plan in the future to improve query parsing. Integral is already going to be a very weird function.
So you're thinking don't allow fill on any query that has integral in it? If they wanted mean and integral, they'd just issue two queries?
No, I mean just ignore the `FILL()` function and let it be used. We don't seem to have any verification to see if the `FILL()` function is used properly anyway. We likely need to start thinking of a plan for a v2 query parser that prevents these PHP-style things, but the current query parser's philosophy is mostly to silently ignore things that don't make sense. So this would be valid, but also useless: `SELECT integral(value) FROM cpu GROUP BY time(1m) FILL(0)`
Sounds good. Well, as good as we have for now until query V2 is born.
While playing with Grafana for graphing impulse meters (e.g. an S0 meter) or consumption values, I noted that Grafana plots the graph in a way not suitable for such a use case. This is exactly the use case for which I'm eagerly waiting for the integral implementation. I was wondering if this is only a "graphical problem" or if it would affect this upcoming feature as well.
@Sineos I'm not sure I understand your point, but I'm going to take a guess. Is your point that integral emits the wrong timestamp and that this affects the final graph? I think we're currently emitting the later timestamp rather than the earlier timestamp for the area, so I would imagine you'd run into the same problem. Am I understanding you correctly?
@jsternberg
At t1 = 00:00:00 --> 100 Watts
Energy = 400 W x 1 h = 400 Wh

So, a typical use case for an integral. The blue graph would show the correct result, whereas the black graph would give an energy value of 100 Wh (the respective areas under the graph). This would be true for all consumption-based calculations and also for rate-based calculations.

I guess the idea for a consumption calculation is that I can only look into the past. So if we measure 500 Watts at t2, it is safe to assume that this happened in the time between t1 and t2. So if we choose delta(t) small enough, the calculation will be pretty accurate.

The same logic applies the other way round: if we measure network traffic and our counter shows 100 MB at t1 and 400 MB at t2, then 300 MB of traffic has been generated between t1 and t2. Given the time and the traffic, we can calculate the network bandwidth that caused the traffic. In words: from t1 onward we had a rate of X MB/s that would eventually lead to an increase of 300 MB at t2.
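A minimal numeric sketch of this attribution question in Python, using hypothetical readings of 100 W at t1 and 400 W at t2 = t1 + 1 h (the t2 reading is an assumption, chosen to match the 400 Wh vs 100 Wh figures above):

```python
# Hypothetical readings: (time in hours, power in watts). The 400 W reading
# at t2 is an assumed value to illustrate the 400 Wh vs 100 Wh difference.
readings = [(0.0, 100.0), (1.0, 400.0)]

def energy_backward(readings):
    """Attribute each reading to the interval *before* it (the meter can
    only report what already happened)."""
    return sum(v1 * (t1 - t0) for (t0, _), (t1, v1) in zip(readings, readings[1:]))

def energy_forward(readings):
    """Attribute each reading to the interval *after* it instead."""
    return sum(v0 * (t1 - t0) for (t0, v0), (t1, _) in zip(readings, readings[1:]))

print(energy_backward(readings))  # -> 400.0 (Wh), the expected result
print(energy_forward(readings))   # -> 100.0 (Wh), what the misplotted graph implies
```

The two functions differ only in which endpoint's value is charged to the interval, which is exactly the difference between the blue and black graphs in the example.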
I tried creating an ad hoc test for @Sineos's example above and came across what appears to be an inconsistency in the timestamps in the output. E.g., given the following data:
It returns the following:
Note that the timestamp for the first bucket is at the beginning of the first bucket, while the timestamp for the second bucket is at the end of the third bucket.
I fixed the bug that caused the wrong output @dgnorton found. I also added an additional condition: if your last point is at the very start of a new interval (so there is no area yet), no point will be pushed out for the last interval even though other aggregates would emit one. This is solely due to the unique nature of integral. The time that gets output for a bucket is the start of the interval, to match the behavior of the other aggregates. So the area between 0:00 and 1:00 will have a time of 0:00 when the query is ascending; it'll be the opposite when descending.
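For the ascending case described here, the timestamp rule amounts to flooring a point's time to the start of its group-by interval. A one-line sketch (a hypothetical helper, not InfluxDB code):

```python
# Floor a timestamp to the start of its group-by interval, as described for
# ascending queries: the area between 0:00 and 1:00 is emitted at 0:00.
def bucket_start(timestamp, interval):
    """Both arguments in seconds; returns the interval's start time."""
    return timestamp - timestamp % interval

print(bucket_start(3599, 3600))  # -> 0
print(bucket_start(3600, 3600))  # -> 3600
```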
@pauldix any thoughts on whether timestamps in the output should be from the beginning or end of each bucket?
@dgnorton I think it makes sense to match the behavior of the other ones, like @jsternberg did
you gots a typo, fella: `SELECT integral(value, 1m) FROM cpu GROU PBY time(20s)`
Fixed the typo for anyone who encounters this from a search engine. Unfortunately, the commit message will be there for all time :( |
The integral function is an aggregator that returns the area under the curve. The integral function also accepts an optional second argument as a time duration to determine the unit of the returned values. The default time duration is `1s` (similar to `derivative()`).

The area under the curve can also be grouped into buckets, but integral acts slightly differently than other aggregates. First, integral does not support `FILL()` and will ignore any `FILL()` function on the query. Second, integral will automatically interpolate the area under the curve using a point in the next interval if it exists. So if you group every 20 seconds and record metrics every 10 seconds, the point at 20 seconds will be used to find the area under the curve between 10s and 20s. If you record a point every 15 seconds and group every 20 seconds, then the point at 30 seconds will be used to interpolate the area under the curve between 15 and 20 seconds, and the point at 15 seconds will be used to interpolate the area under the curve between 20 and 30 seconds.

Unlike `derivative()`, you cannot use a function inside of `integral()`. If you wish to perform a query like that, subqueries are the easiest way.

If there are multiple points at the same time, this is considered a vertical line. Vertical lines do not add anything to the area under the curve, but they do change the line, so the next point will be calculated based on the last point at a timestamp rather than being ignored completely. This behavior differs from the traditional behavior of simply ignoring duplicate points in a stream.
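The duplicate-timestamp rule can be sketched as follows. This is a hypothetical illustration of the rule rather than the actual implementation: a repeated timestamp contributes zero area (zero width), but it still replaces the previous point, so the next segment is drawn from the newest value instead of the duplicate being dropped.

```python
# Sketch of the vertical-line rule: duplicates add no area but update the
# last-seen point, changing where the next segment starts.
def integral_with_duplicates(points):
    """Trapezoidal area over (time, value) pairs sorted by time."""
    area, last = 0.0, None
    for t, v in points:
        if last is not None:
            t0, v0 = last
            area += (v0 + v) / 2 * (t - t0)  # zero width when t == t0
        last = (t, v)  # a duplicate timestamp still replaces the previous point
    return area

# Two points share t=10; the segment to t=20 starts from the later value 0.0,
# so only the ramp from t=0 to t=10 contributes area.
print(integral_with_duplicates([(0, 0.0), (10, 4.0), (10, 0.0), (20, 0.0)]))  # -> 20.0
```

Had the duplicate at t=10 simply been ignored, the segment from (10, 4.0) to (20, 0.0) would have added another 20.0, which is the difference this rule is about.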
Example queries: