Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DERIVATIVE() edge conditions #4237

Closed
caldwell opened this issue Sep 25, 2015 · 0 comments
Closed

DERIVATIVE() edge conditions #4237

caldwell opened this issue Sep 25, 2015 · 0 comments
Assignees

Comments

@caldwell
Copy link

We're running influxdb 0.9.4.1 on Ubuntu 14.04LTS.

We're trying to use DERIVATIVE() and running up against some issues. The data is an ever increasing nginx query count. Here's an example of our query without the DERIVATIVE:

> SELECT first(value) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time            first
----            -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z    5.3238387e+07
2015-09-25T22:12:00Z    5.3241087e+07
2015-09-25T22:13:00Z    5.3243829e+07
2015-09-25T22:14:00Z    5.3246917e+07
2015-09-25T22:15:00Z    5.3246917e+07


name: curl_json_value
tags: host=server-2-e1d-01
time            first
----            -----
2015-09-25T22:10:00Z
2015-09-25T22:11:00Z    409642
2015-09-25T22:12:00Z    412407
2015-09-25T22:13:00Z    415038
2015-09-25T22:14:00Z    416994
2015-09-25T22:15:00Z    416994

When we add DERIVATIVE():

> SELECT derivative(first(value)) FROM "curl_json_value" WHERE "instance" = 'httpserver' AND "type" = 'http_requests' AND "host" =~ /^server-.*/ AND time > now() - 5m GROUP BY time(60s), "host" fill(previous)
name: curl_json_value
tags: host=server-2-e1c-01
time            derivative
----            ----------
2015-09-25T22:10:00Z    0


name: curl_json_value
tags: host=server-2-e1d-01
time            derivative
----            ----------
2015-09-25T22:10:00Z    0

I believe this is because the fill(previous) doesn't give a result for the first item [1].

fill(0) seems to give a result for the first value, but is incorrect in general for DERIVATIVE() because it gives the wrong values unless the GROUP BY time value is exactly correct [2]. We're eventually going to use this query in Grafana, which means we don't get to choose the time period specifically.

Since the times sometimes line up, the DERIVATIVE() query does sometimes return results. These results look ok, except the first and last values are always garbage (large jumps positive or negative). Maybe DERIVATIVE() should query extra value on either side so it can give back valid answers over the given WHERE value, or perhaps it should take the values that it was asked for and throw out the first and last value of the result, giving back a little less data than was asked for.

Interestingly, in 0.9.3 (before bug #3718) we didn't need the FIRST() function inside DERIVATIVE() and everything seemed to work. So, we might not have been doing things right, but 0.9.4 appeared to break things for us.

From my perspective as a query writer, I think I'd like DERIVATIVE() for imply fill(previous) and have fill(previous) be magic enough to look into prior data so the first value is there. Then a simple query just works.

Or perhaps DERIVATIVE() shouldn't bail out early when it doesn't have leading or trailing data points, and instead return NULL or "no data" or whatever until it has a real difference. That would also fix the first output being a huge spike.

I've only just started thinking about all of this, so there are probably even better options than I've come up with so far. :-)


[1] Well, sometimes it doesn't: since the query has now() in it, sometimes things line up perfectly and the first result does have a value, but it's not consistent and can't be relied on.

[2] Picture data like this: [5,6,7] on a 5 second interval. fill(0) on a 1s interval makes the data [5,0,0,0,0,6,0,0,0,0,7,0,0,0,0] and then taking the derivative gives [5,-5,0,0,0,6,-6,0,0,0,7,-7,0,0,0]. What I expect is [5,0,0,0,0,1,0,0,0,0,1,0,0,0,0] or even [1,0,0,0,0,1,0,0,0,0,1,0,0,0,0] (though that requires looking at data outside what was asked for).

@jwilder jwilder self-assigned this Oct 1, 2015
jwilder added a commit that referenced this issue Oct 1, 2015
If an aggregate derivative query did not have a value in the first
time bucket, it would abort early and return a single row with value
of 0.  Similarly, if either the current or previous value was nil,
it would skip the row and not append any values causing gaps and
no data to show up.

Instead, this will append a nil value if either the current or previous
valis is nil.  This essentially allows nil values to carry through the
results as well as gives a more sensible value for rows where we cannot
compute a difference (instead of dropping the row as before).

Fixes #4237 #4263
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants