Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent and incorrect results when using non_negative_derivative and another aggregate while selecting multiple series #7469

Closed
mark-rushakoff opened this issue Oct 14, 2016 · 1 comment
Assignees
Milestone

Comments

@mark-rushakoff
Copy link
Contributor

Running on OSX, tested against v1.0.2 and master at b66374e.

Summary:

Queries using non_negative_derivative and a second aggregate will rarely return correct results:

SELECT non_negative_derivative(max(a)) / min(b) FROM m WHERE...

But on its own, non_negative_derivative works correctly:

SELECT non_negative_derivative(max(a)) FROM m WHERE...

Data generation:

This script will continually write data:

#!/bin/bash

set -e

# Drop old database if one existed, and create a new one.
curl -XPOST http://localhost:8086/query --data-urlencode 'q=drop database rs'
curl -XPOST http://localhost:8086/query --data-urlencode 'q=create database rs'

# The higher these cardinalities are, the more frequently the problem surfaces.
T1_CARDINALITY=8
T2_CARDINALITY=8

# Insert data
gendata() {
  for t1 in $(seq "$T1_CARDINALITY"); do
    for t2 in $(seq "$T2_CARDINALITY"); do
      # Given tag t1 and t2, store n=t1*t2, n2=n*n
      echo "m,t1=$t1,t2=$t2 n=$((t1 * t2))i,n2=$((t1*t1*t2*t2))i"
    done
  done
}

while true; do
  gendata | curl -XPOST 'http://localhost:8086/write?db=rs' --data-binary @-
  # Writing at auto-ns precision, sleep less than a second to avoid data gaps in group by 1s
  sleep 0.9 
done

Queries

The above script gives us predictable data, such that the tags, treated as integers, indicate the value in the fields. n = t1 * t2, and n2 = n * n.

> select n, n2 from m where time > now() - 3s group by * limit 1 slimit 4
name: m
tags: t1=1, t2=1
time                            n       n2
----                            -       --
2016-10-14T22:29:45.060938543Z  1       1

name: m
tags: t1=1, t2=2
time                            n       n2
----                            -       --
2016-10-14T22:29:45.060938543Z  2       4

name: m
tags: t1=1, t2=3
time                            n       n2
----                            -       --
2016-10-14T22:29:45.060938543Z  3       9

name: m
tags: t1=1, t2=4
time                            n       n2
----                            -       --
2016-10-14T22:29:45.060938543Z  4       16

Queries that work correctly

The field values are constant within a series, so aggregates like min, mean, max will return the same as the raw value:

> select mean(n), min(n2) from m where time > now() - 3s group by time(1s), * slimit 4
name: m
tags: t1=1, t2=1
time                    mean    min
----                    ----    ---
2016-10-14T22:42:27Z    1       1
2016-10-14T22:42:28Z    1       1
2016-10-14T22:42:29Z    1       1
2016-10-14T22:42:30Z

name: m
tags: t1=1, t2=2
time                    mean    min
----                    ----    ---
2016-10-14T22:42:27Z    2       4
2016-10-14T22:42:28Z    2       4
2016-10-14T22:42:29Z    2       4
2016-10-14T22:42:30Z

name: m
tags: t1=1, t2=3
time                    mean    min
----                    ----    ---
2016-10-14T22:42:27Z    3       9
2016-10-14T22:42:28Z    3       9
2016-10-14T22:42:29Z    3       9
2016-10-14T22:42:30Z

name: m
tags: t1=1, t2=4
time                    mean    min
----                    ----    ---
2016-10-14T22:42:27Z    4       16
2016-10-14T22:42:28Z    4       16
2016-10-14T22:42:29Z    4       16
2016-10-14T22:42:30Z

If we take the non_negative_derivative of one of those aggregates, then we get 0 because the value is unchanging:

> select non_negative_derivative(mean(n)) from m where time > now() - 3s group by time(1s), * slimit 4
name: m
tags: t1=1, t2=1
time                    non_negative_derivative
----                    -----------------------
2016-10-14T22:43:50Z    0
2016-10-14T22:43:51Z    0
2016-10-14T22:43:52Z    0
2016-10-14T22:43:53Z    0

name: m
tags: t1=1, t2=2
time                    non_negative_derivative
----                    -----------------------
2016-10-14T22:43:50Z    0
2016-10-14T22:43:51Z    0
2016-10-14T22:43:52Z    0
2016-10-14T22:43:53Z    0

name: m
tags: t1=1, t2=3
time                    non_negative_derivative
----                    -----------------------
2016-10-14T22:43:50Z    0
2016-10-14T22:43:51Z    0
2016-10-14T22:43:52Z    0
2016-10-14T22:43:53Z    0

name: m
tags: t1=1, t2=4
time                    non_negative_derivative
----                    -----------------------
2016-10-14T22:43:50Z    0
2016-10-14T22:43:51Z    0
2016-10-14T22:43:52Z    0
2016-10-14T22:43:53Z    0

Queries with non_negative_derivative that do not work correctly

When we use non_negative_derivative in addition to a separate aggregate, all bets are off.

The series often appear to be intermixed:

> select non_negative_derivative(mean(n)) + min(n) from m where time > now() - 3s group by time(1s), * slimit 4
name: m
tags: t1=1, t2=1
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:25Z
2016-10-14T22:45:27Z    1

name: m
tags: t1=1, t2=2
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:26Z    1

name: m
tags: t1=1, t2=1
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:28Z

name: m
tags: t1=1, t2=2
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:25Z

name: m
tags: t1=1, t2=3
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:27Z    2

name: m
tags: t1=1, t2=4
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:26Z    2

name: m
tags: t1=1, t2=2
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:28Z

name: m
tags: t1=1, t2=3
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:25Z
2016-10-14T22:45:26Z
2016-10-14T22:45:27Z
2016-10-14T22:45:28Z

name: m
tags: t1=1, t2=4
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:25Z
2016-10-14T22:45:26Z
2016-10-14T22:45:27Z
2016-10-14T22:45:28Z

Sometimes the series are joined together properly, but values are missing and lengths of series are different.

> select non_negative_derivative(mean(n)) + min(n) from m where time > now() - 3s group by time(1s), * slimit 4
name: m
tags: t1=1, t2=1
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:49Z
2016-10-14T22:45:51Z    1
2016-10-14T22:45:52Z    1

name: m
tags: t1=1, t2=2
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:50Z    1
2016-10-14T22:45:49Z
2016-10-14T22:45:52Z    2

name: m
tags: t1=1, t2=3
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:50Z    2
2016-10-14T22:45:51Z    2
2016-10-14T22:45:49Z

name: m
tags: t1=1, t2=4
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:45:50Z    3
2016-10-14T22:45:51Z    3
2016-10-14T22:45:52Z    3
2016-10-14T22:45:49Z
2016-10-14T22:45:50Z
2016-10-14T22:45:51Z
2016-10-14T22:45:52Z

Every once in a while the results are correct. It seems to be less likely, the more series that are scanned in the query.

> select non_negative_derivative(mean(n)) + min(n) from m where time > now() - 3s group by time(1s), * slimit 4
name: m
tags: t1=1, t2=1
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:47:55Z    1
2016-10-14T22:47:56Z    1
2016-10-14T22:47:57Z    1
2016-10-14T22:47:58Z    1

name: m
tags: t1=1, t2=2
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:47:55Z    2
2016-10-14T22:47:56Z    2
2016-10-14T22:47:57Z    2
2016-10-14T22:47:58Z    2

name: m
tags: t1=1, t2=3
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:47:55Z    3
2016-10-14T22:47:56Z    3
2016-10-14T22:47:57Z    3
2016-10-14T22:47:58Z    3

name: m
tags: t1=1, t2=4
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:47:55Z    4
2016-10-14T22:47:56Z    4
2016-10-14T22:47:57Z    4
2016-10-14T22:47:58Z    4

The issue is not present when filtering to a single series:

> select non_negative_derivative(mean(n)) + min(n) from m where time > now() - 3s and t1 = '5' and t2 = '6' group by time(1s), * slimit 4
name: m
tags: t1=5, t2=6
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:51:11Z    30
2016-10-14T22:51:12Z    30
2016-10-14T22:51:13Z    30
2016-10-14T22:51:14Z

But even filtering to two series will show the issue:

> select non_negative_derivative(mean(n)) + min(n) from m where time > now() - 3s and t1 = '5' and (t2 = '6' or t2 = '7') group by time(1s), * slimit 4
name: m
tags: t1=5, t2=6
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:52:35Z
2016-10-14T22:52:37Z    30
2016-10-14T22:52:38Z    30

name: m
tags: t1=5, t2=7
time                    non_negative_derivative_min
----                    ---------------------------
2016-10-14T22:52:36Z    30
2016-10-14T22:52:35Z
2016-10-14T22:52:38Z    35
2016-10-14T22:52:37Z
2016-10-14T22:52:38Z

Workaround

Until this issue is fixed, avoid selecting non_negative_derivative and a separate aggregate in the same query.

If you require a calculation using non_negative_derivative with another aggregate, consider using separate SELECT INTO statements (manually or with a continuous query) so that you don't have to evaluate non_negative_derivative and the other aggregate at the same time, but rather can do a simple select of the precalculated non_negative_derivative and other aggregate.

@jwilder
Copy link
Contributor

jwilder commented Oct 18, 2016

Fixed via #7473

@jwilder jwilder closed this as completed Oct 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants