GROUP BY does not handle buckets correctly #2005

jnutzmann · 2015-03-18T21:27:08Z

I am running a fork of master right now looking to add the offset functionality that is being discussed in #1837. In the process of doing that, I discovered a bug in how buckets are created. Here is the symptom:

These two queries produce completely different results (for all rows), but have the same timestamps in the time column (note the difference in timestamp):

select count(value) from duty_cycle_hose_2 where time > '2015-03-18T15:00:00Z' - 11d group by time(1h) limit 5

select count(value) from duty_cycle_hose_2 where time > '2015-03-18T15:30:00Z' - 11d group by time(1h) limit 5

What's happening is there are different sources for the bucket times compared to the times that result. This code creates the array resultTimes that specifies the time in the results.

(line 137 in engine.go)

// ensure that the start time for the results is on the start of the window
startTimeBucket := m.TMin / m.interval * m.interval

for i, _ := range resultTimes {
    t := startTimeBucket + (int64(i) * m.interval * int64(m.stmt.Offset+1))
    resultTimes[i] = t
    // we always include time so we need one more column than we have aggregates
    vals := make([]interface{}, 0, len(aggregates)+1)
    resultValues[i] = append(vals, time.Unix(0, t).UTC())
}

The actual buckets are based off m.TMin and are created NextInterval(). This simply adds m.interval to T.Min to get the new bucket start time, so the buckets start from the minimum time in the query and not the time where in resultValues. This essentially shifts all the buckets by the modulo (in my example, 30 minutes).

I can create a PR to fix, but I need to know the desired outcome. The cleanest way (in code) would be to adjust T.Min to start at the natural boundary, however this might be counter-intuitive to users if they get points back that are before their specified start time. Thoughts?

The text was updated successfully, but these errors were encountered:

jnutzmann · 2015-03-19T13:55:16Z

To clarify, if your where clause contains time > '2015-03-18T15:00:00Z' you would get the "correct" buckets where values are between 15:00-16:00 for time 15:00, 16:00-17:00 for time 16:00, etc. But, if your where clause contains time > '2015-03-18T15:30:00Z' you would get the incorrect buckets where values are between 15:30-16:30 for time 15:00, 16:30-17:30 for time 16:00, etc. I discovered this when I was using now() and getting slightly different results every time the query returned.

otoolep · 2015-03-20T00:15:22Z

Fixed by the patch. Thanks again @jnutzmann

jnutzmann mentioned this issue Mar 19, 2015

Fixing bucket alignment for group by #2016

Merged

otoolep closed this as completed Mar 20, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GROUP BY does not handle buckets correctly #2005

GROUP BY does not handle buckets correctly #2005

jnutzmann commented Mar 18, 2015

jnutzmann commented Mar 19, 2015

otoolep commented Mar 20, 2015

GROUP BY does not handle buckets correctly #2005

GROUP BY does not handle buckets correctly #2005

Comments

jnutzmann commented Mar 18, 2015

jnutzmann commented Mar 19, 2015

otoolep commented Mar 20, 2015