Optimize top() and bottom() using an incremental aggregator #8394

jsternberg · 2017-05-16T18:39:44Z

The previous version of top() and bottom() would gather all of the
points to use in a slice, filter them (if necessary), then use a
slightly modified heap sort to retrieve the top or bottom values.

This performed horrendously from the standpoint of memory. Since it
consumed so much memory and spent so much time in allocations (along
with sorting a potentially very large slice), this affected speed too.

These calls have now been modified so they keep only the top or bottom
points in a heap: a min heap for top() and a max heap for bottom(). For
top(), once the heap holds the requested number of points, each new
point is compared against the minimum value in the heap. If the new
point is greater than the minimum point, it replaces the minimum point
and the heap is fixed with the new value. If the new point is smaller,
it is discarded. For bottom(), the process is the opposite.

Once all points have been read, the final result is sorted to ensure
the correct ordering of the selected points.
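
To make the heap approach concrete, here is a minimal sketch of a bounded top-N selection using Go's container/heap. It is not the code from this PR: the `point` type, the `minHeap` type, and the `topN` function are hypothetical names chosen for illustration, and sorting the result by descending value is an assumption about what "correct ordering" means.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// point is a hypothetical stand-in for a query point, not InfluxDB's type.
type point struct {
	Time  int64
	Value float64
}

// minHeap orders points by ascending value, so index 0 is always the
// smallest point currently kept.
type minHeap []point

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Value < h[j].Value }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(point)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	p := old[n-1]
	*h = old[:n-1]
	return p
}

// topN scans the input once and never holds more than n points.
func topN(points []point, n int) []point {
	if n <= 0 {
		return nil
	}
	h := make(minHeap, 0, n)
	for _, p := range points {
		if len(h) < n {
			// Still filling the heap: keep every point.
			heap.Push(&h, p)
			continue
		}
		if p.Value > h[0].Value {
			// Replace the current minimum and restore the heap property.
			h[0] = p
			heap.Fix(&h, 0)
		}
		// Otherwise the point is discarded.
	}
	// Assumption: the final ordering is largest value first.
	sort.Slice(h, func(i, j int) bool { return h[i].Value > h[j].Value })
	return h
}

func main() {
	in := []point{{1, 3}, {2, 9}, {3, 1}, {4, 7}, {5, 5}}
	for _, p := range topN(in, 2) {
		fmt.Println(p.Time, p.Value)
	}
}
```

A bottom-N variant would flip the comparison in `Less` and in the replacement check, so the heap root is the largest kept point. Memory use is bounded by n rather than by the number of input points, which is consistent with the drop in bytes allocated reported in the benchmark.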

When top() or bottom() is given a tag to select, the call is now
rewritten so that this query:

SELECT top(value, host, 2) FROM cpu

Essentially becomes this query:

SELECT top(value, 2), host FROM (
    SELECT max(value) FROM cpu GROUP BY host
)

This should drastically increase the performance of all top() and
bottom() queries.
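
The tag rewrite above can be read as two stages: reduce each host to its maximum value, then take the top N of those per-host maxima. The following is a rough sketch of that idea only, under stated assumptions: `taggedPoint`, `maxPerHost`, and `topByHost` are hypothetical names, timestamps are omitted, ties are broken by host name for determinism, and the outer stage uses a plain sort for brevity instead of the heap described earlier.

```go
package main

import (
	"fmt"
	"sort"
)

// taggedPoint is a hypothetical point carrying a single "host" tag.
type taggedPoint struct {
	Host  string
	Value float64
}

// maxPerHost mirrors the inner query:
//   SELECT max(value) FROM cpu GROUP BY host
func maxPerHost(points []taggedPoint) []taggedPoint {
	best := make(map[string]float64)
	for _, p := range points {
		if v, ok := best[p.Host]; !ok || p.Value > v {
			best[p.Host] = p.Value
		}
	}
	out := make([]taggedPoint, 0, len(best))
	for host, v := range best {
		out = append(out, taggedPoint{Host: host, Value: v})
	}
	return out
}

// topByHost mirrors the outer query:
//   SELECT top(value, n), host FROM (...)
func topByHost(points []taggedPoint, n int) []taggedPoint {
	perHost := maxPerHost(points)
	// Largest value first; ties broken by host name (an assumption).
	sort.Slice(perHost, func(i, j int) bool {
		if perHost[i].Value != perHost[j].Value {
			return perHost[i].Value > perHost[j].Value
		}
		return perHost[i].Host < perHost[j].Host
	})
	if n < 0 {
		n = 0
	}
	if n > len(perHost) {
		n = len(perHost)
	}
	return perHost[:n]
}

func main() {
	in := []taggedPoint{
		{"a", 1}, {"a", 8}, {"b", 5}, {"c", 9}, {"b", 2},
	}
	for _, p := range topByHost(in, 2) {
		fmt.Println(p.Host, p.Value)
	}
}
```

Because the inner stage emits at most one point per host, the outer stage only ever sees as many points as there are distinct hosts, and each host appears at most once in the result.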

Rebased/mergeable
Tests pass
CHANGELOG.md updated

jsternberg · 2017-05-16T18:42:50Z

Benchmark comparison:

benchmark                    old ns/op     new ns/op     delta
BenchmarkSelect_Top_1K-8     381236352     64565949      -83.06%

benchmark                    old allocs     new allocs     delta
BenchmarkSelect_Top_1K-8     110            66             -40.00%

benchmark                    old bytes     new bytes     delta
BenchmarkSelect_Top_1K-8     529304706     12168         -100.00%

There should also be an improvement when using SELECT top(value, host, <n>) ..., but the code to benchmark this is less straightforward, since the old implementation worked completely differently from the new one. I can try if needed, but it wouldn't be a one-to-one comparison between the two code paths.

discoduck2x · 2017-05-17T06:35:17Z

@jsternberg very interesting, is this part of the nightly build now?

jsternberg · 2017-05-17T14:02:36Z

This has not been merged yet. It still needs to be reviewed by someone before it gets merged.

jsternberg added the review label May 16, 2017

jsternberg force-pushed the js-top-bottom-performance branch from 9a3e169 to 5858d94 Compare May 16, 2017 18:40

jsternberg mentioned this pull request May 16, 2017

CQ or Backfill using Top() with tag does not create tag in result #7129

Closed

jsternberg force-pushed the js-top-bottom-performance branch 2 times, most recently from bfd0640 to 58def26 Compare May 16, 2017 20:45

jsternberg added this to the 1.3.0 milestone May 16, 2017

jsternberg force-pushed the js-top-bottom-performance branch from 58def26 to 4e68349 Compare May 17, 2017 16:34

jsternberg requested a review from benbjohnson May 18, 2017 14:12

jsternberg force-pushed the js-top-bottom-performance branch from 4e68349 to 7b9b55b Compare May 19, 2017 16:56

benbjohnson approved these changes May 19, 2017

jsternberg merged commit 4bdce21 into master May 19, 2017

jsternberg deleted the js-top-bottom-performance branch May 19, 2017 19:32

jsternberg removed the review label May 19, 2017
