Refactor the math engine to compile the query and use eval #9563
Conversation
During a run of megacheck the following issues were discovered:
Force-pushed from 130a252 to 3e80c89
During a run of megacheck the following issues were discovered:
Force-pushed from 3e80c89 to c255f02
During a run of megacheck the following issues were discovered:
Force-pushed from c255f02 to 3b6356c
During a run of megacheck the following issues were discovered:
Force-pushed from 37b7a03 to acf48df
Looks good to me overall. I have added some comments / questions in context.
var SkipDefault = interface{}(0)

// NewIteratorScanner produces an IteratorScanner for the Iterator.
func NewIteratorScanner(input Iterator, keys []string, defaultValue interface{}) IteratorScanner {
Given the reduction in complexity and locking by removing the aux iterator, have you seen any perf improvements as a result of this change?
I think we need to test it. I'm not sure how thorough our benchmarks are though.
I just pushed a change to reduce memory allocations. There was a regression in memory allocations because the time was being handled by eval, which resulted in an extra allocation of an interface for every row. Here are the numbers for the version with the memory optimization that avoids the extra allocation:
benchmark old ns/op new ns/op delta
BenchmarkCountIterator_1K-8 42116 43269 +2.74%
BenchmarkCountIterator_100K-8 4191261 4113898 -1.85%
BenchmarkCountIterator_1M-8 41485285 39940193 -3.72%
BenchmarkSampleIterator_1k-8 464 461 -0.65%
BenchmarkSampleIterator_100k-8 410 412 +0.49%
BenchmarkSampleIterator_1M-8 321 306 -4.67%
BenchmarkDistinctIterator_1K-8 74546 72119 -3.26%
BenchmarkDistinctIterator_100K-8 6858535 6741477 -1.71%
BenchmarkDistinctIterator_1M-8 68455510 69030348 +0.84%
BenchmarkModeIterator_1K-8 103948 111675 +7.43%
BenchmarkModeIterator_100K-8 34101326 34786063 +2.01%
BenchmarkModeIterator_1M-8 408322854 232866597 -42.97%
BenchmarkSelect_Raw_1K-8 1724663 197585 -88.54%
BenchmarkSelect_Raw_100K-8 1808846670 192179458 -89.38%
BenchmarkSelect_Dedupe_1K-8 38699956 30059134 -22.33%
benchmark old allocs new allocs delta
BenchmarkCountIterator_1K-8 11 11 +0.00%
BenchmarkCountIterator_100K-8 11 11 +0.00%
BenchmarkCountIterator_1M-8 11 11 +0.00%
BenchmarkSampleIterator_1k-8 3 3 +0.00%
BenchmarkSampleIterator_100k-8 3 3 +0.00%
BenchmarkSampleIterator_1M-8 3 3 +0.00%
BenchmarkDistinctIterator_1K-8 18 18 +0.00%
BenchmarkDistinctIterator_100K-8 18 18 +0.00%
BenchmarkDistinctIterator_1M-8 18 18 +0.00%
BenchmarkModeIterator_1K-8 22 22 +0.00%
BenchmarkModeIterator_100K-8 42 42 +0.00%
BenchmarkModeIterator_1M-8 53 53 +0.00%
BenchmarkSelect_Raw_1K-8 2068 2091 +1.11%
BenchmarkSelect_Raw_100K-8 2000123 2000096 -0.00%
BenchmarkSelect_Dedupe_1K-8 302008 302032 +0.01%
benchmark old bytes new bytes delta
BenchmarkCountIterator_1K-8 928 928 +0.00%
BenchmarkCountIterator_100K-8 945 945 +0.00%
BenchmarkCountIterator_1M-8 1101 1032 -6.27%
BenchmarkSampleIterator_1k-8 400 400 +0.00%
BenchmarkSampleIterator_100k-8 400 400 +0.00%
BenchmarkSampleIterator_1M-8 400 400 +0.00%
BenchmarkDistinctIterator_1K-8 7052 7053 +0.01%
BenchmarkDistinctIterator_100K-8 7058 7054 -0.06%
BenchmarkDistinctIterator_1M-8 7032 7032 +0.00%
BenchmarkModeIterator_1K-8 197152 197152 +0.00%
BenchmarkModeIterator_100K-8 44999200 44999200 +0.00%
BenchmarkModeIterator_1M-8 529293856 529293856 +0.00%
BenchmarkSelect_Raw_1K-8 44086 53304 +20.91%
BenchmarkSelect_Raw_100K-8 40010976 48006068 +19.98%
BenchmarkSelect_Dedupe_1K-8 4209411 4210828 +0.03%
There's still more memory allocated overall for raw queries in terms of bytes used, but the increase isn't as large as in the previous change. I'll keep trying to track it down.
CPU usage is generally better across the board with this technique though.
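To illustrate what that extra allocation looks like, here is a minimal sketch, assuming a made-up evalRow helper rather than the engine's actual code: routing the timestamp through eval means converting an int64 to interface{} on every row, and that conversion allocates, while handling the time natively alongside the evaluated fields avoids it.

```go
package sketch

// evalRow stands in for eval: it only understands interface{} values.
func evalRow(values map[string]interface{}) interface{} {
	return values["value"]
}

// timeThroughEval routes the timestamp through eval, so every row pays
// for boxing the int64 time into an interface{}.
func timeThroughEval(times []int64, values []float64) {
	for i := range times {
		row := map[string]interface{}{
			"time":  times[i], // int64 -> interface{} conversion allocates per row
			"value": values[i],
		}
		_ = evalRow(row)
	}
}

// timeHandledNatively keeps the timestamp out of eval as a plain int64,
// avoiding that extra per-row allocation.
func timeHandledNatively(times []int64, values []float64) {
	for i := range times {
		row := map[string]interface{}{"value": values[i]}
		_ = evalRow(row)
		_ = times[i] // time stays an int64; no boxing required
	}
}
```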
@@ -1206,6 +1206,20 @@ func (a Shards) MapType(measurement, field string) influxql.DataType {
	return typ
}

func (a Shards) CallType(name string, args []influxql.DataType) (influxql.DataType, error) {
Why is CallType added to the Shard type? I do not understand the relevance.
I'm not sure this is needed anymore.
The original purpose was to be similar to MapType. This function was supposed to say, "The shards implement these functions and here are the return types as reported by the storage engine." It still may be worth doing that, but it seems like this method is captured by another struct I created to update the type mapper.
I think I'm going to keep this for consistency and remove the other struct that intercepts this method call.
Ok, I finally remember now.
So yes, this is meant to expose the return values for a function and eventually do some type checking too so we can error out earlier in the iterator creation process. The hookups for more user-friendly error messages and validation aren't done, but they also aren't that important since later in the system it will find out the types are bad anyway.
The reason this is being added to shard is for consistency. The field mapper that is used in query/functions.go wraps a group of shards to expose all of the values from the query engine. Technically, that section of code can handle returning the types since both the storage engine and the query engine are capable of handling call iterators. I like having this here though, so there's no surprise that the Shards type can handle a mean() call but doesn't implement CallType to tell the type checker what type it is. I've now refactored it so they use the same underlying code base, so there is less confusion and less of a chance they will become disconnected.
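As a minimal sketch of the idea being described, assuming hypothetical names rather than the actual influxdb types, a CallType-style lookup maps a call name plus its argument types to a return type so the type checker can reject bad calls before any iterators are built:

```go
package sketch

import "fmt"

// DataType is a hypothetical stand-in for influxql.DataType.
type DataType int

const (
	Unknown DataType = iota
	Float
	Integer
)

// callType reports the return type of a function call given its name and
// argument types, in the spirit of the CallType method discussed above.
func callType(name string, args []DataType) (DataType, error) {
	switch name {
	case "mean":
		// mean() always yields a float, whatever its input type is.
		return Float, nil
	case "count":
		// count() yields an integer count of points.
		return Integer, nil
	case "max", "min":
		// Selectors return whatever type their argument has.
		if len(args) == 0 {
			return Unknown, fmt.Errorf("%s() requires an argument", name)
		}
		return args[0], nil
	default:
		return Unknown, fmt.Errorf("undefined function %s()", name)
	}
}
```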
Force-pushed from 20ea562 to 2494c79
Force-pushed from 2494c79 to f8d60a8
LGTM 👍
This change simplifies the math engine so that it doesn't use a complicated set of nested iterators. That way, there is one fewer place where math has to be changed.
It also greatly simplifies the query engine: we can now create the necessary iterators, join them by time, name, and tags, and then use the cursor interface to read them and use eval to compute the result. This allows the auxiliary iterators and all of their complexity to be removed.
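As a rough, self-contained sketch of that flow, with every name below a hypothetical stand-in rather than the engine's real types: iterators are joined into rows keyed by time, name, and tags, a cursor hands those rows out one at a time, and eval computes the expression for each row.

```go
package sketch

// Row is a hypothetical joined row: one timestamp plus the values of every
// field referenced by the expression, already matched on time, name, and tags.
type Row struct {
	Time   int64
	Name   string
	Tags   map[string]string
	Values map[string]interface{}
}

// Cursor is a hypothetical stand-in for the cursor interface: it yields
// joined rows until the underlying iterators are exhausted.
type Cursor interface {
	Scan() (Row, bool)
}

// evalExpr stands in for eval: it computes the query expression
// (for example max(value) * 2) from a row's values.
type evalExpr func(values map[string]interface{}) interface{}

// drain reads every joined row from the cursor and evaluates the expression
// against it, emitting one result per row.
func drain(cur Cursor, eval evalExpr, emit func(t int64, v interface{})) {
	for {
		row, ok := cur.Scan()
		if !ok {
			return
		}
		emit(row.Time, eval(row.Values))
	}
}
```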
This also makes use of the new eval functionality that was recently
added to the influxql package.
No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.
This also introduces a small breaking change. Because of the call
optimization, it is now possible to use the same selector multiple times
as a selector. So if you do this:

SELECT max(value) * 2, max(value) / 2 FROM cpu

This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
it will consider the selectors to be aggregates like the old behavior.
Fixes #9511.