Refactor query engine for distributed query support #3299

otoolep · 2015-07-12T07:58:09Z

With this change, the query engine code gathers information about shards and tagsets by working with individual shards, collating the information, and returning that to the client. However it does not assume that any particular shard is local, and accesses all shards through abstracted Shard Mappers, of which there are two types -- a type for Raw queries and a second type for Aggregate queries. There are corresponding Executors for each type of Mapper, but both types of Executors share the
same interface.

There is also a new Query Planner, which is much simpler than the previous planner. This planner is now part of the tsdb package, and is responsible for determining which shards must be accessed. and accessing those shards either locally, or over the network using the cluster service.

Remaining work

The only remaining work is to have the cluster service "wrap" the two new Mappers types, such that remote shards can be mapped as easily as local shards. This should be quite straightforward -- for remote nodes the Co-ordinator node will hit (through the cluster service) a remote endpoint, which will in turn instantiate a suitable Mapper. This Mapper will communicate with the Co-ordinator node over a persistent TCP connection, allowing state to be maintained between the Co-ordinator node and remote node. The new Mappers should require no changes to support this.

The split between Raw and Aggregate queries was put in place because shards are driven significantly differently in each case, and making this separation explicit makes the code easier to develop and understand. However as a result there is some code duplication between the new Mappers and Executors, and further refactoring will take place. This will occur after the distributed query support has been fully completed.

This code is ready for merging as is, and no functional difference should be visible to users of the system. The next patch will then clearly show the changes for network support. Once that change is, distributed query support will be complete, and the current restrictions on replication factor can be removed.

Testing

All pre-existing query tests pass, and some new tests have been added for the Mapeprs. Where changes have been made, it was for reasons such as error message changes. Where the changes to some tests may be questioned, in-line notes have been added.

otoolep · 2015-07-12T08:07:00Z

cmd/influxd/run/server_test.go

@@ -2310,12 +2309,6 @@ func TestServer_Query_LimitAndOffset(t *testing.T) {
 			params:  url.Values{"db": []string{"db0"}},
 		},
 		&Query{
-			name:    "limit + offset equal to the  number of points with group by time",


This test appears to have been duped.

otoolep · 2015-07-12T08:08:28Z

#3009

otoolep · 2015-07-12T08:19:57Z

@pauldix @dgnorton

otoolep · 2015-07-12T08:21:39Z

tsdb/mapper.go

+	"github.com/influxdb/influxdb/influxql"
+)
+
+// RawMapper is for retrieving data, for a query, from a given shard. It implements the


The ShardMapper interface doesn't actually exist, and this comment needs correction.

otoolep · 2015-07-13T17:59:04Z

I just pushed up a change and it failed, looking into it now.

With this change, the query engine code gathers information about shards and tagsets by working with individual shards, collating the information, and returning that to the client. It does not assume that any particular shard is local, and accesses all shards through abstracted Mappers, of which there are two types -- a Mapper type for Raw queries and a second type for Aggregate queries. There are corresponding Executors for each type of Mapper, but both types of Executors share the same interface.

otoolep · 2015-07-14T04:57:48Z

Closing in favour of #3320.

otoolep added the 2 - Working label Jul 12, 2015

otoolep reviewed Jul 12, 2015
View reviewed changes

This was referenced Jul 12, 2015

Distributed query support (Raw) #3231

Closed

Support distributed queries #3009

Closed

otoolep reviewed Jul 12, 2015
View reviewed changes

otoolep force-pushed the dq_squashed branch 3 times, most recently from ae52109 to aa5a1c6 Compare July 13, 2015 17:50

otoolep force-pushed the dq_squashed branch from aa5a1c6 to 9d8aef0 Compare July 13, 2015 18:03

otoolep force-pushed the dq_squashed branch from 9d8aef0 to 2b12a83 Compare July 13, 2015 20:39

otoolep mentioned this pull request Jul 13, 2015

Streaming Distributed Queries #3317

Closed

otoolep closed this Jul 14, 2015

otoolep removed the 2 - Working label Jul 14, 2015

otoolep reopened this Aug 20, 2015

otoolep added the 2 - Working label Aug 20, 2015

otoolep closed this Aug 20, 2015

otoolep removed the 2 - Working label Aug 20, 2015

otoolep deleted the dq_squashed branch August 20, 2015 02:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor query engine for distributed query support #3299

Refactor query engine for distributed query support #3299

otoolep commented Jul 12, 2015

otoolep Jul 12, 2015

otoolep commented Jul 12, 2015

otoolep commented Jul 12, 2015

otoolep Jul 12, 2015

otoolep commented Jul 13, 2015

otoolep commented Jul 14, 2015

Refactor query engine for distributed query support #3299

Refactor query engine for distributed query support #3299

Conversation

otoolep commented Jul 12, 2015

Remaining work

Testing

otoolep Jul 12, 2015

Choose a reason for hiding this comment

otoolep commented Jul 12, 2015

otoolep commented Jul 12, 2015

otoolep Jul 12, 2015

Choose a reason for hiding this comment

otoolep commented Jul 13, 2015

otoolep commented Jul 14, 2015