Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor query engine for distributed query support #3299

Closed
wants to merge 1 commit into from

Conversation

otoolep
Copy link
Contributor

@otoolep otoolep commented Jul 12, 2015

With this change, the query engine code gathers information about shards and tagsets by working with individual shards, collating the information, and returning that to the client. However it does not assume that any particular shard is local, and accesses all shards through abstracted Shard Mappers, of which there are two types -- a type for Raw queries and a second type for Aggregate queries. There are corresponding Executors for each type of Mapper, but both types of Executors share the
same interface.

There is also a new Query Planner, which is much simpler than the previous planner. This planner is now part of the tsdb package, and is responsible for determining which shards must be accessed. and accessing those shards either locally, or over the network using the cluster service.

Remaining work

The only remaining work is to have the cluster service "wrap" the two new Mappers types, such that remote shards can be mapped as easily as local shards. This should be quite straightforward -- for remote nodes the Co-ordinator node will hit (through the cluster service) a remote endpoint, which will in turn instantiate a suitable Mapper. This Mapper will communicate with the Co-ordinator node over a persistent TCP connection, allowing state to be maintained between the Co-ordinator node and remote node. The new Mappers should require no changes to support this.

The split between Raw and Aggregate queries was put in place because shards are driven significantly differently in each case, and making this separation explicit makes the code easier to develop and understand. However as a result there is some code duplication between the new Mappers and Executors, and further refactoring will take place. This will occur after the distributed query support has been fully completed.

This code is ready for merging as is, and no functional difference should be visible to users of the system. The next patch will then clearly show the changes for network support. Once that change is, distributed query support will be complete, and the current restrictions on replication factor can be removed.

Testing

All pre-existing query tests pass, and some new tests have been added for the Mapeprs. Where changes have been made, it was for reasons such as error message changes. Where the changes to some tests may be questioned, in-line notes have been added.

@@ -2310,12 +2309,6 @@ func TestServer_Query_LimitAndOffset(t *testing.T) {
params: url.Values{"db": []string{"db0"}},
},
&Query{
name: "limit + offset equal to the number of points with group by time",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test appears to have been duped.

@otoolep
Copy link
Contributor Author

otoolep commented Jul 12, 2015

#3009

@otoolep
Copy link
Contributor Author

otoolep commented Jul 12, 2015

@pauldix @dgnorton

"github.com/influxdb/influxdb/influxql"
)

// RawMapper is for retrieving data, for a query, from a given shard. It implements the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ShardMapper interface doesn't actually exist, and this comment needs correction.

@otoolep otoolep force-pushed the dq_squashed branch 3 times, most recently from ae52109 to aa5a1c6 Compare July 13, 2015 17:50
@otoolep
Copy link
Contributor Author

otoolep commented Jul 13, 2015

I just pushed up a change and it failed, looking into it now.

With this change, the query engine code gathers information about
shards and tagsets by working with individual shards, collating the
information, and returning that to the client. It does not assume that any
particular shard is local, and accesses all shards through abstracted
Mappers, of which there are two types -- a Mapper type for Raw queries
and a second type for Aggregate queries. There are corresponding
Executors for each type of Mapper, but both types of Executors share the
same interface.
@otoolep
Copy link
Contributor Author

otoolep commented Jul 14, 2015

Closing in favour of #3320.

@otoolep otoolep closed this Jul 14, 2015
@otoolep otoolep reopened this Aug 20, 2015
@otoolep otoolep closed this Aug 20, 2015
@otoolep otoolep deleted the dq_squashed branch August 20, 2015 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant