Handle distributed queries when shards != data nodes #2336
Conversation
👍

@jwilder -- one thing you should be aware of is that …
```diff
@@ -105,14 +105,14 @@ func (s *Node) Close() error {
 		}
 	}

-	if s.Broker != nil {
-		if err := s.Broker.Close(); err != nil {
+	if s.raftLog != nil {
```
Good catch.
ec35428 to 9612ea7
```diff
-	shard := sg.Shards[0]
+	// pick a shard to query
+	shard := sg.Shards[rand.Intn(len(sg.Shards))]
```
I might be missing something, but I don't see how this can work. Shard groups are created here:
https://github.com/influxdb/influxdb/blob/master/server.go#L1090
If there are 3 nodes, and a replication factor of 1, then 3 shards are created on the cluster, each with different data (there is no replication, sharding takes place purely for write throughput). Therefore when a shard group is selected for query, then selecting only 1 of the shards at random means that 2/3 of the data in that shard group is not queried. It seems that this code assumes that all shards in a shard group contain the same data, which is not always the case.
Am I missing something? Furthermore I'm pretty sure this whole thing needs to be more complex than this. Say I have to query series IDs 3 & 4, and a certain shard group is the one I want (determined by time). It may be possible that I don't need to query 1 of the shards in the shard group, because I know that data for Series IDs 3 & 4 doesn't exist in that 1 shard, only the other two. I can determine this by reversing the shard routing that takes place at write-time.
If my reasoning is correct, our testing needs work, since it should have caught this but did not. If it's not correct, can you explain what I am missing?
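The "reversing the shard routing" idea above can be sketched in Go. This is a minimal illustration that assumes simple modulo routing of series IDs to shards; the names `shardIndexFor` and `shardsToQuery` are hypothetical and not InfluxDB's actual API:

```go
package main

import "fmt"

// shardIndexFor reverses a hypothetical write-time routing rule
// (series ID modulo shard count). Illustrative assumption only;
// not necessarily InfluxDB's real routing function.
func shardIndexFor(seriesID uint64, numShards int) int {
	return int(seriesID % uint64(numShards))
}

// shardsToQuery returns the set of shard indexes that could hold any
// of the requested series IDs, so the query can skip the other shards.
func shardsToQuery(seriesIDs []uint64, numShards int) map[int]bool {
	needed := make(map[int]bool)
	for _, sid := range seriesIDs {
		needed[shardIndexFor(sid, numShards)] = true
	}
	return needed
}

func main() {
	// Series IDs 3 and 4 in a 3-shard group land on shards 0 and 1,
	// so shard 2 never needs to be queried.
	fmt.Println(shardsToQuery([]uint64{3, 4}, 3))
}
```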
I am also curious how it will behave with 3 nodes and replicaN = 2.
https://github.com/influxdb/influxdb/blob/master/server.go#L1112
Fixes #2272. There was previously an explicit panic in the query engine to prevent queries where the number of shards was not equal to the number of data nodes in the cluster. It was waiting for the distributed queries branch to land but was not removed when that landed.
Closing the broker before the raft log can trigger this panic, since the raft log depends on the broker via the FSM:

```
panic: apply: broker apply: broker already closed

goroutine 29164 [running]:
github.com/influxdb/influxdb/raft.(*Log).applier(0xc20833b040, 0xc20802bd40)
	/Users/jason/go/src/github.com/influxdb/influxdb/raft/log.go:1386 +0x278
created by github.com/influxdb/influxdb/raft.func·002
	/Users/jason/go/src/github.com/influxdb/influxdb/raft/log.go:389 +0x764
```
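The shutdown-ordering bug can be illustrated with a small sketch. The `broker` and `raftLog` types below are hypothetical stand-ins for the real components; the point is only that a component must close before the dependency it still calls into:

```go
package main

import "fmt"

// broker is a stand-in for the real Broker type.
type broker struct{ closed bool }

func (b *broker) Close() error { b.closed = true; return nil }

// raftLog is a stand-in for the raft log, which applies entries
// through the broker via the FSM and so must close first.
type raftLog struct{ b *broker }

func (l *raftLog) Close() error {
	if l.b.closed {
		return fmt.Errorf("apply: broker apply: broker already closed")
	}
	return nil
}

func main() {
	b := &broker{}
	l := &raftLog{b: b}
	// Correct ordering: raft log first, then the broker it depends on.
	if err := l.Close(); err != nil {
		panic(err)
	}
	if err := b.Close(); err != nil {
		panic(err)
	}
	fmt.Println("clean shutdown")
}
```

Reversing the two `Close` calls reproduces the "broker already closed" failure mode described above.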
```go
shards := map[*Shard][]uint64{}
for _, sid := range t.SeriesIDs {
	shard := sg.ShardBySeriesID(sid)
	shards[shard] = append(shards[shard], sid)
}
```
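The grouping pattern above can be exercised standalone. This sketch fakes `ShardBySeriesID` with an assumed modulo routing rule (the real method lives on the shard group and may route differently); it shows each shard ending up with only the series IDs it actually stores:

```go
package main

import "fmt"

// shard is a minimal stand-in for the real Shard type.
type shard struct{ id int }

// shardBySeriesID mimics reversing write-time routing with an assumed
// modulo rule; not InfluxDB's actual implementation.
func shardBySeriesID(shards []*shard, sid uint64) *shard {
	return shards[sid%uint64(len(shards))]
}

func main() {
	shards := []*shard{{0}, {1}, {2}}
	seriesIDs := []uint64{3, 4, 6, 7}

	// Group series IDs by owning shard, as in the diff above.
	groups := map[*shard][]uint64{}
	for _, sid := range seriesIDs {
		s := shardBySeriesID(shards, sid)
		groups[s] = append(groups[s], sid)
	}
	for s, ids := range groups {
		fmt.Printf("shard %d -> %v\n", s.id, ids)
	}
}
```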
OK, very good. You're using "map" as a set.
To make it even clearer, you might just like to store `struct{}` in there as the value, since we don't care about the value. Right now `sid` will be overwritten with newer values so it's no use.
Looks good -- I think you've got it. All makes sense to me. +1

Don't forget the changelog.
@jwilder merged. I did the changelog on master.
This PR has 3 changes:

- There was previously an explicit panic in the query engine to prevent queries where the number of shards was not equal to the number of data nodes in the cluster. It was waiting for the distributed queries branch to land but was not removed when that landed. This PR removes that panic.
- … a `LocalMapper` or a `RemoteMapper` …

Fixes #2272