
Handle distributed queries when shards != data nodes #2327

Closed
wants to merge 2 commits

Conversation

jwilder (Contributor) commented Apr 17, 2015

Fixes #2272

The query engine previously contained an explicit panic to reject queries where the number of shards did not equal the number of data nodes in the cluster. The panic was a placeholder awaiting the distributed queries branch, but it was not removed when that branch landed.

There may be a more efficient way to fix this, but this change simply queries all the shards and merges their outputs. Previously, the code assumed that only one shard would be hit. Querying multiple shards produced duplicate values during the map phase, so the map output needs to be merged rather than appended to avoid the duplicates.

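The diff below adds a mergeOutputs helper for this. As a rough sketch of the merge described above (not the PR's actual code; the rawQueryMapOutput fields used here are assumed for illustration), merging two timestamp-sorted slices so that duplicates collapse to a single entry might look like:

```go
package main

import "fmt"

// Illustrative stand-in for the engine's map output type; the real
// rawQueryMapOutput in the PR is not shown here, so these fields are assumed.
type rawQueryMapOutput struct {
	Timestamp int64
	Value     interface{}
}

// mergeOutputs merges two timestamp-sorted slices, keeping a single entry
// when both slices contain the same timestamp (the duplicate case that a
// plain append would have produced).
func mergeOutputs(first, second []*rawQueryMapOutput) []*rawQueryMapOutput {
	merged := make([]*rawQueryMapOutput, 0, len(first)+len(second))
	i, j := 0, 0
	for i < len(first) && j < len(second) {
		switch {
		case first[i].Timestamp < second[j].Timestamp:
			merged = append(merged, first[i])
			i++
		case first[i].Timestamp > second[j].Timestamp:
			merged = append(merged, second[j])
			j++
		default: // same timestamp seen by two shards: keep one copy
			merged = append(merged, first[i])
			i++
			j++
		}
	}
	merged = append(merged, first[i:]...)
	return append(merged, second[j:]...)
}

func main() {
	a := []*rawQueryMapOutput{{Timestamp: 1, Value: "a"}, {Timestamp: 3, Value: "c"}}
	b := []*rawQueryMapOutput{{Timestamp: 1, Value: "a"}, {Timestamp: 2, Value: "b"}}
	for _, o := range mergeOutputs(a, b) {
		fmt.Println(o.Timestamp, o.Value) // 1 a, 2 b, 3 c: timestamp 1 appears once
	}
}
```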
@@ -204,6 +204,50 @@ func (m *MapReduceJob) Execute(out chan *Row, filterEmptyResults bool) {
out <- row
}

// mergeOutputs merges two sorted slices of rawQueryMapOutput such that duplicate
Contributor

I don't follow -- how can we end up with duplicate timestamps? Data should only be de-duped if the series and timestamp are the same, but for a given series we should only be hitting one shard. I don't see how dupes can arise.

Contributor Author

It happens from removing the hard-coding of sg.shards[0] here: https://github.com/influxdb/influxdb/blob/master/tx.go#L147

It's now looping over each shard.
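
For context, a minimal sketch of the shape of that tx.go change (the types and names below are stand-ins, not the real planner code): where a mapper was previously built only from sg.shards[0], one is now built per shard in the group, which is why the same series and timestamp can come back from more than one mapper.

```go
package main

import "fmt"

// Hypothetical stand-ins used only to show the shape of the change.
type Shard struct{ ID uint64 }
type ShardGroup struct{ shards []*Shard }

func main() {
	sg := ShardGroup{shards: []*Shard{{ID: 1}, {ID: 2}, {ID: 3}}}

	// Before: only the first shard in the group was consulted.
	fmt.Println("old: mapping shard", sg.shards[0].ID)

	// After: every shard in the group is mapped, and the outputs are
	// merged downstream (see mergeOutputs above) rather than appended.
	for _, sh := range sg.shards {
		fmt.Println("new: mapping shard", sh.ID)
	}
}
```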

jwilder closed this Apr 18, 2015
jwilder deleted the 2272 branch April 21, 2015
Successfully merging this pull request may close these issues:

clustering: influxdb 0.9.0-rc23 panics when doing a GET with merge_metrics in a 3 node cluster