Distributed query load balancing and failover #2301

Merged
merged 6 commits into from
Apr 17, 2015

jwilder
Contributor

@jwilder jwilder commented Apr 15, 2015

This PR implements load balancing and failover for distributed queries.

It works through the distributed queries themselves. When a distributed query runs, it needs to find the data nodes holding the data for a given shard. When the RemoteMapper runs, it chooses a random data node so that requests are randomized across shards to spread load in the normal case. When a request fails, the data node is marked as down using exponential backoff with a 5 minute cap. The RemoteMapper then continues on to the next data node available to service that shard. Subsequent RemoteMapper calls skip down nodes until their timeout period expires.

The offline state for a DataNode is attached to the existing Server.DataNode map, so other functionality should be able to use it as well.

Fixes #2242 #2243 #2190

@toddboom
Contributor

looks good - 👍 with the changes Philip suggested

"time"
)

// Balancer represent a load-balancing algorithm for a set of DataNodes
Contributor


Nit: represent -> represents.

@otoolep
Contributor

otoolep commented Apr 16, 2015

Very nice, I have some minor feedback I'd like to see addressed, but I think this should work well.

Don't forget the changelog.

jwilder added 6 commits April 17, 2015 11:28
By setting it, data node requests can be served by the http handler
before the data node is actually ready.

Possible fix for:

2015/04/14 11:33:54 http: panic serving 10.0.1.8:62661: runtime error: invalid memory address or nil pointer dereference
goroutine 11467 [running]:
net/http.func·011()
	/usr/local/go/src/net/http/server.go:1130 +0xcc
github.com/influxdb/influxdb.(*Server).broadcast(0xc20805cc00, 0xc208220000, 0x5d25e0, 0xc208869e80, 0x0, 0x0, 0x0)
	/Users/jason/go/src/github.com/influxdb/influxdb/server.go:568 +0x227
github.com/influxdb/influxdb.(*Server).CreateDataNode(0xc20805cc00, 0xc2081c6e70, 0x0, 0x0)
	/Users/jason/go/src/github.com/influxdb/influxdb/server.go:859 +0xe6
github.com/influxdb/influxdb/httpd.(*Handler).serveCreateDataNode(0xc20842ea00, 0x19378c0, 0xc2082207e0, 0xc2083191e0)
Adds a Balancer interface to allow RemoteMappers to send data node
requests to multiple nodes.  It also allows failed requests to mark
the data node as offline using exponential backoff with a 5 min max
wait time.

Fixes #2242
@jwilder
Contributor Author

jwilder commented Apr 17, 2015

All comments addressed.

jwilder added a commit that referenced this pull request Apr 17, 2015
Distributed query load balancing and failover
@jwilder jwilder merged commit 8ee8218 into master Apr 17, 2015
@jwilder jwilder deleted the races branch April 23, 2015 15:51
Development

Successfully merging this pull request may close these issues.

Distributed Query should balance requests
3 participants