Distributed query load balancing and failover #2301
Merged
Looks good - 👍 with the changes Philip suggested.
    "time"
)

// Balancer represent a load-balancing algorithm for a set of DataNodes
Nit: represent -> represents.
Very nice. I have some minor feedback I'd like to see addressed, but I think this should work well. Don't forget the changelog.
By setting it, data node requests can be served by the http handler before the data node is actually ready.

Possible fix for:

    2015/04/14 11:33:54 http: panic serving 10.0.1.8:62661: runtime error: invalid memory address or nil pointer dereference
    goroutine 11467 [running]:
    net/http.func·011()
        /usr/local/go/src/net/http/server.go:1130 +0xcc
    github.com/influxdb/influxdb.(*Server).broadcast(0xc20805cc00, 0xc208220000, 0x5d25e0, 0xc208869e80, 0x0, 0x0, 0x0)
        /Users/jason/go/src/github.com/influxdb/influxdb/server.go:568 +0x227
    github.com/influxdb/influxdb.(*Server).CreateDataNode(0xc20805cc00, 0xc2081c6e70, 0x0, 0x0)
        /Users/jason/go/src/github.com/influxdb/influxdb/server.go:859 +0xe6
    github.com/influxdb/influxdb/httpd.(*Handler).serveCreateDataNode(0xc20842ea00, 0x19378c0, 0xc2082207e0, 0xc2083191e0)
Adds a Balancer interface to allow RemoteMappers to send data node requests to multiple nodes. It also allows failed requests to mark the data node as offline, using exponential backoff with a 5 min max wait time. Fixes #2242
All comments addressed.
jwilder added a commit that referenced this pull request on Apr 17, 2015: Distributed query load balancing and failover
This PR implements load balancing and failover for distributed queries.

It works through the distributed queries themselves. When a distributed query is run, it needs to find the data nodes holding the data for a given shard. When the RemoteMapper runs, a random data node is chosen so that requests are randomized across shards to spread load in the normal case. When a request fails, the data node is marked as down using exponential backoff with a 5 min cap. The RemoteMapper will then continue on to the next data node available to service that shard. Subsequent RemoteMapper calls will skip down nodes until their timeout period expires.

The offline state for a DataNode is attached to the existing Server.DataNode map, so it can be used by other functionality as well.

Fixes #2242 #2243 #2190