Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heal raft cluster #4685

Merged
merged 9 commits into from
Nov 11, 2015
Merged

Heal raft cluster #4685

merged 9 commits into from
Nov 11, 2015

Conversation

corylanou
Copy link
Contributor

This PR enables the ability for a node to be promoted to a raft node when needed to fully heal the raft cluster.

  • Add tests
  • Add Changelog

}

if sz == 0 {
return internal.RPCType_Error, nil, fmt.Errorf("invalid message size: %d", sz)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwilder this originally returned 0 which didn't seem valid, so I updated it to internal.RPCType_Error

@jwilder
Copy link
Contributor

jwilder commented Nov 6, 2015

Needs a changelog and just some minor naming comments, but 👍.

@@ -63,6 +65,8 @@ func (r *rpc) proxyLeader(conn *net.TCPConn) {
defer leaderConn.Close()

leaderConn.Write([]byte{MuxRPCHeader})
// re-write the original message to the leader
leaderConn.Write(buf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Was the red blob removed below moved into this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to always forward everything to the leader. No, depending on the message, it may not be intended for the leader. In the case of promoting a new raft node, we explicitly don't want to talk to the leader, but talk to the node that needs to be promoted to the raft cluster. The read was refactored into two other methods to make it easier to interrogate the message coming in. Then, if we did want to talk to the leader, we have to re-write the original message, which is what we are doing above.

@otoolep
Copy link
Contributor

otoolep commented Nov 10, 2015

This is a pretty significant feature, that some people may not want. I could envision some unstable clusters where nodes come and go, and auto-promotion makes it harder to understand what is happening.

So can I suggest a config option to allow this to be disabled? In the meta section?

https://github.com/influxdb/influxdb/blob/master/etc/config.sample.toml#L27

raft-promotion-disable=false

for example.

@corylanou
Copy link
Contributor Author

@otoolep I agree about the config setting to disable this feature. I'll add that.

I'll do it as raft-promotion-enabled and make it on by default.

if err := s.rpc.enableRaft(n.Host, peers); err != nil {
return fmt.Errorf("error notifying raft peer: %s", err)
}
s.Logger.Printf("promoted nodeID %d, host %s to raft peer", n.ID, n.Host)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@otoolep added log entry for successful promotion of node to raft peer.

@@ -35,6 +35,7 @@
- [#4721](https://github.com/influxdb/influxdb/pull/4721): Export tsdb.InterfaceValues
- [#4681](https://github.com/influxdb/influxdb/pull/4681): Increase default buffer size for collectd and graphite listeners
- [#4659](https://github.com/influxdb/influxdb/pull/4659): Support IF EXISTS for DROP DATABASE
- [#4685](https://github.com/influxdb/influxdb/pull/4685): Heal Raft Cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think this could be a bit more descriptive. E.g. "Cluster auto-promotes data node to Raft nodes on loss of Raft node". Just a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only promotes a raft node when it sees a gap in the peers. It can only see a gap in the peers if someone issues a DROP SERVER.

They type of healing you are referring too is beyond the scope of this PR, and I'm not sure we would actually want to take those steps or not.

@otoolep
Copy link
Contributor

otoolep commented Nov 11, 2015

@corylanou -- what will happen in the following scenario?

  • 3 Raft nodes, 1 data node.
  • A network partition causes 1 of the Raft nodes to be disconnected from the cluster.
  • The data node is promoted.
  • The partition heals.
  • The disconnected Raft node can now contact the cluster again.

What is the status of the Raft node that comes back? What if it has data on it, that was solely replicated on that node? Will that data be queryable?

@corylanou
Copy link
Contributor Author

If a partition happens, we don't change any of the peers. You have to manually issue a drop server before another node would promote itself.

@otoolep
Copy link
Contributor

otoolep commented Nov 11, 2015

Ah, OK, this only happens in response to DROP NODE? OK, makes sense.

@otoolep
Copy link
Contributor

otoolep commented Nov 11, 2015

+1

corylanou added a commit that referenced this pull request Nov 11, 2015
@corylanou corylanou merged commit 8ec4d04 into master Nov 11, 2015
@corylanou corylanou removed the review label Nov 11, 2015
@corylanou corylanou deleted the heal-raft-cluster branch November 11, 2015 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants