Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support incremental cluster joins #3478

Merged
merged 9 commits into from
Jul 28, 2015
Merged

Support incremental cluster joins #3478

merged 9 commits into from
Jul 28, 2015

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Jul 27, 2015

Overview

This PR is a follow up to #3372 and implements more the of functionality for #2966. Specifically, it allows servers to become raft peers when joining. This allows single node cluster to be expanded to larger clusters and automatically use a large number of nodes for raft consensus to provide better availability. By default, this is still hard-code to 3 nodes max, but could be made configurable. The first three nodes added to the cluster will become raft peer members. The remaining are data-only nodes.

To add a new member to the cluster, you should start influxd with the -join flag. The -join flag tags a host:port value. Multiple host:ports can be specific by comma separating them. For example:

influxd -join host1:8088,host2:8088,host3:8088

The port should be the cluster port (default 8088). Not the API port (default 8086).

The existing [meta].Peers config var can also be used and is equivalent to the -join flag. If a both are specified, the -join flag override the config value.

If nodes are restarted that have previously joined a cluster, their existing cluster state on disk will be used and any join or config variable will be ignored.

Additionally, the SHOW SERVERS statement now has a raft column that indicates whether the node is running raft or not.

Not implemented

  • Leaving the cluster - The infrastructure is in place to remove a raft peer, but is not currently exposed. Removing a non-raft peer is not supported at this time.
  • Promoting/demoting raft peers - The infrastructure to promote/demote nodes to become raft peers is in place but not exposed or wired up currently.
  • Changing hostnames - The hostname:port used when joining the cluster can't be changed after the fact. This will be addressed in Should update metastore and cluster if IP or hostname changes #3421

@pauldix
Copy link
Member

pauldix commented Jul 28, 2015

overall lgtm. w00t clustering!!

jwilder added 9 commits July 28, 2015 09:40
This change adds the first 3 nodes to the cluster as raft peers. Other
nodes are data-only.
* Test add new nodes that become raft peers
* Test restarting a cluster w/ 3 raft nodes and 3 non-raft nodes
Removes the two separate variables in the meta.Config.  -join will
now override the Peers var.
Reports whether the not is part of the raft consensus cluster or not.
There is a race when stopping servers where the meta.Store is closing
but the server has not signaled it is closing so the reporting goroutine
repeeatedly errors out in fast loop during this time.  It creates a lot
of noise in the logs.
jwilder added a commit that referenced this pull request Jul 28, 2015
Support incremental cluster joins
@jwilder jwilder merged commit 1536cd5 into master Jul 28, 2015
@jwilder jwilder deleted the jw-cluster branch July 28, 2015 16:05
@beckettsean
Copy link
Contributor

@jwilder is there a difference between the following scenarios? (assume servers are launched in alphabetical order)

B, C, & D join A

Server A: influxd -config /path/to/config
Server B: influxd -config /path/to/config -join serverA:8088
Server C: influxd -config /path/to/config -join serverA:8088
Server D: influxd -config /path/to/config -join serverA:8088

B joins A, C joins B & A, D joins C, B, & A

Server A: influxd -config /path/to/config
Server B: influxd -config /path/to/config -join serverA:8088
Server C: influxd -config /path/to/config -join serverA:8088,serverB:8088
Server D: influxd -config /path/to/config -join serverA:8088,serverB:8088,serverC:8088

Putting it another way, why would I specify multiple servers in the -join flag?

@beckettsean
Copy link
Contributor

"The existing [meta].Peers config var can also be used and is equivalent to the -join flag. If a both are specified, the -join flag override the config value."

So what's the reason for the -join flag? If I set the peers correctly in [meta] the server does the right thing on a restart. If I just use the -join flag it only does the right thing on the first launch (or I must pass -join every time).

@pauldix
Copy link
Member

pauldix commented Jul 29, 2015

@beckettsean peers and join only get used on the first startup. Generally I would prefer to not even have peers since people think it's something that gets used all the time if it's in the config.

Once it has joined the cluster, it writes a file to the local disk that contains the information. On restart it will see this file is there and use that to reconnect to the cluster.

@pauldix
Copy link
Member

pauldix commented Jul 29, 2015

There's no difference in the situation you describe. The list is just to have other ones to try to join to if the first fails. The only thing that matters is the order in which the four systems join the cluster because the 4th won't run the consensus protocol

@beckettsean
Copy link
Contributor

Thanks, Paul. I tend to agree, that if [meta] peers is a one-time thing it shouldn't be in the config at all. Maybe we can deprecate that with 0.9.3 and remove in 0.9.4?

Is the cluster file on disk human readable and/or editable? If I blow it away presumably the node forgets about the cluster on restart unless there's a -join or [meta] section involved?

Understood about the first three make the consensus group and all subsequent are not participants. It makes sense that the -join command could specify fallback servers in case one doesn't respond, and that makes it idempotent with respect to startup timing for each process.

@kfitzpatrick
Copy link
Contributor

It would be great to have something that could be run while the database is up or in the config file that I can change and bounce the service. When I'm spinning up machines I don't want to have to pass a special argument to the machine. it makes the monit scripts a pain to deal with and maintain automatically.

Also, if a new node is added, then I have to go back into all the existing nodes somehow and change everything.

Reason this matters: I would like to add the ability to add nodes dynamically to a cluster instead of having to bring them all up at the same time in our hosted environment.

@pauldix
Copy link
Member

pauldix commented Jul 31, 2015

@kfitzpatrick you don't need to do anything to existing nodes. They all get notified of the new member automatically. This design allows you to bring up new nodes at any time and join them to a cluster

@kfitzpatrick
Copy link
Contributor

So let's see if I got this. I create node A. I bring up another node (B) and have to start it with "--join A:8088". Or, if it's not removed, add A to B's Peers in the config and bring it up and now they all know about each other.

Correct, @pauldix ?

@pauldix
Copy link
Member

pauldix commented Jul 31, 2015

right, except I'm not sure what you mean by "Or, if it's not removed". I assume you meant that if you don't specify the join argument you have to list A in B's peers? If so, then you're correct

@toddboom
Copy link
Contributor

toddboom commented Aug 1, 2015

i think he meant "if it's not removed from the config file". maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants