Simplify cluster startup for scripting and deployment #5602

corylanou · 2016-02-09T20:37:59Z

This PR allows the meta nodes to be specified with the -join argument. On restart, you must specify all current nodes, but the arguments no longer need to change between restarts.

Example of starting a 3 node meta/data cluster:

# node 1
influxd -config ~/influx1.toml -join localhost:8191,localhost:8291,localhost:8391

# node 2
influxd -config ~/influx2.toml -join localhost:8191,localhost:8291,localhost:8391

# node 3
influxd -config ~/influx3.toml -join localhost:8191,localhost:8291,localhost:8391

This results in the following server configuration:

> show servers
name: data_nodes
----------------
id      http_addr       tcp_addr
3       localhost:8186  localhost:8188
4       localhost:8286  localhost:8288
6       localhost:8386  localhost:8388


name: meta_nodes
----------------
id      http_addr       tcp_addr
1       localhost:8291  localhost:8288
2       localhost:8191  localhost:8188
5       localhost:8391  localhost:8388

This has also been tested by bringing up a cluster and then having a new node join.

# node 1
influxd -config ~/influx1.toml -join localhost:8191,localhost:8291

# node 2
influxd -config ~/influx2.toml -join localhost:8191,localhost:8291

Wait for the cluster to be healthy:

> show servers
name: data_nodes
----------------
id      http_addr       tcp_addr
3       localhost:8186  localhost:8188
4       localhost:8286  localhost:8288


name: meta_nodes
----------------
id      http_addr       tcp_addr
1       localhost:8291  localhost:8288
2       localhost:8191  localhost:8188

Join the new node. Note that you need to specify all nodes in the -join argument:

# node 3
influxd -config ~/influx3.toml -join localhost:8191,localhost:8291,localhost:8391

Also, to restart nodes 1 and 2, you now need to pass all three nodes. This is typically done via scripted automation.
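Since the same -join value is passed to every node, a deployment script can build it once from the list of meta node addresses. Below is a minimal sketch of that idea, not code from this PR: `buildJoinArg` is a hypothetical helper, and the addresses and config paths are just the ones from the examples above. It uses `net.SplitHostPort` to reject entries that are missing the `host:port` separator before any node is started.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// buildJoinArg is a hypothetical helper (not part of influxd). It checks that
// every meta node address has the form host:port and returns the
// comma-separated value to pass to -join.
func buildJoinArg(addrs []string) (string, error) {
	if len(addrs) == 0 {
		return "", fmt.Errorf("no meta node addresses given")
	}
	for _, a := range addrs {
		host, port, err := net.SplitHostPort(a)
		if err != nil {
			return "", fmt.Errorf("invalid meta node address %q: %v", a, err)
		}
		if host == "" || port == "" {
			return "", fmt.Errorf("invalid meta node address %q: empty host or port", a)
		}
	}
	return strings.Join(addrs, ","), nil
}

func main() {
	// Addresses taken from the three-node example above.
	metaNodes := []string{"localhost:8191", "localhost:8291", "localhost:8391"}

	join, err := buildJoinArg(metaNodes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Every node gets the same -join value; only the config file differs.
	for i := 1; i <= len(metaNodes); i++ {
		fmt.Printf("influxd -config ~/influx%d.toml -join %s\n", i, join)
	}
}
```

Keeping the address list in one place means that adding a node is a single change, and the same generated command line can be reused for restarts.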

e-dard · 2016-02-12T11:18:46Z

cmd/influxd/run/server.go

@@ -645,13 +652,16 @@ func (s *Server) initializeMetaClient() error {
 	if err := s.MetaClient.Open(); err != nil {
 		return err
 	}
-
-	if s.TSDBStore != nil {
+	for {


Not sure if this is preferable or not, but an alternative to the break/continue stuff could be:

n, err := s.MetaClient.CreateDataNode(s.httpAPIAddr, s.tcpAddr)
for ; err != nil; n, err = s.MetaClient.CreateDataNode(s.httpAPIAddr, s.tcpAddr) {
	log.Printf("Unable to create data node. retry in 1s: %s", err.Error())
	time.Sleep(time.Second)
}
s.Node.ID = n.ID

or even:

n, err := s.MetaClient.CreateDataNode(s.httpAPIAddr, s.tcpAddr)
for err != nil {
	log.Printf("Unable to create data node. retry in 1s: %s", err.Error())
	time.Sleep(time.Second)
	n, err = s.MetaClient.CreateDataNode(s.httpAPIAddr, s.tcpAddr)
}
s.Node.ID = n.ID

jwilder · 2016-02-12T15:43:30Z

👍 Maybe squash the WIP commit of mine though?

e-dard · 2016-02-12T18:22:51Z

LGTM 👍

corylanou force-pushed the cluster-startup branch 2 times, most recently from 1c31896 to bb7d548 Compare February 10, 2016 21:35

corylanou changed the title ~~WIP - Cluster startup~~ Simplify cluster startup for scripting and deployment Feb 11, 2016

corylanou force-pushed the cluster-startup branch from a181a7e to fbf0696 Compare February 11, 2016 14:15

corylanou mentioned this pull request Feb 11, 2016

[0.10.0] Nodes don't re-join cluster after a cluster-wide service restart #5464

Closed

jwilder added this to the 0.11.0 milestone Feb 11, 2016

e-dard reviewed Feb 12, 2016

corylanou force-pushed the cluster-startup branch from fbf0696 to a0282c5 Compare February 12, 2016 14:08

corylanou force-pushed the cluster-startup branch 2 times, most recently from 6c1f87f to 7b357da Compare February 12, 2016 17:54

jwilder force-pushed the cluster-startup branch from 7b357da to 3c25b67 Compare February 12, 2016 18:31

jwilder and others added 13 commits February 12, 2016 11:32

fix build after rebase on master

e1effa6

sane cluster starting with join args

d9f1df0

passing test suite... hopefully

807354f

fix adhoc joining of cluster

f861d58

fix data race

b17293f

specify bind address meta test

7e62201

specify raft bind address with real random ports

92e8516

misc fixes and changelog

360f405

make meta test suite less racy

df5d587

give less time to lose lease on random port for test

e9a2c33

fix race condition

52077b2

ask for a free port immediatly before using to prevent it being retur…

1b25c0c

…ned to the available pool

Remove peers.json

ddcfac7

No longer needed now that peers are pull from the meta nodes.

address pr feedback

7ad31fa

jwilder force-pushed the cluster-startup branch from 3c25b67 to 7ad31fa Compare February 12, 2016 18:35

Fix race in peerStore

cd56854

jwilder added a commit that referenced this pull request Feb 12, 2016

Merge pull request #5602 from influxdata/cluster-startup

ef571fc

Simplify cluster startup for scripting and deployment

jwilder merged commit ef571fc into master Feb 12, 2016

jwilder deleted the cluster-startup branch February 12, 2016 19:23

corylanou mentioned this pull request Feb 13, 2016

Clustering Startup/Configuration Clean up #5673

Closed

6 tasks

mvadu mentioned this pull request Feb 17, 2016

Error - testing Cluster setup on Windows #5715

Closed

PierreF mentioned this pull request Mar 16, 2016

[0.11.0-rc1] Cluster with -join need ALL node to restart #6027

Closed
