Separate broker and data nodes #2175

jwilder · 2015-04-06T18:52:23Z

This PR adds the ability to start up a cluster with dedicated broker and data nodes. It fixes #1934. It also allows cluster communication to be separated from the public API via separate ports and/or interfaces. By default they all share the same port and interface.

When starting separate data and broker nodes, you must now disable the unused role in the config.

# Broker only
[data]
enabled = false

and/or

# Data only
[broker]
enabled = false

By default, they are both false and must be explicitly enabled. The sample config has an example w/ both enabled. ~~This also means that starting influxd w/o a config file will currently exit w/ an error. (We may want to add additional flags to specify a role (e.g. -enable-data, -enable-broker))~~

To join a cluster, you start a node and join it to any other member w/ the -join http://host:port[,http://host:port]... flag. The port should be the cluster port and the host can be any node in the cluster.
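As a rough illustration of the -join format described above, the sketch below splits a comma-separated value into URLs. The helper name parseJoinURLs and the host names are hypothetical and not taken from this PR; only the flag name and the comma-separated http://host:port shape come from the description.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"strings"
)

// parseJoinURLs is a hypothetical helper: it splits a comma-separated
// -join value and parses each entry as a URL.
func parseJoinURLs(s string) ([]url.URL, error) {
	var urls []url.URL
	for _, raw := range strings.Split(s, ",") {
		raw = strings.TrimSpace(raw)
		if raw == "" {
			continue // tolerate empty entries and trailing commas
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid join URL %q: %w", raw, err)
		}
		if u.Host == "" {
			return nil, fmt.Errorf("join URL %q has no host", raw)
		}
		urls = append(urls, *u)
	}
	return urls, nil
}

func main() {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	join := fs.String("join", "", "comma-separated URLs of cluster members")

	// Example arguments; the hosts are placeholders.
	if err := fs.Parse([]string{"-join", "http://a:8086,http://b:8086"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}

	urls, err := parseJoinURLs(*join)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u.Host)
	}
}
```

Rejecting entries without a host catches values such as a bare host:port with no scheme, which url.Parse would otherwise reject or misread.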

There are several config level changes. Some options have been removed or replaced:

Added - Port - The cluster port is now the default port used for cluster endpoints and API endpoints. The default is 8086.
Added - [api].Port - The API port used for API endpoints. If not specified, 8086 is used.
Added - [broker].Enabled - Determines whether the node runs as a broker. Default false.
Added - [data].Enabled - Determines whether the node runs as a data node. Default false.
Removed - [broker].Port - This has been replaced w/ the cluster port
Removed - [data].Port - This has been replaced w/ the cluster port
Removed - [cluster] - This whole section was removed as it was not used. The actual values are at the root of the config.
Removed - [Initialization].JoinURLs - This section has been removed. Join URLs are specified on the command line via the -join flag. If a node was already joined to a cluster, then these flags are ignored (and logged as ignored) to help avoid confusion.
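The option changes listed above could be modeled roughly as follows. This is only a sketch under stated assumptions: the struct layout and the ClusterPort, APIPort, and Roles helpers are hypothetical and are not the actual cmd/influxd/config.go code. The 8086 default and the default-false roles come from the description.

```go
package main

import "fmt"

// DefaultPort is the default named in the PR description.
const DefaultPort = 8086

// Config is a hypothetical, trimmed-down mirror of the options listed above.
type Config struct {
	Port int // cluster port at the root of the config
	API  struct {
		Port int // API port; 8086 when unset
	}
	Broker struct {
		Enabled bool // default false
	}
	Data struct {
		Enabled bool // default false
	}
}

// ClusterPort returns the configured cluster port or the default.
func (c *Config) ClusterPort() int {
	if c.Port == 0 {
		return DefaultPort
	}
	return c.Port
}

// APIPort returns the configured API port or the default.
func (c *Config) APIPort() int {
	if c.API.Port == 0 {
		return DefaultPort
	}
	return c.API.Port
}

// Roles lists the roles that were explicitly enabled.
func (c *Config) Roles() []string {
	var roles []string
	if c.Broker.Enabled {
		roles = append(roles, "broker")
	}
	if c.Data.Enabled {
		roles = append(roles, "data")
	}
	return roles
}

func main() {
	var c Config
	c.Data.Enabled = true // data-only node
	c.API.Port = 9096     // API on its own port; cluster port left at default
	fmt.Println(c.ClusterPort(), c.APIPort(), c.Roles())
}
```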

While this PR adds the ability to start a cluster w/ separate broker and data nodes, there are still several open issues and more extensive testing necessary:

otoolep · 2015-04-06T19:56:10Z

cmd/influxd/config.go

+}
+
+// Snapshot represents the configuration for a snapshot service
+type Snapshot struct {


I wonder if it would be worth moving this block into data, somewhat like Retention since snapshotting can only be performed on data nodes.

I think it's best as is, but it would be worth commenting the file, letting the user know this only applies to data nodes.

otoolep · 2015-04-06T20:07:32Z

cmd/influxd/config.go

-	return fmt.Sprintf("%s:%d", c.BindAddress, c.Broker.Port)
+// ClusterAddr returns the binding address for the cluster
+func (c *Config) ClusterAddr() string {
+	return fmt.Sprintf("%s:%d", c.BindAddress, c.Port)


We should standardize and use net.JoinHostPort() here too.
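For reference, a minimal sketch of what that could look like. The Config type here is a hypothetical stand-in with only the two fields used in the diff above. net.JoinHostPort takes the port as a string and brackets IPv6 literals, which fmt.Sprintf("%s:%d", ...) does not.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Config is a hypothetical stand-in holding only the fields used above.
type Config struct {
	BindAddress string
	Port        int
}

// ClusterAddr returns the binding address for the cluster.
func (c *Config) ClusterAddr() string {
	return net.JoinHostPort(c.BindAddress, strconv.Itoa(c.Port))
}

func main() {
	c := &Config{BindAddress: "::1", Port: 8086}
	fmt.Println(c.ClusterAddr()) // prints "[::1]:8086"
}
```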

otoolep · 2015-04-06T21:54:16Z

cmd/influxd/main.go

@@ -72,9 +72,15 @@ func main() {
 	// Extract name from args.
 	switch cmd {
 	case "run":


You can simply write case "run", "":

The args are slightly different, so we still need to check whether the first one is run. Using case "run", "" causes a panic when starting influx w/o any args.

Ah, OK. I see. Still seems a pity to replicate that code right there. How about:

case "", "run":
	if cmd == "run" {
		args = args[1:]
	}
	cmd := NewRunCommand()
	if err := cmd.Run(args...); err != nil {
		log.Fatalf("run: %s", err)
	}
}

Might not be worth it.
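To make the panic concrete, here is a small standalone sketch of the guarded form. The dispatch function is hypothetical, and how cmd is extracted from args is an assumption, not the actual main.go logic. It shows that args[1:] is only safe once the first argument is known to exist.

```go
package main

import "fmt"

// dispatch is a hypothetical helper: it returns the command to execute
// and the arguments to hand to it. An empty command line means "run".
func dispatch(args []string) (string, []string) {
	// Assumption: the command name is simply the first argument, if any.
	var cmd string
	if len(args) > 0 {
		cmd = args[0]
	}

	switch cmd {
	case "", "run":
		// Only strip the command name when it was actually given;
		// slicing args[1:] on an empty slice would panic.
		if cmd == "run" {
			args = args[1:]
		}
		return "run", args
	default:
		// cmd is non-empty here, so args has at least one element.
		return cmd, args[1:]
	}
}

func main() {
	fmt.Println(dispatch(nil))
	fmt.Println(dispatch([]string{"run", "-config", "influxdb.conf"}))
	fmt.Println(dispatch([]string{"version"}))
}
```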

otoolep · 2015-04-06T22:06:49Z

OK, took a first pass. Makes sense at a high level, but I think the construction is off. The fact that RunCommand calls Open on itself from within Run is a red flag. I think the RunCommand should create a node -- but do the minimum amount possible to that node -- perhaps allocate the server and broker objects depending on the config passed in. Also construct the handlers.

But I think too much is being done in RunCommand. I think we need to create a proper top-level Node object, which encapsulates the stuff it's doing.

otoolep · 2015-04-06T22:31:33Z

cmd/influxd/run.go

-		if clusterID := b.Broker.ClusterID(); clusterID != 0 {
-			go s.StartReportingLoop(clusterID)
+	if !cmd.config.ReportingDisabled {
+


This blank line looks superfluous.

toddboom · 2015-04-06T22:31:47Z

👍 on this once we can get the suite green

otoolep · 2015-04-06T22:32:26Z

cmd/influxd/run.go

@@ -288,7 +398,7 @@ func openBroker(path string, u url.URL, initializing bool, joinURLs []url.URL, r
 	}
 	log.Printf("broker opened at %s", path)

-	// Attach the broker as the finite state machine of the raft log.
+	// Attach the broker as the f	inite state machine of the raft log.


To add a new data node, it currently needs a broker and another data node to join. This temporarily adds a JoinURLs option to the Data node section so that a standalone data node can be created, but the intent is to remove it. Ideally, the joinURL could point to either a data node or a broker, and the node would get the required URLs from that host, but that is not currently possible.

How a cluster is set up has changed, and this test is failing w/ panic: assert failed: invalid initial server id: 2 [recovered]. There is an existing multi-node test w/ a broker and two data nodes, so we're still covering this case and will need to come back to it.

This removes all join URLs from the config. To join a node to a cluster, the URL of another member of the cluster should be passed on the command line w/ the -join flag. The join URLs can now be any node regardless of whether the node is a broker only or data only node. At join time, the receiving node will redirect the request to a valid broker or data node if it cannot handle the request itself.

applySetTopicMaxIndex() was updating the topics.indexByUrl w/o locking it.

WARNING: DATA RACE
Write by goroutine 1365:
  runtime.mapassign1()
      /usr/local/go/src/runtime/hashmap.go:376 +0x0
  github.com/influxdb/influxdb/messaging.(*Broker).applySetTopicMaxIndex()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:496 +0x198
  github.com/influxdb/influxdb/messaging.(*Broker).Apply()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:542 +0x33a
  github.com/influxdb/influxdb.(*Broker).Apply()
      <autogenerated>:1 +0x78
  github.com/influxdb/influxdb/messaging.(*RaftFSM).Apply()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:614 +0x24f
  github.com/influxdb/influxdb/raft.(*Log).applyNextUnappliedEntry()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1431 +0x75c
  github.com/influxdb/influxdb/raft.(*Log).applier()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1369 +0x18f

Previous read by goroutine 1540:
  runtime.mapiterinit()
      /usr/local/go/src/runtime/hashmap.go:535 +0x0
  github.com/influxdb/influxdb/messaging.(*Topic).DataURLs()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:681 +0x11d
  github.com/influxdb/influxdb/cmd/influxd.(*Handler).serveMetadata()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:95 +0x3fd
  github.com/influxdb/influxdb/cmd/influxd.(*Handler).ServeHTTP()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:45 +0x540
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:1703 +0x1f6
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1204 +0x1087

jwilder · 2015-04-07T14:57:02Z

All comments addressed.

jwilder added the 2 - Working label Apr 6, 2015

otoolep reviewed Apr 6, 2015

jwilder force-pushed the data-broker-1934 branch from 4845a5f to 4395172 Compare April 6, 2015 20:06

otoolep reviewed Apr 6, 2015

jwilder force-pushed the data-broker-1934 branch 3 times, most recently from eca5845 to 5b7efb0 Compare April 6, 2015 21:46

otoolep reviewed Apr 6, 2015

jwilder added 8 commits April 6, 2015 16:37

Refactor help into command

88f9810

Refactor execRun to RunCommand

82b3108

Add logger to RunCommand

070bb3c

Allow passing config to RunCommand.Open

5918d48

Convert flag.String to flag.StringVar

358bb9b

Add a enabled config option for broker and data options

dcb3e85

Rename ContinuouseQuery Disable to Disabled

eb9956b

Match Broker and Data config var names

Add tests for config data/broker enabled

384e3f3

jwilder added 18 commits April 6, 2015 16:38

Move openBroker to RunCommand.openBroker

6fa0ea0

Refactor openServer/openBroker

388b1c2

Simplified signature and state now depends on indexes vs directory existence

First pass at separate data and broker nodes

5d6536f

Adds a simple test to start a separate broker and data node. Data nodes still need a separate set of join URLs, which is not in place yet.

Remove unused cluster config section

73f6f8c

Add Cluster port

23819d1

This is the port that all cluster communication will take place over. It will replace the separate data and broker ports.

Replace broker port w/ cluster port

4a7cae4

Separate cluster and API endpoints

b85eba5

Replace data port w/ cluster port

5e3c26a

Replace broker url w/ cluster url

d06c4ea

Rename DataAddrUDP to APIAddrUDP

9af362b

Re-enable 3 node test

01ee3fe

Handle server unavailable response

aa5696c

When starting multiple servers concurrently, they can race to connect to each other. This change just has the join attempts retry to make cluster setup easier.

Ignore join urls if restarting a node

5482153

If a node is restarted and it had already joined the cluster, ignore and log that the join urls are being ignored and existing cluster state will be used.

Update config.toml.sample

ba61643

Removed unused items and add new ones

Rename Server to Node

01c2de6

Server is very overloaded currently so use Node to represent the container that holds onto a broker and data node (server currently)

jwilder force-pushed the data-broker-1934 branch from 5b7efb0 to 01c2de6 Compare April 6, 2015 22:38

jwilder added 3 commits April 6, 2015 21:16

Fix typos in comments

9e109e8

Update CHANGELOG

fd11797

Add #2175

pauldix added a commit that referenced this pull request Apr 7, 2015

Merge pull request #2175 from influxdb/data-broker-1934

a72707b

Separate broker and data nodes

pauldix merged commit a72707b into master Apr 7, 2015

pauldix removed the 2 - Working label Apr 7, 2015

jwilder deleted the data-broker-1934 branch April 7, 2015 16:00

This was referenced Apr 7, 2015

Lock down broker, data node handlers #1425

Closed

API handler and broker/data node handlers should be able to run on different ports #1426

Closed

jwilder mentioned this pull request Apr 8, 2015

Bring back config join URLs #2201

Merged
