-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate broker and data nodes #2175
Conversation
} | ||
|
||
// Snapshot represents the configuration for a snapshot service | ||
type Snapshot struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be worth moving this block into data
, somewhat like Retention
since snapshotting can only be performed on data nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best as is, but it would be worth commenting the file, letting the user know this only applies to data nodes.
4845a5f
to
4395172
Compare
return fmt.Sprintf("%s:%d", c.BindAddress, c.Broker.Port) | ||
// ClusterAddr returns the binding address for the cluster | ||
func (c *Config) ClusterAddr() string { | ||
return fmt.Sprintf("%s:%d", c.BindAddress, c.Port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should standardize and use net.JoinHostPort()
here too.
eca5845
to
5b7efb0
Compare
@@ -72,9 +72,15 @@ func main() { | |||
// Extract name from args. | |||
switch cmd { | |||
case "run": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply write case "run", "":
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The args are slightly different so still need to check the first is run
or not. Using case "run", ""
causes a panic when starting influx
w/o any args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. I see. Still seems a pity to replicate that code right there. How about:
case "", "run":
if cmd == "run" {
args = args[1:]
}
cmd := NewRunCommand()
if err := cmd.Run(args...); err != nil {
log.Fatalf("run: %s", err)
}
}
Might not be worth it.
OK, took a first pass. Makes sense at a high level, but I think the construction is off. The fact that But I think too much is being done in |
if clusterID := b.Broker.ClusterID(); clusterID != 0 { | ||
go s.StartReportingLoop(clusterID) | ||
if !cmd.config.ReportingDisabled { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This blank line looks superfluous.
👍 on this once we can get the suite green |
@@ -288,7 +398,7 @@ func openBroker(path string, u url.URL, initializing bool, joinURLs []url.URL, r | |||
} | |||
log.Printf("broker opened at %s", path) | |||
|
|||
// Attach the broker as the finite state machine of the raft log. | |||
// Attach the broker as the f inite state machine of the raft log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
Match Broker and Data config var names
Simplified signature and state now depends on indexes vs directory existence
Adds a simple test to start a separate broker and data node. Data nodes still need a separate set join URLs which is not in place yet.
To add a new data node, it currently needs a broker and another data node to join. Temporarily adding a JoinURLs option to the Data node section so a standalone data node can be created but the intent is that this will be removed. Ideally, the the joinURL could point to either a data node or a broker and it would get the required URLs from that host but that is not possible currently.
How a cluster is setup has changed and this test is failing w/ panic: assert failed: invalid initial server id: 2 [recovered] There is an existing multi-node test w/ a broker and two data nodes so we're still covering this case and will need to come back to it.
This is the port that all cluster communication will take place over. It will replace the separate data and broker ports.
This removes all join URLs from the config. To join a node to a cluster, the URL of another member of the cluster should be passed on the command line w/ the -join flag. The join URLs can now be any node regardless of whether the node is a broker only or data only node. At join time, the receiving node will redirect the request to a valid broker or data node if it cannot handle the request itself.
When starting multiple servers concurrently, they can race to connect to each other. This change just has the join attempts retry to make cluster setup easier.
If a node is restarted and it had already joined the cluster, ignore and log that the join urls are being ignored and existing cluster state will be used.
Removed unused items and add new ones
Server is very overloaded currently so use Node to represent the container that holds onto a broker and data node (server currently)
5b7efb0
to
01c2de6
Compare
applySetTopicMaxIndex() was updating the topics.indexByUrl w/o locking it. WARNING: DATA RACE Write by goroutine 1365: runtime.mapassign1() /usr/local/go/src/runtime/hashmap.go:376 +0x0 github.com/influxdb/influxdb/messaging.(*Broker).applySetTopicMaxIndex() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:496 +0x198 github.com/influxdb/influxdb/messaging.(*Broker).Apply() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:542 +0x33a github.com/influxdb/influxdb.(*Broker).Apply() <autogenerated>:1 +0x78 github.com/influxdb/influxdb/messaging.(*RaftFSM).Apply() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:614 +0x24f github.com/influxdb/influxdb/raft.(*Log).applyNextUnappliedEntry() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1431 +0x75c github.com/influxdb/influxdb/raft.(*Log).applier() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1369 +0x18f Previous read by goroutine 1540: runtime.mapiterinit() /usr/local/go/src/runtime/hashmap.go:535 +0x0 github.com/influxdb/influxdb/messaging.(*Topic).DataURLs() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:681 +0x11d github.com/influxdb/influxdb/cmd/influxd.(*Handler).serveMetadata() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:95 +0x3fd github.com/influxdb/influxdb/cmd/influxd.(*Handler).ServeHTTP() /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:45 +0x540 net/http.serverHandler.ServeHTTP() /usr/local/go/src/net/http/server.go:1703 +0x1f6 net/http.(*conn).serve() /usr/local/go/src/net/http/server.go:1204 +0x1087
All comments addressed. |
This PR adds the ability to start up a cluster with dedicated broker and data nodes. It fixes #1934. It also allows cluster communication to be separated from the public API via separate ports and/or interfaces. By default they all share the same port and interface.
When starting separate data and broker nodes, you now must disable the role in the config.
and/or
By default, they are both false and must be explicitly enabled. The sample config has an example w/ both enabled.
This also means that startinginfluxd
w/o a config file will currently exit w/ an error. (We may want to add additional flags to specify a role (e.g.-enable-data
,-enable-broker
))To join a cluster, you start a node and join it to any other member w/ the
-join http://host:port[,http://host:port]...
. The port should be the cluster port and the host can any node in the cluster.There are several config level changes. Some options have been removed or replaced:
Port
- The cluster port is now the default port used for cluster endpoints and API endpoints. The default is 8086.[api].Port
- The API port used for API endpoints. If not specified, 8086 is used.[broker].Enabled
- Determines whether the node runs as a broker. Default false.[data].Enabled
- Determines whether the node runs as a data node. Default false.[broker].Port
- This has been replaced w/ the cluster port[data].Port
- This has been replaced w/ the cluster port[cluster]
- This whole section was removed as it was not used. The actual values are at the root of the config.[Initialization].JoinURLs
- This section has been removed. Join URLs are specified on the command line via the-join
flag. If a node was already joined to a cluster, then these flags are ignored (and logged as ignored) to help avoid confusion.While this PR adds the ability to start a cluster w/ separate broker and data nodes, there are still several open issues and more extensive testing necessary: