Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate broker and data nodes #2175

Merged
merged 33 commits into from
Apr 7, 2015
Merged

Separate broker and data nodes #2175

merged 33 commits into from
Apr 7, 2015

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Apr 6, 2015

This PR adds the ability to start up a cluster with dedicated broker and data nodes. It fixes #1934. It also allows cluster communication to be separated from the public API via separate ports and/or interfaces. By default they all share the same port and interface.

When starting separate data and broker nodes, you now must disable the role in the config.

# Broker only
[data]
enabled = false

and/or

# Data only
[broker]
enabled = false

By default, they are both false and must be explicitly enabled. The sample config has an example w/ both enabled. This also means that starting influxd w/o a config file will currently exit w/ an error. (We may want to add additional flags to specify a role (e.g. -enable-data, -enable-broker))

To join a cluster, you start a node and join it to any other member w/ the -join http://host:port[,http://host:port].... The port should be the cluster port and the host can any node in the cluster.

There are several config level changes. Some options have been removed or replaced:

  • Added - Port - The cluster port is now the default port used for cluster endpoints and API endpoints. The default is 8086.
  • Added - [api].Port - The API port used for API endpoints. If not specified, 8086 is used.
  • Added - [broker].Enabled - Determines whether the node runs as a broker. Default false.
  • Added - [data].Enabled - Determines whether the node runs as a data node. Default false.
  • Removed - [broker].Port - This has been replaced w/ the cluster port
  • Removed - [data].Port - This has been replaced w/ the cluster port
  • Removed - [cluster] - This whole section was removed as it was not used. The actual values are at the root of the config.
  • Removed - [Initialization].JoinURLs - This section has been removed. Join URLs are specified on the command line via the -join flag. If a node was already joined to a cluster, then these flags are ignored (and logged as ignored) to help avoid confusion.

While this PR adds the ability to start a cluster w/ separate broker and data nodes, there are still several open issues and more extensive testing necessary:

}

// Snapshot represents the configuration for a snapshot service
type Snapshot struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be worth moving this block into data, somewhat like Retention since snapshotting can only be performed on data nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best as is, but it would be worth commenting the file, letting the user know this only applies to data nodes.

@jwilder jwilder force-pushed the data-broker-1934 branch from 4845a5f to 4395172 Compare April 6, 2015 20:06
return fmt.Sprintf("%s:%d", c.BindAddress, c.Broker.Port)
// ClusterAddr returns the binding address for the cluster
func (c *Config) ClusterAddr() string {
return fmt.Sprintf("%s:%d", c.BindAddress, c.Port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should standardize and use net.JoinHostPort() here too.

@jwilder jwilder force-pushed the data-broker-1934 branch 3 times, most recently from eca5845 to 5b7efb0 Compare April 6, 2015 21:46
@@ -72,9 +72,15 @@ func main() {
// Extract name from args.
switch cmd {
case "run":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply write case "run", "":

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The args are slightly different so still need to check the first is run or not. Using case "run", "" causes a panic when starting influx w/o any args.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. I see. Still seems a pity to replicate that code right there. How about:

case "", "run":
   if cmd == "run" {
       args = args[1:]
   }
   cmd := NewRunCommand()
    if err := cmd.Run(args...); err != nil {
        log.Fatalf("run: %s", err)
    }
}

Might not be worth it.

@otoolep
Copy link
Contributor

otoolep commented Apr 6, 2015

OK, took a first pass. Makes sense at a high level, but I think the construction is off. The fact that RunCommand calls Open on itself from within Run is a red flag. I think the RunCommand should create a node -- but do the minimum amount possible to that node -- perhaps allocate the server and broker objects depending on the config passed in. Also construct the handlers.

But I think too much is being done in RunCommand. I think we need to create a proper top-level Node object, which encapsulates the stuff it's doing.

if clusterID := b.Broker.ClusterID(); clusterID != 0 {
go s.StartReportingLoop(clusterID)
if !cmd.config.ReportingDisabled {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blank line looks superfluous.

@toddboom
Copy link
Contributor

toddboom commented Apr 6, 2015

👍 on this once we can get the suite green

@@ -288,7 +398,7 @@ func openBroker(path string, u url.URL, initializing bool, joinURLs []url.URL, r
}
log.Printf("broker opened at %s", path)

// Attach the broker as the finite state machine of the raft log.
// Attach the broker as the f inite state machine of the raft log.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

jwilder added 18 commits April 6, 2015 16:38
Simplified signature and state now depends on indexes vs directory
existence
Adds a simple test to start a separate broker and
data node.  Data nodes still need a separate set
join URLs which is not in place yet.
To add a new data node, it currently needs a broker
and another data node to join.  Temporarily adding
a JoinURLs option to the Data node section so a
standalone data node can be created but the intent is
that this will be removed.

Ideally, the the joinURL could point to either a data node
or a broker and it would get the required URLs from that
host but that is not possible currently.
How a cluster is setup has changed and this test is failing w/
panic: assert failed: invalid initial server id: 2 [recovered]

There is an existing multi-node test w/ a broker and two data
nodes so we're still covering this case and will need to come
back to it.
This is the port that all cluster communication will take place
over.  It will replace the separate data and broker ports.
This removes all join URLs from the config.  To join a node to a
cluster, the URL of another member of the cluster should be passed
on the command line w/ the -join flag.  The join URLs can now be
any node regardless of whether the node is a broker only or data
only node.  At join time, the receiving node will redirect the
request to a valid broker or data node if it cannot handle the request
itself.
When starting multiple servers concurrently, they can race to connect
to each other.  This change just has the join attempts retry to make
cluster setup easier.
If a node is restarted and it had already joined the cluster,
ignore and log that the join urls are being ignored and existing
cluster state will be used.
Removed unused items and add new ones
Server is very overloaded currently so use Node to represent the
container that holds onto a broker and data node (server currently)
@jwilder jwilder force-pushed the data-broker-1934 branch from 5b7efb0 to 01c2de6 Compare April 6, 2015 22:38
jwilder added 3 commits April 6, 2015 21:16
applySetTopicMaxIndex() was updating the topics.indexByUrl w/o locking it.

WARNING: DATA RACE
Write by goroutine 1365:
  runtime.mapassign1()
      /usr/local/go/src/runtime/hashmap.go:376 +0x0
  github.com/influxdb/influxdb/messaging.(*Broker).applySetTopicMaxIndex()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:496 +0x198
  github.com/influxdb/influxdb/messaging.(*Broker).Apply()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:542 +0x33a
  github.com/influxdb/influxdb.(*Broker).Apply()
      <autogenerated>:1 +0x78
  github.com/influxdb/influxdb/messaging.(*RaftFSM).Apply()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:614 +0x24f
  github.com/influxdb/influxdb/raft.(*Log).applyNextUnappliedEntry()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1431 +0x75c
  github.com/influxdb/influxdb/raft.(*Log).applier()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/raft/log.go:1369 +0x18f

Previous read by goroutine 1540:
  runtime.mapiterinit()
      /usr/local/go/src/runtime/hashmap.go:535 +0x0
  github.com/influxdb/influxdb/messaging.(*Topic).DataURLs()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/messaging/broker.go:681 +0x11d
  github.com/influxdb/influxdb/cmd/influxd.(*Handler).serveMetadata()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:95 +0x3fd
  github.com/influxdb/influxdb/cmd/influxd.(*Handler).ServeHTTP()
      /home/ubuntu/.go_project/src/github.com/influxdb/influxdb/cmd/influxd/handler.go:45 +0x540
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:1703 +0x1f6
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1204 +0x1087
@jwilder
Copy link
Contributor Author

jwilder commented Apr 7, 2015

All comments addressed.

pauldix added a commit that referenced this pull request Apr 7, 2015
@pauldix pauldix merged commit a72707b into master Apr 7, 2015
@jwilder jwilder deleted the data-broker-1934 branch April 7, 2015 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should be able to have clusters with dedicated brokers and data nodes
5 participants