Bring back config join URLs #2201

Merged
merged 8 commits into master from jw-join-urls
Apr 9, 2015

jwilder
Contributor

@jwilder jwilder commented Apr 8, 2015

This PR does the following:

  1. Brings back [Initialization].join-urls, which was removed as part of Separate broker and data nodes #2175
  2. Updates the init script to allow passing options via /etc/influxdb/defaults if desired.
  3. Fixes the -join flag so that it overrides the config file join-urls when passed, but not when the node has already joined a cluster.
  4. Adds better logging in the server around which join URLs are used and which are not.
  5. Increases the maximum number of join attempts a node makes before giving up.
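
The precedence described in item 3 can be sketched roughly as follows. This is only an illustration, not the code in this PR: `resolveJoinURLs` and `splitURLs` are hypothetical helpers, the host names are made up, and treating the join URLs as a comma-separated string is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveJoinURLs is a hypothetical helper that picks the join URLs to use.
// Existing cluster state wins over everything; otherwise the -join flag
// overrides the config file join-urls.
func resolveJoinURLs(flagValue, configValue string, alreadyJoined bool) ([]string, string) {
	if alreadyJoined {
		// The node has joined before, so both sources are ignored.
		return nil, "existing cluster state"
	}
	if flagValue != "" {
		return splitURLs(flagValue), "-join flag"
	}
	if configValue != "" {
		return splitURLs(configValue), "config join-urls"
	}
	return nil, "none"
}

// splitURLs splits a comma-separated list (an assumed format) and drops
// empty entries and surrounding whitespace.
func splitURLs(s string) []string {
	var out []string
	for _, u := range strings.Split(s, ",") {
		if u = strings.TrimSpace(u); u != "" {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	// Hypothetical hosts; the flag value takes precedence over the config value.
	urls, source := resolveJoinURLs("http://a:8086", "http://b:8086,http://c:8086", false)
	fmt.Println("using join URLs from", source, urls)
}
```

Returning the source alongside the URLs makes it easy to log which join URLs were used and which were ignored, in the spirit of item 4.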

@otoolep
Contributor

otoolep commented Apr 8, 2015

Don't forget the CHANGELOG.

type Initialization struct {
// JoinURLs are cluster URLs to use when joining a node to a cluster the first time it boots. After,
// a node is joined to a cluster, these URLS are ignored. These will be overriden at runtime if
// the node is started witha `-join` flag.

Nit: witha -> with a

@otoolep
Contributor

otoolep commented Apr 8, 2015

Makes sense, some minor feedback that should be addressed before commit. +1

jwilder added a commit that referenced this pull request Apr 8, 2015
jwilder added 8 commits April 8, 2015 20:49
Removing this option causes issues when deploying influxd
via configuration management. We can now define the same
set of join URLs in the config file across nodes.

This also ensures that the `-join` option overrides the
config file setting if passed.

Command-line options can be set in /etc/default/influxdb using
the INFLUXD_OPTS env var.

The previous limit of 3 join attempts was fairly arbitrary and would cause errors such as:

2015/04/08 14:01:12 join: failed to connect data node: {http  <nil> influxdb.local:8191   }: unable to join
2015/04/08 14:01:12 join: failed to connect data node to any specified server

in the tests. This can happen when the nodes are slow to start up. The limit is set
arbitrarily higher to avoid this error while still giving up if the node can't connect
after a minute.

If the node is running a broker and a data node, always have the
data node client connect to the local broker since it will already
be initialized or joined.

Make it more explicit when existing cluster state is being used
versus join URLs.  Also consolidate some duplicated `if index==0`
checks.

2015/04/08 22:27:01 no broker or server configured to handle messaging endpoints
2015/04/08 22:27:02 join: failed to connect data node: http://box296:9012: unable to join
2015/04/08 22:27:02 join: failed to connect data node to any specified server

There is a race between the data node heartbeater and the join operation when joining
a data-only node to a broker and another data-only node.  If the heartbeater
fires before the join attempt, it's possible for the booting data node
to be selected as the first data node for redirection by the broker.
The join attempt would request a data node endpoint on the broker ("/data_nodes"),
but since the broker cannot handle it, it would redirect to a valid data node.

During this race, the broker would redirect the request back to the same server.  If
this happens, the data node would get stuck and not be able to join because it's
still booting.

To work around this, the redirect is randomized, and the join call will not attempt
to call itself but will instead re-request the original URL.  A better fix might be to
not start the heartbeater until after the data node has joined or initialized.
jwilder added a commit that referenced this pull request Apr 9, 2015
@jwilder jwilder merged commit 019110c into master Apr 9, 2015
@jwilder jwilder deleted the jw-join-urls branch April 9, 2015 02:56
mark-rushakoff pushed a commit that referenced this pull request Jan 11, 2019
Update from function example to bucket_name instead of telegraf/autogen