Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cluster-wide restart issue. #2283

Merged
merged 2 commits into from
Apr 14, 2015
Merged

Fix cluster-wide restart issue. #2283

merged 2 commits into from
Apr 14, 2015

Conversation

benbjohnson
Copy link
Contributor

Overview

This pull request fixes some error checking and adds a delay when starting to allow the brokers to elect a leader when starting in the case where the whole cluster is restarted.

if len(joinURLs) > 0 {
joinServer(s, cmd.config.ClusterURL(), joinURLs)
return s
}

if err := s.Initialize(cmd.config.ClusterURL()); err != nil {
log.Fatalf("server initialization error: %s", err)
log.Fatalf("server initialization error(0): %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make better log messages than this. :-) E.g.

server initialization error (server ID 0) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the 0 is to delineate the two places where Initialize could have returned an error. Here and L606. Don't think it's the server ID but could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you could be right. It would still be better to see clearer log messages to help the next guy. E.g. server initialization error during ..... -- I presume the software is in different states depending at the different points.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file needs some rework so I just wanted to differentiate between this error message and another with the same text below (as @jwilder mentioned).

@otoolep
Copy link
Contributor

otoolep commented Apr 14, 2015

+1 from me. CHANGELOG needs to be updated too.

Issue opened regarding possible re-work of delay: #2285

@benbjohnson
Copy link
Contributor Author

CHANGELOG updated, merging.

benbjohnson added a commit that referenced this pull request Apr 14, 2015
Fix cluster-wide restart issue.
@benbjohnson benbjohnson merged commit 03d5ffc into master Apr 14, 2015
@benbjohnson benbjohnson deleted the join-fix branch April 14, 2015 20:44
@otoolep
Copy link
Contributor

otoolep commented Apr 14, 2015

Adding 1 final note here for posterity.

This issue is specifically about multi-node restart, because at restart-time the cluster already exists as a multi-node cluster. Therefore leader election requires more than 1 node to vote. This is not the case when a multi-node cluster is being created for the first time because it always starts as a single-node, and leader-election takes place synchronously with the start of the first broker.

Single-node clusters would not suffer from this restart problem either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants