-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cluster-wide restart issue. #2283
Conversation
if len(joinURLs) > 0 { | ||
joinServer(s, cmd.config.ClusterURL(), joinURLs) | ||
return s | ||
} | ||
|
||
if err := s.Initialize(cmd.config.ClusterURL()); err != nil { | ||
log.Fatalf("server initialization error: %s", err) | ||
log.Fatalf("server initialization error(0): %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make better log messages than this. :-) E.g.
server initialization error (server ID 0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 0
is to delineate the two places where Initialize
could have returned an error. Here and L606. Don't think it's the server ID but could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, you could be right. It would still be better to see clearer log messages to help the next guy. E.g. server initialization error during .....
-- I presume the software is in different states depending at the different points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole file needs some rework so I just wanted to differentiate between this error message and another with the same text below (as @jwilder mentioned).
+1 from me. CHANGELOG needs to be updated too. Issue opened regarding possible re-work of delay: #2285 |
CHANGELOG updated, merging. |
Fix cluster-wide restart issue.
Adding 1 final note here for posterity. This issue is specifically about multi-node restart, because at restart-time the cluster already exists as a multi-node cluster. Therefore leader election requires more than 1 node to vote. This is not the case when a multi-node cluster is being created for the first time because it always starts as a single-node, and leader-election takes place synchronously with the start of the first broker. Single-node clusters would not suffer from this restart problem either. |
Overview
This pull request fixes some error checking and adds a delay when starting to allow the brokers to elect a leader when starting in the case where the whole cluster is restarted.