Fix raft node getting stuck in candidate state #2418
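```go
// printf prefixes every log message with the node's current state and ID
// so output from several local nodes can be told apart.
func (l *Log) printf(msg string, v ...interface{}) {
	l.Logger.Printf(fmt.Sprintf("%s[%d]: ", l.state, l.id)+msg+"\n", v...)
}
```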
A worthy change. I was doing something similar when debugging myself.
During an election, a node can sometimes get stuck in the candidate state, causing it to never read from the new leader. This prevents it from incrementing its index and staying consistent with the leader.
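The rule the fix depends on can be sketched in isolation. This is a minimal illustration, not the PR's implementation: the node, heartbeat, and handleHeartbeat names are hypothetical. It follows the standard Raft rule that a candidate hearing from a leader whose term is at least its own recognizes that leader and returns to follower, while a heartbeat from a lower term is rejected.

```go
package main

import "fmt"

// State is a hypothetical node role.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// heartbeat is a hypothetical heartbeat message from a (possibly new) leader.
type heartbeat struct {
	term     uint64
	leaderID uint64
}

// node is a hypothetical minimal raft node.
type node struct {
	state    State
	term     uint64
	leaderID uint64
}

// handleHeartbeat reports whether the heartbeat was accepted. An accepted
// heartbeat makes the sender the known leader and demotes the node.
func (n *node) handleHeartbeat(hb heartbeat) bool {
	if hb.term < n.term {
		// Stale leader: reject and keep the current state.
		return false
	}
	if hb.term > n.term {
		n.term = hb.term
	}
	// Equal or higher term: recognize the sender and step down.
	n.leaderID = hb.leaderID
	n.state = Follower
	return true
}

func main() {
	n := &node{state: Candidate, term: 4}
	fmt.Println(n.handleHeartbeat(heartbeat{term: 3, leaderID: 1}), n.state == Candidate) // false true
	fmt.Println(n.handleHeartbeat(heartbeat{term: 4, leaderID: 2}), n.state == Follower, n.leaderID) // true true 2
}
```

Note that a heartbeat carrying the candidate's own term is accepted, which is exactly the case where a candidate that lost the election must still step down.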
+1 on green build.
Fix raft node getting stuck in candidate state
@benbjohnson -- please double-check this change.
This may have been fixed by PR #2418.
```go
// Record the heartbeat sender as the leader, release the lock, and demote.
l.leaderID = hb.leaderID
l.unlock()
return Follower
```
Shouldn't this demotion occur from the message on the l.terms channel received below?
@benbjohnson -- yes, that looks correct. There should be no need to return follower here, since the case statement below should be triggered by the signal sent by mustSetTermIfHigher.
This PR fixes an issue where a raft peer gets stuck in the candidate state and never increments its index. This happens sporadically after an election. The root cause appears to be that a node in the candidate state should return to the follower state once it starts receiving heartbeats from a new leader, but it was not doing so, which left it inconsistent with the cluster.
In addition to this fix, when running multiple nodes locally with raft tracing, it's very difficult to determine which node is logging. This adds the node's state (leader, follower, candidate) and ID to all log messages so we can trace each node's state more easily.
Also fixes test output alignment when there is a failure in an integration test.