-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: step down immediately on leader transfer #130123
raft: step down immediately on leader transfer #130123
Conversation
91ce7d8
to
b7fe394
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. Talking about this in the pod meeting yesterday and then coming to review the PR made it a breeze
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 2 of 2 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/raft.go
line 372 at r3 (raw file):
// leadTransferee, if set, is the id of the leader transfer target during a // pending leadership transfer. The value is set while the outgoing leader // (this node) is catching the target up on its log. Once the target is caught
Consider adding some words about how this field is used when the leader is catching up the target. You mentioned in yesterday's pod meeting that it's to reject incoming proposals, right?
nit: maybe a [1] pattern that we use elsewhere to not distract from the main idea of the comment.
pkg/raft/raft.go
line 376 at r3 (raw file):
// it has irrevocably compromised its leadership term by giving the target
// permission to overthrow it.
ChatGPT couldn't have dramatized this better 😂
pkg/raft/raft.go
line 1760 at r3 (raw file):
// Transfer leadership to third party. r.logger.Infof("%x [term %d] starts to transfer leadership to %x", r.id, r.Term, leadTransferee) // Transfer leadership should be finished in one electionTimeout, so reset r.electionElapsed.
nit: should we expand this comment to say that if it doesn't, the outgoing leader is free to campaign at a higher term.
pkg/raft/raft.go
line 1768 at r3 (raw file):
} else { pr.MsgAppProbesPaused = false r.maybeSendAppend(leadTransferee)
Consider adding a comment here that explains how we handle the state transition when we're trying to transfer leadership to a follower that's behind on it's log. Might be worth including a pointer to MsgAppResp
where we call transferLeader
?
All of this was very easy to follow along because we talked about this yesterday and you've got a great explanation in your commit message, but if I were just reading the code, I'd think this was a bit under documented and subtle.
pkg/raft/raft.go
line 2301 at r3 (raw file):
} func (r *raft) transferLeader(to pb.PeerID) {
Should we add an assertion to say r.state == StateLeader && r.lead == r.id
?
pkg/raft/raft.go
line 2306 at r3 (raw file):
} func (r *raft) abortLeaderTransfer() {
Should we add the assertion from above here as well?
pkg/raft/testdata/confchange_v2_replace_leader.txt
line 158 at r3 (raw file):
raft-state ---- 1: StateFollower (Voter) Term:1 Lead:0
Unfortunate how the Lead: 0
diff came about due to
Lines 103 to 119 in c921401
if s.RaftState == StateFollower && s.Lead == r.id { | |
// A raft leader's term ends when it is shut down. It'll rejoin its peers as | |
// a follower when it comes back up, but its Lead and Term field may still | |
// correspond to its pre-restart leadership term. We expect this to quickly | |
// be updated when it hears from the new leader, if one was elected in its | |
// absence, or when it campaigns. | |
// | |
// The layers above raft (in particular kvserver) do not handle the case | |
// where a raft node's state is StateFollower but its lead field points to | |
// itself. They expect the Lead field to correspond to the current leader, | |
// which we know we are not. For their benefit, we overwrite the Lead field | |
// to None. | |
// | |
// TODO(arul): the layers above should not conflate Lead with current | |
// leader. Fix that and get rid of this overwrite. | |
s.HardState.Lead = None | |
} |
pkg/raft/raft_test.go
line 3201 at r4 (raw file):
} require.Equal(t, expState, r.state) require.Equal(t, uint64(3), r.Term)
Do you think it's worth adding a variant of this test where the node that receives the MsgTimeoutNow
(n1) from n2, T2
hasn't heard about n3 being leader at T3
?
b7fe394
to
aefbeff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/raft/raft.go
line 372 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Consider adding some words about how this field is used when the leader is catching up the target. You mentioned in yesterday's pod meeting that it's to reject incoming proposals, right?
nit: maybe a [1] pattern that we use elsewhere to not distract from the main idea of the comment.
Done.
pkg/raft/raft.go
line 376 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
// it has irrevocably compromised its leadership term by giving the target
// permission to overthrow it.ChatGPT couldn't have dramatized this better 😂
There's no point writing code comments if you aren't going to make them dramatic 😂
pkg/raft/raft.go
line 1760 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: should we expand this comment to say that if it doesn't, the outgoing leader is free to campaign at a higher term.
Done.
pkg/raft/raft.go
line 1768 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Consider adding a comment here that explains how we handle the state transition when we're trying to transfer leadership to a follower that's behind on it's log. Might be worth including a pointer to
MsgAppResp
where we calltransferLeader
?All of this was very easy to follow along because we talked about this yesterday and you've got a great explanation in your commit message, but if I were just reading the code, I'd think this was a bit under documented and subtle.
Done.
pkg/raft/raft.go
line 2301 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we add an assertion to say
r.state == StateLeader && r.lead == r.id
?
Done.
pkg/raft/raft.go
line 2306 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we add the assertion from above here as well?
We call this in reset
as well, so it doesn't necessarily assume that a leadership transfer is in progress. I'll leave as is for now.
pkg/raft/raft_test.go
line 3201 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Do you think it's worth adding a variant of this test where the node that receives the
MsgTimeoutNow
(n1) fromn2, T2
hasn't heard about n3 being leader atT3
?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5, 3 of 3 files at r6, 2 of 2 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/raft.go
line 2306 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We call this in
reset
as well, so it doesn't necessarily assume that a leadership transfer is in progress. I'll leave as is for now.
Want to add a TODO to add this assertion in the future? Feel free to put my name on it.
pkg/raft/raft.go
line 1773 at r6 (raw file):
} else { // If the transfer target is not initially caught up on its log, we don't // sent it a MsgTimeoutNow immediately. Instead, we eagerly try to catch
nit: s/sent/send/
This commit adds assertions on `raft.state` at the beginning of `tickElection` and `tickHeartbeat`. Epic: None Release note: None
Part of cockroachdb#129807. This commit updates the handling of raft leadership transfers to have the outgoing leader step down immediately after the MsgTimeoutNow is sent. We retain the `leadTransferee` field because there is still a "pending", reversible transfer state while the leader is catching the follower up on its log. However, once the catch up is complete, the leader now steps down, as it has irrevocably compromised its leadership term by giving the target permission to overthrow it. If the transfer fails after the MsgTimeoutNow has been sent, the leader (who has stepped down to a follower) must call a new election at a new term in order to reestablish leadership. Release note: None
This commit adds a new `TestLeaderTransferDifferentTerms`, which verifies that a MsgTimeoutNow will only be respected if it is from the current term or from a new term. Epic: None Release note: None
This commit adds a new `TestLeaderTransferStaleFollower`, which verifies that a MsgTimeoutNow received by a stale follower (a follower still at an earlier term) will cause the follower to call an election which it can not win. Epic: None Release note: None
aefbeff
to
b0f7d0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)
pkg/raft/raft.go
line 2306 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Want to add a TODO to add this assertion in the future? Feel free to put my name on it.
Done.
pkg/raft/raft.go
line 1773 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: s/sent/send/
Done.
Part of #129807.
This commit updates the handling of raft leadership transfers to have the outgoing leader step down immediately after the MsgTimeoutNow is sent. We retain the
leadTransferee
field because there is still a "pending", reversible transfer state while the leader is catching the follower up on its log. However, once the catch up is complete, the leader now steps down, as it has irrevocably compromised its leadership term by giving the target permission to overthrow it.If the transfer fails after the MsgTimeoutNow has been sent, the leader (who has stepped down to a follower) must call a new election at a new term in order to reestablish leadership.
Release note: None