Cherry-picks for 2.10.23-RC.5 #6171

neilalexander · 2024-11-25T11:33:03Z

Includes the following:

Signed-off-by: Neil Twigg <neil@nats.io>

Co-authored-by: Reuben Ninan <reuben@nats.io> Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Neil Twigg <neil@nats.io>

…te request A candidate could incorrectly revert to an older term without resetting if an old AE arrived with a term that is at least newer than the pterm but not necessarily newer than the term. Also a node that is isolated for a period of time with a high term number should cause the rest of the cluster to move forward with that term number, rather than going backwards to the leader term. Co-authored-by: Reuben Ninan <reuben@nats.io> Signed-off-by: Neil Twigg <neil@nats.io>
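The rule described above can be sketched as follows. This is a minimal illustration, not the server's code: the `node` type, the `aeAction` labels, and `classifyAE` are hypothetical names. The point it shows is that the incoming AE term is compared against `term` (never `pterm`), and that `term` only ever moves forward.

```go
package main

// aeAction is a hypothetical label for what a node does with an incoming
// append entry (AE). It is not a type from the NATS server.
type aeAction int

const (
	ignoreStale aeAction = iota // AE term is older than ours: do not apply it
	adoptTerm                   // AE term is newer: move our term forward first
	processAE                   // AE term equals ours: handle normally
)

// node holds only the two fields the commit message talks about.
type node struct {
	term  uint64 // current term; must never move backwards
	pterm uint64 // term of the last entry in our log
}

// classifyAE compares the AE term against n.term, not n.pterm, so an old AE
// that is newer than pterm but older than term cannot drag the term back.
func (n *node) classifyAE(aeTerm uint64) aeAction {
	switch {
	case aeTerm < n.term:
		return ignoreStale
	case aeTerm > n.term:
		n.term = aeTerm
		return adoptTerm
	default:
		return processAE
	}
}
```

With `term=5` and `pterm=2`, an AE carrying term 3 is newer than `pterm` but is still classified as stale, and `term` stays at 5.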

The stepdown channel interleaves with other channels such as the apply queue, leader change notifications etc in the `runAs` goroutines in an unpredictable order, so processing a stepdown request might be delayed behind other work. Doing this inline should be safer with stronger guarantees. Signed-off-by: Neil Twigg <neil@nats.io>
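As a rough sketch of what "inline" means here, assuming a hypothetical `node` type guarded by a mutex (none of these names come from the server): the state change happens under the lock inside the call, so it is visible as soon as the call returns. Nothing is queued on a channel that a `select` loop, which picks among ready cases in no guaranteed order, would have to get around to reading.

```go
package main

import "sync"

type state int

const (
	follower state = iota
	leader
)

// node is a hypothetical stand-in for a Raft group member.
type node struct {
	mu    sync.Mutex
	state state
}

// stepDown changes state inline, under the lock, rather than sending a
// request on a channel for another goroutine to process later.
func (n *node) stepDown() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.state == leader {
		n.state = follower
	}
}

// isLeader reports whether the node currently considers itself leader.
func (n *node) isLeader() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.state == leader
}
```

A caller that invokes `stepDown()` and then checks `isLeader()` gets `false` without depending on goroutine scheduling.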

Beforehand when we were trying to run a catchup, we were reverting the `term` back to `pterm`. We can't ever move the term backwards safely and the catchup itself does not rely on this behaviour in order to work (as the catchup entries are matched only on `pindex`), so don't revert it. Signed-off-by: Neil Twigg <neil@nats.io>

Log consistency should only be enforced when handling append entries, not in other types of RPC. In this case, the higher term could cause us to blow away our entire log if a node with a higher term but a more out-of-date log comes along as leader. Signed-off-by: Neil Twigg <neil@nats.io>
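A minimal sketch of the separation described above, using hypothetical types and a simplified vote handler (bookkeeping of who we already voted for is omitted): a vote request with a higher term moves our term forward and is judged against our log using the usual Raft up-to-date comparison, but it never truncates or rewrites the log.

```go
package main

type entry struct {
	term  uint64
	index uint64
}

// node is a hypothetical stand-in holding only a term and a log.
type node struct {
	term uint64
	log  []entry
}

// lastLog returns the term and index of the last log entry, or zeros if the
// log is empty.
func (n *node) lastLog() (term, index uint64) {
	if len(n.log) == 0 {
		return 0, 0
	}
	last := n.log[len(n.log)-1]
	return last.term, last.index
}

// handleVoteRequest adopts a newer term and grants the vote only if the
// candidate's log is at least as up to date as ours. It reads the log but
// never modifies it: log repair belongs to append-entry handling.
func (n *node) handleVoteRequest(candTerm, candLastTerm, candLastIndex uint64) bool {
	if candTerm < n.term {
		return false
	}
	if candTerm > n.term {
		n.term = candTerm
	}
	lt, li := n.lastLog()
	return candLastTerm > lt || (candLastTerm == lt && candLastIndex >= li)
}
```

A candidate with a high term but a shorter log raises our term and is refused the vote, and our log is left exactly as it was.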

Signed-off-by: Neil Twigg <neil@nats.io>

Otherwise we may end up in a situation where we are dropping append entries from a leader that is obviously behind, but we cannot help to bring the cluster forward to make progress. Signed-off-by: Neil Twigg <neil@nats.io>

Candidates should not be stepping down based on their pterm but rather their actual term. Signed-off-by: reubenninan <reuben@nats.io>

Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Derek Collison <derek@nats.io>

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

Antithesis testing has found that late or out-of-order delivery of these snapshots, likely due to latency or thread pauses, can cause stream assignments to be reverted, which results in assets being deleted and recreated. There may also be a race condition where the metalayer comes up before network connectivity to all other nodes is fully established, so we may end up generating snapshots that don't include assets we don't know about yet.

We will want to audit all uses of `SendSnapshot`, as it somewhat breaks the consistency model, especially now that we have fixed a significant number of Raft bugs that `SendSnapshot` usage may have been papering over. Further Antithesis runs without this code run fine and have eliminated a number of unexpected calls to `processStreamRemoval`.

We've also added a new unit test `TestJetStreamClusterHardKillAfterStreamAdd` for a long-known issue, as well as a couple of tweaks to the ghost consumer tests to make them reliable.

Signed-off-by: Neil Twigg <neil@nats.io>

---------

Signed-off-by: Neil Twigg <neil@nats.io>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Co-authored-by: Maurice van Veen <github@mauricevanveen.com>

Signed-off-by: Neil Twigg <neil@nats.io> Co-authored-by: Maurice van Veen <github@mauricevanveen.com>

Signed-off-by: Neil Twigg <neil@nats.io>

… at `seq=ae.pindex+1` (#5987)

This PR makes three complementary fixes to how catchup and truncation are handled. Specifically:

- When doing `n.loadEntry(index)`, we need to pass where the AppendEntry is in terms of stream sequence. This is equal to `ae.pindex+1`, since `ae.pindex` is the value before it's stored in the stream.
- Start catchup from `n.commit`. We could have messages past our commit that have been invalidated and need to be truncated because there was a switch between leaders.
- Because we catch up from `n.commit`, we check whether our local AppendEntry matches terms with the incoming AppendEntry. We only need to truncate if the terms don't match.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

---------

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
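The sequence arithmetic and the truncate-on-mismatch check can be sketched like this. Everything here is an assumption for illustration: the in-memory `wal` type, `entryAt`, `truncateFrom`, and `reconcile` are hypothetical and are not the server's `loadEntry` or its storage layer. The sketch only shows that an AE with a given `pindex` lives at stream sequence `pindex+1`, and that the log is cut only when the local entry at that sequence has a different term.

```go
package main

type entry struct {
	term uint64
}

// wal is a hypothetical in-memory log. The entry stored at stream sequence
// seq (1-based) lives at entries[seq-1].
type wal struct {
	entries []entry
}

// entryAt returns the entry at the given stream sequence, if present.
func (w *wal) entryAt(seq uint64) (entry, bool) {
	if seq == 0 || seq > uint64(len(w.entries)) {
		return entry{}, false
	}
	return w.entries[seq-1], true
}

// truncateFrom drops the entry at seq and everything after it.
func (w *wal) truncateFrom(seq uint64) {
	if seq >= 1 && seq <= uint64(len(w.entries)) {
		w.entries = w.entries[:seq-1]
	}
}

// reconcile compares the local entry at seq = aePindex+1 with the incoming
// AE's term and truncates only when the terms differ. It reports whether a
// truncation happened.
func (w *wal) reconcile(aePindex, aeTerm uint64) bool {
	seq := aePindex + 1
	local, ok := w.entryAt(seq)
	if !ok || local.term == aeTerm {
		return false
	}
	w.truncateFrom(seq)
	return true
}
```

For a log with terms `[1, 1, 2]`, an incoming AE with `pindex=2` and term 2 matches the local entry at sequence 3 and nothing is truncated, while an AE with `pindex=1` and term 3 mismatches at sequence 2 and leaves only the first entry.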

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

…mmitted Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

If we haven't recovered the `pterm`/`pindex` from a snapshot, and the WAL is empty, then we shouldn't tell other nodes that we have a log that we don't have (i.e. by vote request). Instead leave both `pterm` and `pindex` as zero to correctly signal that we don't know anything about the state of the log. Signed-off-by: Neil Twigg <neil@nats.io>
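A small sketch of the recovery rule, assuming hypothetical `snapshot` and `entry` types and a hypothetical `recoverLogState` helper: with no snapshot and an empty WAL, both values stay at zero, so nothing is claimed about a log we don't have.

```go
package main

// snapshot and entry are hypothetical, reduced to the fields needed here.
type snapshot struct {
	lastTerm  uint64
	lastIndex uint64
}

type entry struct {
	term  uint64
	index uint64
}

// recoverLogState derives pterm/pindex on startup. The last WAL entry wins
// if there is one, otherwise the snapshot, otherwise both remain zero to
// signal that we know nothing about the state of the log.
func recoverLogState(snap *snapshot, wal []entry) (pterm, pindex uint64) {
	if n := len(wal); n > 0 {
		last := wal[n-1]
		return last.term, last.index
	}
	if snap != nil {
		return snap.lastTerm, snap.lastIndex
	}
	return 0, 0
}
```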

…g shutdown Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

…sequent catchup messages Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

This should fix some logical races where multiple sets of goroutines can fight over the same store directory, i.e. when shutting down and recreating a group rapidly. Signed-off-by: Neil Twigg <neil@nats.io>
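One common way to get this guarantee in Go is a `sync.WaitGroup` that the shutdown path blocks on. The sketch below assumes a hypothetical `group` type with `startGroup`, `stop`, and `recreate` helpers. It is not how the server structures its Raft groups, only an illustration of waiting for the old goroutines before starting new ones.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// group is a hypothetical stand-in for a Raft group and its goroutines.
type group struct {
	quit    chan struct{}
	wg      sync.WaitGroup
	running int32 // number of goroutines that have not yet exited
}

// startGroup launches the given number of worker goroutines.
func startGroup(workers int) *group {
	g := &group{quit: make(chan struct{})}
	for i := 0; i < workers; i++ {
		g.wg.Add(1)
		atomic.AddInt32(&g.running, 1)
		go func() {
			defer g.wg.Done()
			defer atomic.AddInt32(&g.running, -1)
			<-g.quit // stand-in for real work against the store directory
		}()
	}
	return g
}

// stop signals shutdown and blocks until every goroutine has exited.
func (g *group) stop() {
	close(g.quit)
	g.wg.Wait()
}

// active reports how many goroutines are still running.
func (g *group) active() int {
	return int(atomic.LoadInt32(&g.running))
}

// recreate only starts the replacement once the old group has fully stopped,
// so the two sets of goroutines never overlap.
func recreate(old *group, workers int) *group {
	old.stop()
	return startGroup(workers)
}
```

After `recreate` returns, `old.active()` is zero, so the new group is the only one that could be touching the shared directory.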

Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

Signed-off-by: Neil Twigg <neil@nats.io>

…g shutdown Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

derekcollison

LGTM

neilalexander and others added 30 commits November 22, 2024 17:01

NRG: Ignore AEs from older terms

785321b

Co-authored-by: Reuben Ninan <reuben@nats.io> Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Ensure proposal and AE response queues drain after stepdown

ddb3ad3

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: De-flake TestNRGSwitchStateClearsQueues

9e1d1c3

Signed-off-by: Neil Twigg <neil@nats.io>

De-flake TestNRGSwitchStateClearsQueues

d6f1f9c

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Send AE response when term is lower than ours

541e5b1

Otherwise we may end up in a situation where we are dropping append entries from a leader that is obviously behind, but we cannot help to bring the cluster forward to make progress. Signed-off-by: Neil Twigg <neil@nats.io>

Fix candidate stepdown logic

86b0cae

Candidates should not be stepping down based on their pterm but rather their actual term. Signed-off-by: reubenninan <reuben@nats.io>

De-flake TestNRGSwitchStateClearsQueues

85ad22d

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Don't revert pterm to beginning of log when installing snapshots

1929e30

Signed-off-by: Neil Twigg <neil@nats.io>

Fixed deadlock when removing a peer that happened to be the leader.

85236e2

Signed-off-by: Derek Collison <derek@nats.io>

Fix drift in WAL, truncate AppendEntry without quorum

947b7c5

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Do not revert term on truncate WAL

71e9e87

Signed-off-by: Neil Twigg <neil@nats.io> Co-authored-by: Maurice van Veen <github@mauricevanveen.com>

Fix data race in TestNRGTermDoesntRollBackToPtermOnCatchup

f55f34e

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Revert implementation from #5987

7ec99f3

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Correct pterm if mismatched & don't truncate what was already co…

a5b7a4f

…mmitted Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Add tests for correcting pterm with committed entries

2dfb1d4

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

[FIXED] Catchup must not extend past requested sequence range

6e36c52

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

[FIXED] Don't replace leader's snapshot during shutdown

aa643dd

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Always write term/vote and peer state synchronously

2681ce0

Signed-off-by: Neil Twigg <neil@nats.io>

[FIXED] Don't remove snapshot if truncate to applied

ff22481

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Don't switch to candidate when waiting for pending applies

e40112e

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Don't delete RAFT state if stream/consumer creation failed durin…

fa5128d

…g shutdown Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Vote request cancels catchup, new leader could have rejected sub…

a3b9178

…sequent catchup messages Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

NRG: Wait for goroutines to shutdown when recreating group

a64c5bd

This should fix some logical races where multiple sets of goroutines can fight over the same store directory, i.e. when shutting down and recreating a group rapidly. Signed-off-by: Neil Twigg <neil@nats.io>

neilalexander and others added 6 commits November 25, 2024 10:21

NRG: Update group peers if mismatched

cc204b0

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Refactor shutdown, update switchState to CAS

a838553

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Use correct sequence when truncating to previous pterm/pindex

9266be8

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

De-flake TestNRGCandidateDontStepdownDueToLeaderOfPreviousTerm

9e4c421

Signed-off-by: Neil Twigg <neil@nats.io>

De-flake TestNRGSimpleElection

7c0bec1

Signed-off-by: Neil Twigg <neil@nats.io>

NRG: Don't delete RAFT state if stream/consumer creation failed durin…

4606175

…g shutdown Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

neilalexander marked this pull request as ready for review November 25, 2024 12:04

neilalexander requested a review from a team as a code owner November 25, 2024 12:04

derekcollison approved these changes Nov 25, 2024


neilalexander merged commit 24acb45 into release/v2.10.23 Nov 25, 2024
5 checks passed

neilalexander deleted the neil/21023rc5 branch November 25, 2024 12:47
