-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport-2.0: storage: prevent unbounded raft log growth without quorum #27868
backport-2.0: storage: prevent unbounded raft log growth without quorum #27868
Conversation
Is this ready to merge? We want to get it in to 2.0.5 (which starts qualifying today). |
44d09d8
to
10a7d66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r1, 3 of 3 files at r2, 3 of 5 files at r4, 2 of 2 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained
Fixes cockroachdb#27772. This change adds safeguards to prevent cases where a raft log would grow without bound during loss of quorum scenarios. It also adds a new test that demonstrates that the raft log does not grow without bound in these cases. There are two cases that need to be handled to prevent the unbounded raft log growth observed in cockroachdb#27772. 1. When the leader proposes a command and cannot establish a quorum. In this case, we know the leader has the entry in its log, so there's no need to refresh it with `reasonTicks`. To avoid this, we no longer use `refreshTicks` as a leader. 2. When a follower proposes a command that is forwarded to the leader who cannot establish a quorum. In this case, the follower can't be sure (currently) that the leader got the proposal, so it needs to refresh using `reasonTicks`. However, the leader now detects duplicate forwarded proposals and avoids appending redundant entries to its log. It does so by maintaining a set of in-flight forwarded proposals that it has received during its term as leader. This set is reset after every leadership change. Both of these cases are tested against in the new TestLogGrowthWhenRefreshingPendingCommands. Without both of the safeguards introduced in this commit, the test fails. Release note (bug fix): Prevent unbounded growth of the raft log caused by a loss of quorum.
Release note: None
Fixes cockroachdb#27878. The test was flaky because a NodeLiveness update was getting stuck when killing a majority of nodes in a Range. The fix was to tie NodeLiveness' update context to its stopper so that liveness updates are canceled when their node begins to shut down. This was a real issue that I suspect would occasionally make nodes hang on shutdown when their liveness range was becoming unavailable. There are probably other issues in the same class of bugs, but stressing `TestLogGrowthWhenRefreshingPendingCommands` isn't showing anything else. In doing so, the change needed to extend `stopper.WithCancel` into `stopper.WithCancelOnQuiesce` and `stopper.WithCancelOnStop`. Release note: None
Before this change, `WithCancelOnQuiesce` and `WithCancelOnStop` were dangerous to use because they would never clean up memory. This meant that any use of the methods that happened more than a constant number of times would slowly leak memory. This was an issue in `client.Txn.rollback`. This change fixes the methods so that it's possible for callers to clean up after themselves. Added a warning to `Stopper.AddCloser` because this had a similar issue. Release note: None
Fixes cockroachdb#27908. Release note: None
Before this change the Stopper made no guarantee that `Closers` registered by the `AddCloser` method or `Contexts` tied to the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly cleaned up. In cases where these methods were called concurrently with the Stopper stopping, it was possible for the resources to leak. This change fixes this by handling cases where these methods are called concurrently with or after the Stopper has stopped. This allows them to provide stronger external guarantees. Release note: None
10a7d66
to
7ffc9e1
Compare
bors r+ |
Build failed |
Flake is #28228. bors r+ |
Build failed |
Same flake. Retrying one more time before taking more drastic measures. bors r+ |
27868: backport-2.0: storage: prevent unbounded raft log growth without quorum r=nvanbenschoten a=nvanbenschoten Backport 2/2 commits from #27774. /cc @cockroachdb/release --- Fixes #27772. This change adds safeguards to prevent cases where a raft log would grow without bound during loss of quorum scenarios. It also adds a new test that demonstrates that the raft log does not grow without bound in these cases. There are two cases that need to be handled to prevent the unbounded raft log growth observed in #27772. 1. When the leader proposes a command and cannot establish a quorum. In this case, we know the leader has the entry in its log, so there's no need to refresh it with `reasonTicks`. To avoid this, we no longer use `refreshTicks` as a leader. 2. When a follower proposes a command that is forwarded to the leader who cannot establish a quorum. In this case, the follower can't be sure (currently) that the leader got the proposal, so it needs to refresh using `reasonTicks`. However, the leader now detects duplicate forwarded proposals and avoids appending redundant entries to its log. It does so by maintaining a set of in-flight forwarded proposals that it has received during its term as leader. This set is reset after every leadership change. Both of these cases are tested against in the new TestLogGrowthWhenRefreshingPendingCommands. Without both of the safeguards introduced in this commit, the test fails. Release note (bug fix): Prevent loss of quorum situations from allowing unbounded growth of a Range's Raft log. 28225: release-2.0: importccl: Preserve '\r\n' during CSV import r=dt a=dt Backport 1/1 commits from #28181. /cc @cockroachdb/release --- See #25344. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: neeral <neeral@users.noreply.github.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Build succeeded |
Backport 2/2 commits from #27774.
/cc @cockroachdb/release
Fixes #27772.
This change adds safeguards to prevent cases where a raft log
would grow without bound during loss of quorum scenarios. It
also adds a new test that demonstrates that the raft log does
not grow without bound in these cases.
There are two cases that need to be handled to prevent the
unbounded raft log growth observed in #27772.
quorum. In this case, we know the leader has the entry in
its log, so there's no need to refresh it with
reasonTicks
.To avoid this, we no longer use
refreshTicks
as a leader.leader who cannot establish a quorum. In this case, the
follower can't be sure (currently) that the leader got the
proposal, so it needs to refresh using
reasonTicks
. However,the leader now detects duplicate forwarded proposals and
avoids appending redundant entries to its log. It does so
by maintaining a set of in-flight forwarded proposals that
it has received during its term as leader. This set is reset
after every leadership change.
Both of these cases are tested against in the new
TestLogGrowthWhenRefreshingPendingCommands. Without both of
the safeguards introduced in this commit, the test fails.
Release note (bug fix): Prevent loss of quorum situations from
allowing unbounded growth of a Range's Raft log.