Replicated consumers could skip redeliveries of non-acked messages when:
- A message was delivered to the client, but there was no quorum on
updating the delivered state across replicas. A replicated ack then came in,
which would bump the starting sequence forward, skipping redelivery of the
messages below it. The following code caused that issue:
```go
// Match leader logic on checking if ack is ahead of delivered.
// This could happen on a cooperative takeover with high speed deliveries.
if sseq > o.state.Delivered.Stream {
	o.state.Delivered.Stream = sseq + 1
}
```
- A message was delivered to the client, but there was no quorum on
updating the delivered state across replicas. The consumer leader then
stepped down and later became leader again. It would not reset `o.sseq`
back down to the agreed state, skipping redelivery of the messages below
it. The following code caused that issue:
```go
// If o.sseq is greater don't update. Don't go backwards on o.sseq if leader.
if !o.isLeader() || o.sseq <= state.Delivered.Stream {
	o.sseq = state.Delivered.Stream + 1
}
```
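The same kind of simplified model shows the step-down case. `consumer`, `resumeKeepingAhead`, and `resumeFromAgreed` are hypothetical names, not nats-server code; `sseq` stands in for `o.sseq` (the next stream sequence to deliver) and `agreedDelivered` for the replicated `state.Delivered.Stream`.

```go
package main

import "fmt"

// consumer is a hypothetical, simplified stand-in for the consumer struct.
type consumer struct {
	sseq            uint64 // next stream sequence to deliver (like o.sseq)
	agreedDelivered uint64 // replicated delivered stream sequence
	leader          bool
}

// resumeKeepingAhead mirrors the snippet above: a leader never moves sseq
// backwards, even if sseq ran ahead of what the replicas agreed on.
func (c *consumer) resumeKeepingAhead() {
	if !c.leader || c.sseq <= c.agreedDelivered {
		c.sseq = c.agreedDelivered + 1
	}
}

// resumeFromAgreed resets sseq back down to the agreed state.
func (c *consumer) resumeFromAgreed() {
	c.sseq = c.agreedDelivered + 1
}

func main() {
	// The leader delivered stream sequences 1..5 (so sseq is 6), but the
	// delivered update never reached quorum: replicas still agree on 0.
	// It steps down and is then elected leader again.
	ahead := &consumer{sseq: 6, agreedDelivered: 0, leader: true}
	reset := *ahead // copy before mutating
	ahead.resumeKeepingAhead()
	reset.resumeFromAgreed()

	fmt.Println(ahead.sseq) // prints 6: sequences 1..5 are skipped
	fmt.Println(reset.sseq) // prints 1: 1..5 are delivered again
}
```

The model ignores acks and pending tracking; it only shows that keeping an unreplicated `sseq` across a leadership change leaves a gap the new term never fills.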
The other commits included here fix code and tests that depended on the
lines above:
- `TestJetStreamSuperClusterConsumerDeliverNewBug` started flaking. It
never guaranteed that all replicas agreed on the same consumer
state.
- The issue lay in `o.store.SetStarting(o.sseq - 1)` always being
called without being based on replicated state, which meant that when
the storage directory was purged, this state would not reliably come
back. Now `o.updateSkipped(o.sseq)` is called the first time the consumer
becomes leader, ensuring all replicas agree on the initial starting
sequence, whether skipped ahead or not. It has also been changed to not
only skip ahead `o.sseq`, but also reflect this in the underlying stored
state.
- The test has also been made stricter: it now checks the state on all
replicas, not only on the consumer leader, and verifies that both the
in-memory state and the replicated state are exactly what they are
supposed to be.
- `TestJetStreamBasicDeliverSubject` started failing due to a misplaced
`return` in `o.selectStartingSeqNo()`. That `return` has been removed.
- `TestJetStreamClusterConsumerDeliveredSyncReporting` had a small
correctness issue, as skipping ahead `o.sseq` was not reflected in
the underlying store. Now, before the first fetch we expect stream and
consumer sequence 0; after that fetch we expect stream and consumer
sequence 1; and after the last fetch we expect consumer sequence 1 and a
skipped-ahead stream sequence 11.
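The difference between a local-only starting sequence and a replicated one, plus the stricter all-replica check, can be sketched with another simplified model. `skipEntry`, `replica`, `applySkip`, `rebuild`, and `checkAllReplicas` are hypothetical names, not nats-server code; the sketch assumes the stored starting sequence is `sseq - 1`, matching the `SetStarting(o.sseq - 1)` call described above, and that a purged replica recovers only what the replicated log contains.

```go
package main

import "fmt"

// skipEntry is a hypothetical replicated log entry recording the initial
// (possibly skipped-ahead) starting sequence chosen by the first leader.
type skipEntry struct{ sseq uint64 }

// replica is a hypothetical, simplified consumer replica.
type replica struct {
	name   string
	sseq   uint64 // in-memory next stream sequence to deliver
	stored uint64 // persisted starting (delivered) stream sequence
}

// applySkip updates both the in-memory and the stored state, so the skip
// is not held only in memory.
func (r *replica) applySkip(e skipEntry) {
	r.sseq = e.sseq
	r.stored = e.sseq - 1
}

// rebuild models a replica whose storage directory was purged: it starts
// empty and recovers only what the replicated log contains.
func rebuild(name string, entries []skipEntry) *replica {
	r := &replica{name: name, sseq: 1}
	for _, e := range entries {
		r.applySkip(e)
	}
	return r
}

// checkAllReplicas verifies that every replica, not just the leader, has
// exactly the expected in-memory and stored state.
func checkAllReplicas(rs []*replica, wantSseq, wantStored uint64) error {
	for _, r := range rs {
		if r.sseq != wantSseq || r.stored != wantStored {
			return fmt.Errorf("%s: sseq=%d stored=%d, want sseq=%d stored=%d",
				r.name, r.sseq, r.stored, wantSseq, wantStored)
		}
	}
	return nil
}

func main() {
	// Local-only: the leader records the start in its own store, but nothing
	// is replicated, so a purged replica comes back at the default.
	localLeader := &replica{name: "leader", sseq: 6, stored: 5}
	purgedLocal := rebuild("purged", nil)
	fmt.Println(checkAllReplicas([]*replica{localLeader, purgedLocal}, 6, 5))

	// Replicated: the first leader proposes the starting sequence once, and
	// every replica, including a purged one, applies the same entry.
	entries := []skipEntry{{sseq: 6}}
	rs := []*replica{rebuild("r1", entries), rebuild("r2", entries), rebuild("purged", entries)}
	fmt.Println(checkAllReplicas(rs, 6, 5)) // prints <nil>
}
```

Checking only the leader would pass in both cases; comparing every replica's in-memory and stored state is what exposes the purged replica in the local-only case.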
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>