[Bug] Consumer Stops Receiving Messages with Large Backlogs Post-Processing #22435
Comments
I confirm bumping the client to v3.2.2 doesn't fix the issue. I also confirm using a
I confirm that I can't reproduce the issue using v3.2.1 brokers.
Trying to understand the logic of #22191 and the changes regarding "backlogged" cursors before that, PRs #19343, #6766, #4066, #162. The change made in #4066 to inactive cursors in the checkBackloggedCursor method seems suspicious, and might be the reason why a hack such as #9789 was needed. In any case, it seems that #22191 should be reverted as the first step. However, the proper fix seems to be to sort out various issues in this area.
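For readers less familiar with this code path, here is a minimal sketch of what a threshold-based "backlogged cursor" check conceptually does. All names and the threshold logic are hypothetical, not the actual checkBackloggedCursor implementation; the sketch is only meant to make the deactivate/re-activate concern above concrete.

```java
// Simplified illustration only -- NOT the actual Pulsar managed-ledger code.
// Assumption: a cursor whose backlog exceeds a configured threshold is flipped
// to "inactive", and flipped back to "active" once the backlog shrinks again.
import java.util.List;

class BackloggedCursorCheckSketch {

    interface Cursor {
        long getNumberOfEntriesInBacklog();
        void setActive();
        void setInactive();
    }

    static void checkBackloggedCursors(List<Cursor> cursors, long backlogThreshold) {
        for (Cursor cursor : cursors) {
            if (cursor.getNumberOfEntriesInBacklog() > backlogThreshold) {
                // Marking the cursor inactive is the step the comment above
                // calls suspicious: if a cursor is never correctly re-activated
                // (or is re-activated at the wrong time), the consumer behind it
                // can stop being served even though it is still connected.
                cursor.setInactive();
            } else {
                cursor.setActive();
            }
        }
    }
}
```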
One of the problems is a possible race condition here: pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java, lines 977 to 979 (at commit 902728e).
In ManagedLedger, tasks are executed on two threads: the executor thread and the scheduler thread. Reminds me of this old comment on an experimental PR: https://github.com/apache/pulsar/pull/11387/files#r693112234
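To make the race concrete, below is a self-contained Java sketch of the check-then-act pattern that becomes fragile when one thread pool updates cursor state while another periodically inspects it. The names are made up; this is not the cited ManagedCursorImpl code, only an illustration of the two-thread hazard.

```java
// Illustration of a check-then-act race between two thread pools, with made-up
// names. This is NOT the actual ManagedCursorImpl code around lines 977-979;
// it only shows why mixing an "executor" thread and a "scheduler" thread over
// shared cursor state without a common lock is fragile.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TwoThreadCursorRace {
    private final AtomicLong backlog = new AtomicLong(100_000); // shared cursor state
    private volatile boolean cursorActive = true;

    public static void main(String[] args) throws InterruptedException {
        TwoThreadCursorRace demo = new TwoThreadCursorRace();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // "Executor" thread: the dispatcher keeps draining the backlog.
        executor.submit(() -> {
            while (demo.backlog.get() > 0) {
                demo.backlog.decrementAndGet();
            }
        });

        // "Scheduler" thread: a periodic task reads the backlog, then acts on it.
        scheduler.scheduleAtFixedRate(() -> {
            long snapshot = demo.backlog.get();   // 1. read
            // The executor thread may drain the backlog to zero right here,
            // so the decision below can be made on a stale value.
            demo.cursorActive = snapshot < 1_000; // 2. act
        }, 0, 1, TimeUnit.MILLISECONDS);

        TimeUnit.SECONDS.sleep(1);
        executor.shutdownNow();
        scheduler.shutdownNow();
        System.out.println("active=" + demo.cursorActive + ", backlog=" + demo.backlog.get());
    }
}
```

The usual remedies are to confine all mutation of the cursor state to a single ordered executor, or to guard both the check and the update with the same lock.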
If we revert it, the OOM issue will come back.
Great work on the fix #22454, @Technoboy-. It looks like removing #22191 changes to
Search before asking
Read release policy
Version
3.2.2
Minimal reproduce step
Open a consumer on a subscription with a very large backlog by passing an "old" MessageId.
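For concreteness, here is a minimal Java sketch of this reproduce step using the Pulsar Reader API (which uses a non-durable cursor, matching the note below). The service URL and topic name are placeholders, and starting from MessageId.earliest stands in for any sufficiently old MessageId that produces a large backlog.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class LargeBacklogRepro {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")          // placeholder broker URL
                .build();

        // Start from the beginning of the topic (or any sufficiently old
        // MessageId) so the reader opens with a very large backlog.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")  // placeholder topic
                .startMessageId(MessageId.earliest)
                .create();

        while (true) {
            Message<byte[]> msg = reader.readNext(30, TimeUnit.SECONDS);
            if (msg == null) {
                // Expected: only brief idle periods once the backlog is drained.
                // Observed (per this report): after the backlog reaches 0,
                // no new messages arrive even though the consumer stays connected.
                System.out.println("No message within 30s");
                continue;
            }
            System.out.println("Received " + msg.getMessageId());
        }
    }
}
```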
What did you expect to see?
Consume all backlogged messages, then keep receiving new messages once the backlog is drained.
What did you see instead?
Note that the following paragraph refers to a NonDurable consumer.
Initially, everything functions correctly: the process begins with a backlog of (let's say) 100,000 messages, which gradually decreases to 0 as we approach the present ("now"). However, once the backlog is fully processed, the consumer unexpectedly stops receiving new messages, and the backlog starts growing again. The consumer is still connected. This behavior is consistently reproducible with my topics that have a substantial amount of data.
I've also noticed that when the starting point (since) is relatively close to the current time (now), this problem does not occur.
Anything else?
We do not know if this bug was introduced by v3.2.2. We didn't see it before. We are currently rolling brokers back to 3.2.1 to confirm this.
Otherwise, it may be related to #22191.
Are you willing to submit a PR?