
[Bug] Key_Shared subscription doesn't deliver messages in the replay queue when no new messages are produced #23845

Open
lhotari opened this issue Jan 13, 2025 · 4 comments
Labels: type/bug

@lhotari (Member) commented Jan 13, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Pulsar 4.0.1

Minimal reproduce step

Exact steps to reproduce aren't yet confirmed.

This problem was encountered in a test with a large number of consumers that were scaled up and down, with consumers being added and removed. It was noticed at the end of the test case, when some messages were never delivered to consumers and remained in the backlog.

In the topic stats for the subscription, msgInReplay showed a positive value, and in the internal stats for the subscription, subscriptionHavePendingRead was true. Looking at the code, this appears to be a case that isn't handled in PersistentDispatcherMultipleConsumers/PersistentStickyKeyDispatcherMultipleConsumers.
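
For reference, a minimal sketch of how the same symptoms could be checked programmatically with the Pulsar Admin API. The topic name and service URL are placeholders, and the stats objects are simply dumped as JSON for inspection (how cleanly every field serializes may vary by version):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.pulsar.client.admin.PulsarAdmin;

    public class InspectKeySharedSubscription {
        public static void main(String[] args) throws Exception {
            String topic = "persistent://public/default/my-topic"; // placeholder topic
            try (PulsarAdmin admin = PulsarAdmin.builder()
                    .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                    .build()) {
                ObjectMapper mapper = new ObjectMapper();
                // Topic stats: per-subscription counters such as msgBacklog
                // (and msgInReplay in recent versions).
                System.out.println(mapper.writerWithDefaultPrettyPrinter()
                        .writeValueAsString(admin.topics().getStats(topic)));
                // Internal stats: cursor-level details, including the
                // pending-read state mentioned above.
                System.out.println(mapper.writerWithDefaultPrettyPrinter()
                        .writeValueAsString(admin.topics().getInternalStats(topic)));
            }
        }
    }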

What did you expect to see?

The cursor shouldn't go completely into a "waiting" state when there are messages in the replay queue.

What did you see instead?

Messages in the replay queue don't get dispatched to consumers.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari (Member, Author) commented Jan 13, 2025

There's already a solution to cancel a pending read when a hash gets unblocked:

    private void stickyKeyHashUnblocked(int stickyKeyHash) {
        if (log.isDebugEnabled()) {
            if (stickyKeyHash > -1) {
                log.debug("[{}] Sticky key hash {} is unblocked", getName(), stickyKeyHash);
            } else {
                log.debug("[{}] Some sticky key hashes are unblocked", getName());
            }
        }
        reScheduleReadWithKeySharedUnblockingInterval();
    }

    private void reScheduleReadWithKeySharedUnblockingInterval() {
        rescheduleReadHandler.rescheduleRead();
    }

There seems to be a bug in this behavior, since it didn't catch the case that was encountered. One possible reason is that the consumer didn't have any permits when the unblocking happened; there would need to be additional logic to handle that case.
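
To make the suspected race easier to see, here is a purely illustrative, self-contained model. It is not Pulsar's actual dispatcher code and all names are made up: the hash-unblock event reschedules a read while the consumer has zero permits, and unless the later arrival of permits also triggers a reschedule, the replay queue is never drained when no new messages are produced.

    import java.util.ArrayDeque;
    import java.util.Queue;

    public class ReplayQueueStallModel {
        private final Queue<Long> replayQueue = new ArrayDeque<>();
        private int consumerPermits = 0;

        // Called when a sticky key hash becomes unblocked (e.g. messages get acked).
        void onStickyKeyHashUnblocked() {
            rescheduleRead();
        }

        // Called when the consumer sends a FLOW command granting more permits.
        void onConsumerFlowPermits(int permits) {
            consumerPermits += permits;
            // The suspected missing piece: if the replay queue is non-empty,
            // newly arrived permits must also trigger a (re)read, otherwise the
            // entries stay in the replay queue forever when no new messages
            // are produced.
            if (!replayQueue.isEmpty()) {
                rescheduleRead();
            }
        }

        private void rescheduleRead() {
            if (consumerPermits <= 0) {
                // Nothing can be dispatched right now; if no later event calls
                // rescheduleRead() again, the replay queue stalls.
                return;
            }
            while (consumerPermits > 0 && !replayQueue.isEmpty()) {
                replayQueue.poll();
                consumerPermits--;
            }
        }

        public static void main(String[] args) {
            ReplayQueueStallModel model = new ReplayQueueStallModel();
            model.replayQueue.add(1L);
            model.onStickyKeyHashUnblocked(); // no permits yet -> nothing dispatched
            model.onConsumerFlowPermits(100); // without the check above, entry 1 would be stuck
            System.out.println("entries left in replay queue: " + model.replayQueue.size());
        }
    }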

@walkinggo commented
It looks like this piece of code triggers a re-read here. I tried the related test cases, but no errors occurred. Could you provide more details about what you meant by "permit"? @lhotari

@lhotari (Member, Author) commented Jan 14, 2025

> It looks like this piece of code triggers a re-read here. I tried the related test cases, but no errors occurred. Could you provide more details about what you meant by "permit"? @lhotari

@walkinggo You can read more about permits here: https://pulsar.apache.org/docs/4.0.x/developing-binary-protocol/#flow-control . Just to be clear, I'm not looking for contributions to address this particular issue; I've assigned it to myself and am currently working on it.
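
For illustration, a minimal Key_Shared consumer sketch (service URL, topic, and subscription name are placeholders) showing where permits come from on the client side: the consumer grants permits to the broker via the binary protocol's FLOW command, in proportion to its receiverQueueSize and how quickly it receives messages.

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionType;

    public class KeySharedConsumerExample {
        public static void main(String[] args) throws Exception {
            try (PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                    .build();
                 Consumer<byte[]> consumer = client.newConsumer()
                         .topic("persistent://public/default/my-topic")
                         .subscriptionName("my-key-shared-sub")
                         .subscriptionType(SubscriptionType.Key_Shared)
                         // A small receive queue means few permits in flight;
                         // zero permits at the broker is the situation
                         // discussed above.
                         .receiverQueueSize(10)
                         .subscribe()) {
                Message<byte[]> msg = consumer.receive();
                consumer.acknowledge(msg);
            }
        }
    }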

@walkinggo commented
OK, I got it.
