[pulsar-broker] Allow broker to discover and unblock stuck subscription #9789
@@ -120,11 +120,13 @@ public int hashCode() {

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
The null scenario is already covered by the `instanceof` check.
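To illustrate the point above, here is a minimal sketch of an `equals` method that relies on `instanceof` alone. `ConsumerKey` and its `name` field are hypothetical and are not taken from the class under review; the sketch only shows that `null instanceof T` evaluates to `false`, so a separate null check adds nothing.

```java
import java.util.Objects;

// Hypothetical value class, used only to illustrate the review comment.
final class ConsumerKey {
    private final String name;

    ConsumerKey(String name) {
        this.name = name;
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    @Override
    public boolean equals(Object obj) {
        // `null instanceof ConsumerKey` is false, so null is rejected here
        // without an explicit `obj == null` branch.
        if (!(obj instanceof ConsumerKey)) {
            return false;
        }
        return Objects.equals(name, ((ConsumerKey) obj).name);
    }
}
```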
# Broker periodically checks if subscription is stuck and unblock if flag is enabled. (Default is disabled)
unblockStuckSubscriptionEnabled=false
From the description of the flag, the broker will periodically check whether a subscription is stuck and unblock it. Should we tell users what the default frequency is and how to change it?
This check depends on the stats-update rate: it is performed in the same stats-update task, so this feature doesn't require additional configuration.
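As a rough sketch of what "performed in the same stats-update task" could look like, the example below runs the stuck-subscription check inside one periodic task, guarded by the flag. `MonitoredSubscription`, `StatsUpdateTask`, `updateRates`, and `unblockIfStuck` are hypothetical names chosen for illustration, not the broker's actual classes or methods; only the flag name comes from the configuration shown in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a subscription; the method names are assumptions.
interface MonitoredSubscription {
    void updateRates();
    void unblockIfStuck();
}

// Sketch of a stats-update task that also runs the stuck-subscription check,
// so the check needs no scheduling interval of its own.
final class StatsUpdateTask implements Runnable {
    private final List<MonitoredSubscription> subscriptions;
    private final boolean unblockStuckSubscriptionEnabled;

    StatsUpdateTask(List<MonitoredSubscription> subscriptions,
                    boolean unblockStuckSubscriptionEnabled) {
        this.subscriptions = new ArrayList<>(subscriptions);
        this.unblockStuckSubscriptionEnabled = unblockStuckSubscriptionEnabled;
    }

    @Override
    public void run() {
        for (MonitoredSubscription sub : subscriptions) {
            // Regular stats work always runs.
            sub.updateRates();
            // The stuck check piggybacks on the same pass, only when enabled.
            if (unblockStuckSubscriptionEnabled) {
                sub.unblockIfStuck();
            }
        }
    }
}
```

A task like this could be handed to `ScheduledExecutorService.scheduleAtFixedRate`, in which case the check frequency is simply the stats-update period.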
This could be some sort of locked-up state, which is prevented by @merlimat's change #9787. @rdhabalia do you have an environment where you could check whether #9787 helps fix the issue you are seeing?
@lhotari this is more of a functional issue (mostly related to the expiry check and a regression bug we saw in the past with key-shared subscriptions). It isn't caused by a deadlock (validated with a thread dump) or by slower processing, because the subscription stays stuck until someone manually unloads the topic. So #9787 won't exactly address this issue.
/pulsarbot run-failure-checks
It seems like this is somewhat of a bandaid. How do we find the root cause of why subscriptions are getting stuck? This seems like it might be related to #6054
@rdhabalia Thanks for doing this work. It looks like it's going to help a lot with cases where the subscriptions stop receiving messages.
I realized that this won't help when negative permits are occurring... So, there's still more work to do to unblock stuck subscriptions.
…on (#9789)

We have been frequently seeing an issue where a subscription gets stuck on different topics and the broker does not dispatch messages even though the consumer has available permits and no pending reads (example #9788). It can happen due to a regression bug or an unknown issue when expiry runs. One workaround is to manually unload the topic and reload it, which is not feasible if this happens frequently to many topics. Alternatively, the broker should be able to discover such stuck subscriptions and unblock them.

The example below shows that the subscription has available permits > 0, there are no pending reads, and the cursor's read position is not moving forward, which builds up the backlog until we unload the topic. It happens frequently for an unknown reason:

```
STATS-INTERNAL:
"sub1" : {
  "markDeletePosition" : "11111111:15520",
  "readPosition" : "11111111:15521",
  "waitingReadOp" : false,
  "pendingReadOps" : 0,
  "messagesConsumedCounter" : 115521,
  "cursorLedger" : 585099247,
  "cursorLedgerLastEntry" : 597,
  "individuallyDeletedMessages" : "[]",
  "lastLedgerSwitchTimestamp" : "2021-02-25T19:55:50.357Z",
  "state" : "Open",
  "numberOfEntriesSinceFirstNotAckedMessage" : 1,
  "totalNonContiguousDeletedMessagesRange" : 0,

STATS:
"sub1" : {
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "msgRateRedeliver" : 0.0,
  "msgBacklog" : 30350,
  "blockedSubscriptionOnUnackedMsgs" : false,
  "msgDelayed" : 0,
  "unackedMessages" : 0,
  "type" : "Shared",
  "msgRateExpired" : 0.0,
  "consumers" : [ {
    "msgRateOut" : 0.0,
    "msgThroughputOut" : 0.0,
    "msgRateRedeliver" : 0.0,
    "consumerName" : "C1",
    "availablePermits" : 723,
    "unackedMessages" : 0,
    "blockedConsumerOnUnackedMsgs" : false,
    "metadata" : { },
    "connectedSince" : "2021-02-25T19:55:50.358285Z",
```

![image](https://user-images.githubusercontent.com/2898254/109894631-ab62d980-7c42-11eb-8dcc-a1a5f4f5d14e.png)

Add a capability in the broker to periodically check whether a subscription is stuck and unblock it if needed. This check is controlled by a flag; for the initial release it can be disabled by default (and we can enable it by default in a later release).

It helps the broker handle stuck subscriptions and logs a message for later debugging.
Motivation
We have been frequently seeing an issue where a subscription gets stuck on different topics and the broker does not dispatch messages even though the consumer has available permits and no pending reads (example #9788). It can happen due to a regression bug or an unknown issue when expiry runs. One workaround is to manually unload the topic and reload it, which is not feasible if this happens frequently to many topics. Alternatively, the broker should be able to discover such stuck subscriptions and unblock them.
The stats quoted in the commit message above show that the subscription has available permits > 0, there are no pending reads, and the cursor's read position is not moving forward, which builds up the backlog until we unload the topic. It happens frequently for an unknown reason.
Modification
Add a capability in the broker to periodically check whether a subscription is stuck and unblock it if needed. This check is controlled by a flag; for the initial release it can be disabled by default (and we can enable it by default in a later release).
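The symptoms listed under Motivation suggest one possible detection heuristic, sketched below: a subscription looks stuck when, between two consecutive checks, the read position has not moved while there are available permits, no pending reads, and a non-empty backlog. `SubscriptionSnapshot` and `StuckSubscriptionDetector` are hypothetical types invented for this illustration and may not match how the broker actually implements the check; the field names mirror the stats output quoted in this PR.

```java
// Hypothetical snapshot of the relevant stats fields; not a Pulsar class.
final class SubscriptionSnapshot {
    final String readPosition;
    final int pendingReadOps;
    final int totalAvailablePermits;
    final long msgBacklog;

    SubscriptionSnapshot(String readPosition, int pendingReadOps,
                         int totalAvailablePermits, long msgBacklog) {
        this.readPosition = readPosition;
        this.pendingReadOps = pendingReadOps;
        this.totalAvailablePermits = totalAvailablePermits;
        this.msgBacklog = msgBacklog;
    }
}

// Assumed heuristic: compares each snapshot with the previous check.
final class StuckSubscriptionDetector {
    private String lastReadPosition; // null until the first check has run

    boolean looksStuck(SubscriptionSnapshot s) {
        // On the first call there is nothing to compare against,
        // so the position counts as having moved.
        boolean readPositionMoved = !s.readPosition.equals(lastReadPosition);
        lastReadPosition = s.readPosition;
        return !readPositionMoved
                && s.totalAvailablePermits > 0
                && s.pendingReadOps == 0
                && s.msgBacklog > 0;
    }
}
```

Requiring two consecutive observations avoids flagging a subscription that is merely between reads at the moment of a single check.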
Result
It helps the broker handle stuck subscriptions and logs a message for later debugging.