
Pubsub acks are unsuccessful #3567

Closed
stetra opened this issue Aug 16, 2018 · 11 comments

stetra commented Aug 16, 2018

When I use the streaming pull subscriber through the MessageReceiver interface, it seems acks usually expire, and even if they don't expire, the number of backlogged messages never changes (num_undelivered_messages in Stackdriver). This only seems to happen if messages take a while to be processed, i.e., more than a few minutes.

It's strange that acks are expiring in the first place, because I do see that the client is sending modify ack deadline requests (mod_ack_deadline_message_operation_count in Stackdriver). The subscription itself has an ack deadline of 10 minutes, so these messages are processed well within the deadline.

Even when acks are successful (i.e., non-expired), num_undelivered_messages is unaffected in Stackdriver. I do see the previously acked messages being redelivered as well.

Another thing I noticed is that even though the client is performing streaming pulls, in Stackdriver I see metric data coming in for pull_ack_message_operation_count but none for streaming_pull_ack_message_operation_count. Similarly, I see results for mod_ack_deadline_message_operation_count but not for streaming_pull_mod_ack_deadline_message_operation_count. Is this expected?

I am using version 1.37.1 of the google-cloud-pubsub library. I have uploaded my code here: https://gist.github.com/stetra/d757fed41cb67d4a73dd7487ca4d452e
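
(For readers without access to the gist: a minimal sketch of the kind of setup described here, a streaming pull Subscriber whose MessageReceiver blocks for several minutes per message, might look like the following. The project/subscription names and the simulated delay are placeholders, not the actual contents of the gist.)

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class SlowReceiverSketch {
  public static void main(String[] args) {
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-subscription"); // placeholders

    // A receiver that blocks for several minutes per message, mimicking slow processing.
    MessageReceiver receiver =
        (PubsubMessage message, AckReplyConsumer consumer) -> {
          try {
            Thread.sleep(5 * 60 * 1000L); // simulate ~5 minutes of work
            consumer.ack();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            consumer.nack();
          }
        };

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
    subscriber.startAsync().awaitRunning();
    subscriber.awaitTerminated(); // keep the process alive while messages arrive
  }
}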

The following Stackdriver graphs show that successful acks are apparently happening, yet the backlog is not decreasing.
Screenshots: acks, num_undelivered_messages, modacks

In summary, these are my questions:

  1. Despite successful acks occurring, why is the number of unacknowledged messages not decreasing?

  2. Why are expired acks occurring in the first place?

  3. Why is the client performing streaming pulls with non-streaming acks and modacks?

JustinBeckwith added the "triage me" label Aug 17, 2018
pongad self-assigned this Aug 18, 2018
pongad added the "type: question" and "api: pubsub" labels and removed the "triage me" label Aug 18, 2018
pongad (Contributor) commented Aug 18, 2018

Why is the client performing streaming pulls with non-streaming acks and modacks?

gRPC bidirectional streaming is not as straightforward as one might think. Here are some details.

  • To support very high-performance subscriptions, pubsub streaming pull keeps sending messages until gRPC's transport-level flow control kicks in and tells it to stop.
  • gRPC has buffering. If the client library doesn't take messages from the buffer, gRPC keeps receiving messages from the server and putting them in the buffer; flow control only kicks in once the buffer is full.
    • Note that getting rid of buffering on the client doesn't make this point go away. There's probably also buffering from proxies/firewalls/etc. between the server and the client.
  • If a stream closes because the server hangs up (which happens every so often), the server sends an "EOF message"[1] that behaves like the pubsub messages we send and receive, except that it is not visible to the application. This means the EOF message itself can be buffered.
  • If the application works through the buffered messages relatively slowly, we don't realize that the server is going away until we eventually see the EOF message. If we send modacks and acks on the stream during this time, the server is not there to see them. In effect, the modacks and acks won't do anything until we see the EOF message and establish a new connection. This might take a while.
  • Hence we send non-streaming acks and modacks instead. These are more likely to land on a working server.

[1] I'm not an expert on gRPC implementation, but this is accurate to the best of my understanding.

The above problem mostly affects long-running processes. The best fix for this is probably to use polling pull; we plan to bring it back.
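
For reference, a non-streaming pull roughly amounts to calling the unary Pull and Acknowledge RPCs in a loop via the SubscriberStub surface. A minimal sketch (the subscription name, batch size, and processing step are placeholders; error handling and retries are omitted):

import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;
import com.google.pubsub.v1.AcknowledgeRequest;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PullRequest;
import com.google.pubsub.v1.PullResponse;
import com.google.pubsub.v1.ReceivedMessage;

public class PollingPullSketch {
  public static void main(String[] args) throws Exception {
    String subscription =
        ProjectSubscriptionName.format("my-project", "my-subscription"); // placeholders

    SubscriberStubSettings settings = SubscriberStubSettings.newBuilder().build();
    try (SubscriberStub stub = GrpcSubscriberStub.create(settings)) {
      while (true) {
        // Unary Pull: a plain request/response call, no long-lived stream involved.
        // May block until messages are available or the RPC deadline is reached.
        PullResponse response =
            stub.pullCallable()
                .call(PullRequest.newBuilder()
                    .setSubscription(subscription)
                    .setMaxMessages(10)
                    .build());

        for (ReceivedMessage message : response.getReceivedMessagesList()) {
          process(message); // slow, application-specific work

          // Unary Acknowledge: the ack lands on whichever server answers this RPC,
          // independent of any streaming connection state.
          stub.acknowledgeCallable()
              .call(AcknowledgeRequest.newBuilder()
                  .setSubscription(subscription)
                  .addAckIds(message.getAckId())
                  .build());
        }
      }
    }
  }

  private static void process(ReceivedMessage message) {
    // placeholder for real work
  }
}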

Despite successful acks occurring, why is the number of unacknowledged messages not decreasing?
Why are expired acks occurring in the first place?

These two questions are related. "Expired acks" shouldn't exist. AFAIK, acks should always "work". Even if the message is expired and redelivered, an ack should still stop pubsub from sending it out in the future. Could you let us know what date and time you're seeing this (the graph says 10AM, but I'm not sure which timezone) and what your subscription name is? I'll investigate with the pubsub team to see what's going on.

stetra (Author) commented Aug 20, 2018

Thanks for the reply. I did not realize that expired acks behaved that way. Is there documentation for those metric statuses in Stackdriver? E.g., for pull_ack_message_operation_count there are expired and success, but perhaps there are more statuses that I'm not aware of.

The above screenshots are from August 16 around 10 AM Pacific Time, but the issue is still happening at the moment (August 20, ~2:45 PM Pacific Time). The subscription is d-steven-long-sub on project lofty-outcome-860.

Screenshots: num_undelivered_messages, acks

Please let me know if you need anything else.

pongad (Contributor) commented Aug 20, 2018

@stetra I contacted the pubsub team with your info. I'll keep you posted.

@kimkyung-goog

Hi @stetra, I am trying to debug this issue.
When you observed this behavior, what was the rate of publishing, i.e., how many messages per second were being published?
Also, do you observe this issue whenever you run your Repro.java, or does it happen only occasionally?

@anilmuppalla

@pongad you mentioned bringing polling pull back; do you have an ETA on that?

pongad (Contributor) commented Sep 15, 2018

I don't want to promise anything right now, but I'll try to work on it over the next couple of weeks if my availability opens up.

@matthewrj

I am seeing similar behaviour where I get a lot of expired responses to acks and the number of undelivered messages stops going down. It seems to happen in 5-20 minute intervals. I am using the java client library version 1.45.0.

My subscriber is created as follows:

import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.core.FixedExecutorProvider;
import com.google.cloud.pubsub.v1.Subscriber;
import java.util.concurrent.Executors;

Subscriber subscriber = Subscriber
  .newBuilder(subscriptionName, receiver)
  .setSystemExecutorProvider(
    FixedExecutorProvider.create(Executors.newScheduledThreadPool(8)))
  .setExecutorProvider(FixedExecutorProvider.create(
    Executors.newScheduledThreadPool(128)))
  .setFlowControlSettings(
    FlowControlSettings
      .newBuilder()
      .setMaxOutstandingElementCount(128L)
      .build())
  .build();

Screenshots: expired_acks, undelivered_messages

pongad (Contributor) commented Sep 28, 2018

#3743 might help. Let's see where that gets us first.

@matthewrj

The processing of messages is also slow in our case. Handing the processing off to another thread pool and keeping the message receiver lightweight appears to have fixed the issue.
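
A minimal sketch of that pattern (not our exact code; the pool size and the processing step are placeholders):

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandOffReceiver implements MessageReceiver {
  // Separate worker pool for the slow business logic; the size here is arbitrary.
  private final ExecutorService workers = Executors.newFixedThreadPool(16);

  @Override
  public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
    // Return immediately so the subscriber's own threads stay free to service
    // the stream; ack or nack later from the worker thread.
    workers.submit(() -> {
      try {
        process(message);   // slow, application-specific work
        consumer.ack();     // ack only once processing has finished
      } catch (Exception e) {
        consumer.nack();    // let Pub/Sub redeliver on failure
      }
    });
  }

  private void process(PubsubMessage message) {
    // placeholder for real work
  }
}

Flow control still applies with this approach, since each message stays outstanding until ack() or nack() is eventually called from the worker thread.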

@QuestionAndAnswer

I'm also seeing the behavior that @stetra faced, but in a Node.js environment, so it may not be specific to the Java implementation of the library.

Here is a link to the question on Stack Overflow, in case it helps:

https://stackoverflow.com/questions/54597310/google-cloud-pubsub-not-ack-messages

ajaaym (Contributor) commented May 3, 2019

@stetra This issue will be resolved once we bring back polling pull. Closing this issue; we will track that work in #3500.
