Broker retry retries sending failed events immediately #1100
Comments
The Go pubsub client we're using doesn't provide a way to directly set the AckDeadline; instead it manages the AckDeadline for us. We could call the RPC directly to set the AckDeadline (e.g. the way it's done in the client), but it's hard to tell how well that would interact with the client we're using (which also sets the AckDeadline dynamically). If we stick with the existing client, the only way to alter the AckDeadline seems to be 1) explicitly holding on to the messages and call

For release 1, I think it's ok to proceed with option 1 as it could provide knobs on how fast we
Unfortunately calling

As we get more sophisticated in our needs and use of pubsub, it could make sense to use the lower-level client more. The docs point to an example of using the base client for slow message processing, but looking at the complexity of the pubsub client, I think our usage would end up more complex than that example.

I'm fine with option 1 for release 1. We don't have to hold on to the entire messages, just the IDs. The high-level client uses a map of IDs and a ticker to periodically ack and nack messages, and we could do the same relatively cheaply. We could use a hierarchical timing wheel if we wanted to get sophisticated.
I recall one of my first issues when I joined the project was doing some exponential backoff in the old pubsub channel... Some pieces might be useful... knative/eventing#775
@grantr I found googleapis/google-cloud-go#1942, from which I realized pubsub is adding RetryPolicy support when creating a subscription.

@nachocano thanks for sharing! I think I can "steal" this code and do option 1 for the short term.
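For reference, the RetryPolicy from that PR is set on the subscription, so the pubsub service itself backs off redeliveries instead of the client tracking retry state. A sketch of what creating such a subscription might look like, assuming the field names from the linked PR land as-is (project, topic, and subscription names are placeholders):

```go
package main

import (
	"context"
	"log"
	"time"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()
	// "my-project", "retry-topic", and "retry-sub" are placeholder names.
	client, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// RetryPolicy asks the pubsub service to delay redelivery of nacked
	// messages with exponential backoff between the two bounds.
	_, err = client.CreateSubscription(ctx, "retry-sub", pubsub.SubscriptionConfig{
		Topic: client.Topic("retry-topic"),
		RetryPolicy: &pubsub.RetryPolicy{
			MinimumBackoff: 10 * time.Second,
			MaximumBackoff: 600 * time.Second,
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

Since this was only just merged, the exact client surface may still change before it reaches a released version of the Go client.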
While slow acking would be an improvement, it seems less than ideal for retry status to be tracked by the client, for the following reasons:
Long term we may want to consider a solution that moves retry state out of the client, such as:
@ian-mi (points 1, 2) Not sharing state among pods is a major drawback. Point 3 is kinda "mitigated" by point 5, as we do set that limit, but we still have the problem from point 5. Point 4 seems irrelevant to this topic? In the near future, hopefully the Go pubsub client will provide an option to set a retry policy (which is already provided by the pubsub API - see my comment above) so that we don't have to manage that ourselves.
A pubsub client retry policy would not address any of these issues, as the state is still managed by the client. I wouldn't say that point 5 mitigates point 3: even with a limit on the number of unacked messages per topic, we could easily end up with thousands of retained messages per retry topic, which would leak memory and slow the entire retry pod. Additionally, the limit on unacked messages could cause a topic to be starved completely if it gets flooded with invalid messages at some point.
@ian-mi I don't think it's a client retry policy; it appears to be part of the subscription API. If we limit outstanding bytes and messages, in theory we can bound how much memory is consumed (with a fixed number of subscribers). That kinda mitigates point 3. The problems you describe are different ones that we could run into anyway if we don't dead-letter messages.
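The outstanding-bytes/messages limits mentioned here are the `ReceiveSettings` knobs on the high-level Go client. A fragment sketching how they bound memory while messages are retained for delayed nacking; `client`, `ctx`, the subscription name, and the `deliver` helper are assumed/hypothetical:

```go
// Caps on outstanding messages/bytes bound memory per subscriber while
// messages are held for retry. Values here are illustrative only.
sub := client.Subscription("retry-sub")
sub.ReceiveSettings.MaxOutstandingMessages = 1000
sub.ReceiveSettings.MaxOutstandingBytes = 100 << 20 // ~100 MiB

err := sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
	if err := deliver(m); err != nil { // deliver is a hypothetical handler
		m.Nack() // redelivered later; counted against the caps above
		return
	}
	m.Ack()
})
```

This also illustrates the starvation concern from the previous comment: once the caps are full of messages awaiting retry, no new messages flow until some are acked or nacked.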
Ah, that's great! I took another look and you are right, it is part of the subscription API, although not officially documented yet. If it works the way I would expect, this should address most or all of the issues I've pointed out. This support was merged yesterday, so hopefully we can migrate to this API in the near future.
Problem
A short explanation of the problem, including relevant restrictions.
Persona:
Which persona is this feature for?
Exit Criteria
A retried message delivery that fails is retried only after some static timeout.
Time Estimate (optional):
How many developer-days do you think this may take to resolve?
Additional context (optional)
Add any other context about the feature request here.