PublishEventAsync Blocking When Delivering Locally? #1280

christophwille · 2024-05-01T15:11:08Z

I condensed a larger solution into a minimal sample here:

https://github.com/christophwille/poc-oh/blob/main/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs

Mostly bog-standard dapr pubsub. However, the subscriber sits in the same process as the publisher. I connected Redis pubsub. Now here is the part that I don't get: when PublishEventAsync is called, it blocks while the subscriber is running (even with retries!). This is illustrated by the breakpoints I added (they really execute serially in that order).

My expectation was that PublishEventAsync sends the message and returns immediately. Processing would then start, maybe with a small delay, but definitely without blocking all the way up the call chain. Am I doing something wrong, or is my view of how it should work wrong?

As-is, this kills our implementation of an outbox (a custom one, where sensitive data is retained in the database and only the id is pubsub'd). There, a State is transitioned from Pending to Delivered right after the call to PublishEventAsync, but the consumer always runs before there is any chance to write the state change to the database.

Should this go into dapr/dapr? It looks like underlying behavior.

philliphoff · 2024-05-01T16:03:49Z

My understanding is that PublishEventAsync() will complete once the Dapr sidecar has successfully delivered the message to the underlying pub-sub component. Delivery to the ultimate subscriber of the message by that component, whether directly or again through the Dapr sidecar, is a disconnected and independent task. I'd be careful about judging timing based on breakpoints; the debugger adds a significant amount of latency and can greatly alter the apparent flow of asynchronous tasks, especially if those tasks are inherently short-lived (e.g. because everything's running locally, so publish and delivery happen very quickly).

Even without debugger latency, given the parallelism of publish and delivery, it's entirely possible in the normal course of things that breakpoint #2 is hit before breakpoint #3, as processing of HTTP incoming requests can happen at the same time as processing of outgoing messages wraps up.
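To illustrate the point about parallelism, here is a minimal simulation in plain Python asyncio. It does not use Dapr or Redis; `ack_delay` and `delivery_delay` are hypothetical stand-ins for "time until the publish call returns" and "time until the subscriber endpoint is entered". It only shows that a publish call which never waits for the subscriber can still return after the subscriber has been entered, depending on which latency is shorter.

```python
import asyncio


async def run(ack_delay: float, delivery_delay: float) -> list[str]:
    """Simulate a publish whose acknowledgement and delivery are independent tasks."""
    events: list[str] = []

    async def subscriber() -> None:
        # Hypothetical latency: component -> sidecar -> subscriber endpoint.
        await asyncio.sleep(delivery_delay)
        events.append("subscriber: entered")

    async def publish() -> asyncio.Task:
        # Delivery starts as its own task; publish never awaits it.
        delivery = asyncio.create_task(subscriber())
        # Hypothetical latency: acknowledgement travelling back to the caller.
        await asyncio.sleep(ack_delay)
        events.append("publish: returned")
        return delivery

    events.append("publish: called")
    delivery = await publish()
    await delivery  # only so the simulation finishes cleanly
    return events


# Delivery faster than the acknowledgement: subscriber is entered first.
fast_delivery = asyncio.run(run(ack_delay=0.1, delivery_delay=0.01))
# Acknowledgement faster than delivery: publish returns first.
slow_delivery = asyncio.run(run(ack_delay=0.01, delivery_delay=0.1))

print(fast_delivery)
print(slow_delivery)
```

In both runs the publish call does the same thing; only the relative latencies differ, so the observed order alone does not prove that publish waits for the subscriber.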

christophwille · 2024-05-01T16:14:09Z

Our real-life application had instrumentation that showed exactly the behavior that the breakpoints exhibit - including the retries.

We had a loop delivering multiple events one after another, and even that was serial. Yes, the sample might seem simple, but the behavior is "PublishEventAsync delivering to Redis, calling the sub endpoint, and only then returning".

philliphoff · 2024-05-01T16:40:23Z

Do you see this same behavior with any other pub-sub component? I'm curious as to whether it might be a Redis-specific phenomenon.

christophwille · 2024-05-01T17:31:43Z

No, I haven't tried anything other than Redis yet (we started seeing the issue on local developer boxes).

christophwille · 2024-05-01T18:27:33Z

I modified the sample to write logs (https://github.com/christophwille/poc-oh/blob/f94ae80bed3e46d4ce46b478cd2dcd1b32744d93/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs#L36) - and I am running it without a debugger:

  enqueue: entering 06:22:54.567 PM
  sub: entering 06:22:54.705 PM
  enqueue: after PublishEventAsync 06:22:54.713 PM
  sub: leaving 06:23:04.714 PM

Now that I actually have async code in the subscriber (even if it is a Task.Delay only), it yields. Still, the subscriber is entered before PublishEventAsync comes back. That is very strange to say the least.

christophwille · 2024-05-02T09:10:05Z

Ok, one more thing came to my mind: given that dapr turns around immediately and calls the endpoint, how on earth is that going to scale? The intent was to have one Hangfire job that publishes all messages that are backed up (Hangfire allows single-instance jobs across multiple container instances), with the messages then consumed in scaled-out fashion. Now that the message comes in immediately on the same instance, would a rejection on first processing mean it comes in at the same instance again on retry, or would there then be a distribution across all instances?

christophwille · 2024-05-02T10:59:21Z

I gave it a try with ServiceBus:

  enqueue: entering 10:55:37.896 AM
  enqueue: after PublishEventAsync 10:55:38.955 AM
  sub: entering 10:55:38.968 AM
  sub: leaving 10:55:48.972 AM

Thus it seems to be specific to Redis that the subscriber is entered before PublishEventAsync returns, which makes it look like "publish + call subscriber" happens in a single step.

Maybe I wouldn't have stumbled onto this strange ordering if I had done the usual "publish event, delete row" instead of "publish event, update row status".
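A rough sketch of why "update row status" is sensitive to this ordering, in plain Python with an in-memory dictionary standing in for the database. Everything here is hypothetical (`OutboxRow`, `handle_strict`, `handle_tolerant`), and it assumes the consumer's problem is that it checks the status column, which the thread does not spell out. It replays the Redis ordering: the consumer runs first, and the publisher's status update lands afterwards.

```python
from dataclasses import dataclass


@dataclass
class OutboxRow:
    payload: str
    status: str = "Pending"  # states taken from the thread: Pending -> Delivered


# In-memory stand-in for the outbox table; only the id would be published.
outbox: dict[int, OutboxRow] = {1: OutboxRow(payload="sensitive data")}
processed: list[str] = []


def handle_strict(row_id: int) -> bool:
    """Consumer that insists the row is already marked Delivered."""
    row = outbox[row_id]
    if row.status != "Delivered":
        return False  # rejected; delivery would have to be retried
    processed.append(row.payload)
    return True


def handle_tolerant(row_id: int) -> bool:
    """Consumer that only needs the row to exist, whatever its status."""
    row = outbox.get(row_id)
    if row is None:
        return False
    processed.append(row.payload)
    return True


# Replay the observed order: consumer first, status update second.
strict_ok = handle_strict(1)      # row is still Pending here
tolerant_ok = handle_tolerant(1)  # row is still Pending here
outbox[1].status = "Delivered"    # publisher's update, arriving late

print(strict_ok, tolerant_ok, processed)
```

Under these assumptions the strict consumer rejects the first delivery and depends on a retry, while the tolerant one is unaffected by whether the status update has been written yet.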

christophwille closed this as completed May 4, 2024
