
PublishEventAsync Blocking When Delivering Locally? #1280

Closed
christophwille opened this issue May 1, 2024 · 7 comments

Comments

@christophwille

I condensed a larger solution into a minimal sample here:

https://github.com/christophwille/poc-oh/blob/main/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs

Mostly bog-standard Dapr pub/sub. However, the subscriber sits in the same process as the publisher, and I'm using the Redis pub/sub component. Now here is the part I don't get: when PublishEventAsync is called, it blocks while the subscriber is running (even through retries!). The breakpoints I added illustrate this; they really do execute serially in that order.

[Screenshot: WhyIsPublishEventAsyncActuallyBlockingOnDelivery]

My expectation was that PublishEventAsync sends the message and returns immediately, and that processing then starts, perhaps after a small delay, but definitely without blocking all the way up the call chain. Am I doing something wrong, or is my view of how it should work wrong?

As it stands, this breaks our outbox implementation (a custom one, where sensitive data is retained in the database and only the id is published via pub/sub). The outbox transitions a row's State from Pending to Delivered right after the call to PublishEventAsync, but the consumer always runs before there is any chance to write that state change to the database.
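The ordering problem can be modeled without Dapr at all. The sketch below is a hypothetical, synchronous stand-in: `outbox` plays the database table, and `publish_inline` plays a publish call that, matching what was observed with Redis here, lets the subscriber run before it returns. None of these names come from Dapr or from the linked sample.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the outbox table: message id -> status.
outbox: Dict[str, str] = {}
# Statuses the subscriber saw when it looked up the row by id.
observed: List[str] = []


def subscriber(msg_id: str) -> None:
    # Only the id travels over pub/sub; the handler reads the row from the database.
    observed.append(outbox[msg_id])


def publish_inline(msg_id: str, handler: Callable[[str], None]) -> None:
    # Assumption: models the observed ordering, where delivery to the
    # subscriber happens before the publish call returns to the caller.
    handler(msg_id)


def relay(msg_id: str) -> None:
    outbox[msg_id] = "Pending"  # row written earlier by the business transaction
    publish_inline(msg_id, subscriber)
    # The state transition happens only after publish returns, i.e. too late.
    outbox[msg_id] = "Delivered"


relay("order-42")
print(observed)            # ['Pending']
print(outbox["order-42"])  # Delivered
```

The result does not depend on delivery being literally inline: any interleaving in which the subscriber reads the row before the Delivered update is written produces the same observation.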

Should this go to dapr/dapr instead? It looks like underlying runtime behavior.

@philliphoff
Collaborator

My understanding is that PublishEventAsync() completes once the Dapr sidecar has successfully delivered the message to the underlying pub/sub component. Delivery of the message by that component to the ultimate subscriber, whether directly or again through the Dapr sidecar, is a disconnected and independent task. I'd be careful about judging timing based on breakpoints: the debugger adds a significant amount of latency and can greatly alter the apparent flow of asynchronous tasks, especially if those tasks are inherently short-lived (e.g. because everything is running locally, so publish and delivery happen very quickly).

Even without debugger latency, given that publish and delivery run in parallel, it's entirely possible in the normal course of things for breakpoint #2 to be hit before breakpoint #3, since incoming HTTP requests can be processed while processing of the outgoing message is still wrapping up.
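The point about parallelism can be illustrated with a small asyncio model, assuming (hypothetically) that the broker acknowledges the publish and delivers the message as two independent tasks, each with its own latency. The delay values are arbitrary, and `publish_and_deliver` is not a Dapr API.

```python
import asyncio
from typing import List


async def publish_and_deliver(ack_delay: float, delivery_delay: float) -> List[str]:
    """Hypothetical model: acknowledgement and delivery race each other."""
    events: List[str] = []

    async def deliver() -> None:
        await asyncio.sleep(delivery_delay)
        events.append("sub: entering")

    # The broker starts delivery independently of the publisher's acknowledgement.
    delivery = asyncio.create_task(deliver())
    # The publisher waits only for the acknowledgement to come back.
    await asyncio.sleep(ack_delay)
    events.append("enqueue: after publish")
    # Only so the demo records both events before returning.
    await delivery
    return events


# Delivery faster than the acknowledgement: the subscriber is entered first.
print(asyncio.run(publish_and_deliver(ack_delay=0.05, delivery_delay=0.01)))
# ['sub: entering', 'enqueue: after publish']

# Acknowledgement faster than delivery: publish returns first.
print(asyncio.run(publish_and_deliver(ack_delay=0.01, delivery_delay=0.05)))
# ['enqueue: after publish', 'sub: entering']
```

Neither ordering indicates that publish is waiting for the subscriber; the two events are simply not ordered relative to each other.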

@christophwille
Author

christophwille commented May 1, 2024

Our real-life application had instrumentation that showed exactly the behavior the breakpoints exhibit, including the retries.

We had a loop publishing multiple events one after another, and even that was serial. Yes, the sample might seem simple, but the behavior is "PublishEventAsync delivers to Redis, calls the subscriber endpoint, and only then returns".

@philliphoff
Collaborator

Do you see this same behavior with any other pub-sub component? I'm curious as to whether it might be a Redis-specific phenomenon.

@christophwille
Author

No, I haven't tried anything other than Redis yet (we started seeing the issue on local developer boxes).

@christophwille
Author

I modified the sample to write logs (https://github.com/christophwille/poc-oh/blob/f94ae80bed3e46d4ce46b478cd2dcd1b32744d93/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs#L36) and ran it without a debugger:

  enqueue: entering 06:22:54.567 PM
  sub: entering 06:22:54.705 PM
  enqueue: after PublishEventAsync 06:22:54.713 PM
  sub: leaving 06:23:04.714 PM

Now that I actually have async code in the subscriber (even if it's only a Task.Delay), it yields. Still, the subscriber is entered before PublishEventAsync returns. That is very strange, to say the least.

@christophwille
Author

OK, one more thing came to mind: given that Dapr turns around immediately and calls the endpoint, how on earth is that going to scale? The intent was to have one Hangfire job that publishes all backed-up messages (Hangfire allows single-instance jobs across multiple container instances), with the messages then consumed in a scaled-out fashion. Now that the message comes in immediately on the same instance, would a rejection on first processing mean the retry arrives at the same instance again, or would retries be distributed across all instances?

@christophwille
Author

christophwille commented May 2, 2024

I gave it a try with ServiceBus:

  enqueue: entering 10:55:37.896 AM
  enqueue: after PublishEventAsync 10:55:38.955 AM
  sub: entering 10:55:38.968 AM
  sub: leaving 10:55:48.972 AM

So it seems to be specific to Redis that the subscriber is entered before PublishEventAsync returns, which makes it look as if publishing and calling the subscriber happen in a single step.
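To make the comparison between the two runs explicit, the sketch below parses the four log lines of each run and computes how long PublishEventAsync took and the signed gap between PublishEventAsync returning and the subscriber being entered (negative means the subscriber was entered first). The log lines are copied from the output above; the helper names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Tuple

# Log lines copied from the two runs above (Redis first, then Service Bus).
REDIS = """enqueue: entering 06:22:54.567 PM
sub: entering 06:22:54.705 PM
enqueue: after PublishEventAsync 06:22:54.713 PM
sub: leaving 06:23:04.714 PM"""

SERVICE_BUS = """enqueue: entering 10:55:37.896 AM
enqueue: after PublishEventAsync 10:55:38.955 AM
sub: entering 10:55:38.968 AM
sub: leaving 10:55:48.972 AM"""


def parse(log: str) -> Dict[str, datetime]:
    events: Dict[str, datetime] = {}
    for line in log.strip().splitlines():
        # Each line is "<event> <hh:mm:ss.fff> <AM|PM>"; split off the last two tokens.
        label, clock, meridiem = line.strip().rsplit(" ", 2)
        events[label] = datetime.strptime(f"{clock} {meridiem}", "%I:%M:%S.%f %p")
    return events


def summarize(log: str) -> Tuple[int, int]:
    t = parse(log)
    after_publish = t["enqueue: after PublishEventAsync"]
    # How long the PublishEventAsync call took, in milliseconds.
    publish_ms = (after_publish - t["enqueue: entering"]).total_seconds() * 1000
    # Subscriber entry relative to publish returning; negative = subscriber first.
    gap_ms = (t["sub: entering"] - after_publish).total_seconds() * 1000
    return round(publish_ms), round(gap_ms)


print(summarize(REDIS))        # (146, -8)
print(summarize(SERVICE_BUS))  # (1059, 13)
```

With Redis the subscriber is entered 8 ms before PublishEventAsync returns; with Service Bus the publish call takes about a second and the subscriber is entered 13 ms after it returns.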

Maybe I wouldn't have stumbled onto this strange ordering if I had done the usual "publish event, delete row" instead of "publish event, update row status".


2 participants