PublishEventAsync Blocking When Delivering Locally? #1280
Comments
My understanding is that even without debugger latency, given the parallelism of publish and delivery, it's entirely possible in the normal course of things that breakpoint #2 is hit before breakpoint #3, since processing of incoming HTTP requests can happen at the same time as processing of outgoing messages wraps up.
Our real-life application had instrumentation that showed exactly the behavior the breakpoints exhibit, including the retries. We had a loop delivering multiple events one after another, and even that was serial. Yes, the sample might seem simple, but the behavior is "PublishEventAsync delivers to Redis, calls the subscriber endpoint, and only then returns".
Do you see this same behavior with any other pub-sub component? I'm curious as to whether it might be a Redis-specific phenomenon.
No, I haven't tried anything other than Redis yet (we started seeing the issue on local developer boxes).
I modified the sample to write logs (https://github.com/christophwille/poc-oh/blob/f94ae80bed3e46d4ce46b478cd2dcd1b32744d93/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs#L36) - and I am running it without a debugger:
Now that I actually have async code in the subscriber (even if it is only a Task.Delay), it yields. Still, the subscriber is entered before PublishEventAsync returns. That is very strange, to say the least.
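For reference, the subscriber in that experiment looks roughly like this (a sketch, not the exact code from the linked Program.cs; the route, topic name, and payload type are illustrative):

```csharp
using Dapr;

// Subscriber endpoint for the "orders" topic on the "pubsub" component.
// Even though the handler yields at the first await, its entry log line
// is still observed before PublishEventAsync returns to the publisher.
app.MapPost("/orders", [Topic("pubsub", "orders")] async (Order order, ILogger<Program> log) =>
{
    log.LogInformation("Subscriber entered for {Id} at {Ts:O}", order.Id, DateTime.UtcNow);
    await Task.Delay(100); // stand-in for real async work
    log.LogInformation("Subscriber done for {Id} at {Ts:O}", order.Id, DateTime.UtcNow);
    return Results.Ok();
});
```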
Ok, one more thing came to mind: given that Dapr turns around immediately and calls the endpoint, how on earth is that going to scale? The intent was to have one Hangfire job that publishes all backed-up messages (Hangfire allows single-instance jobs across multiple container instances), with the messages then consumed in scaled-out fashion. Now that a message comes in immediately on the same instance, would a rejection on first processing mean it comes back to the same instance on retry, or would retries then be distributed across all instances?
I gave it a try with ServiceBus:
Thus it seems to be specific to Redis that entering the subscriber is quicker than PublishEventAsync returning, hence the effect of it "looking like" publish and subscriber invocation happen in a single step. Maybe I wouldn't have stumbled onto this strange ordering if I had done the usual "publish event, delete row" instead of "publish event, update row status".
I condensed a larger solution into a minimal sample here:
https://github.com/christophwille/poc-oh/blob/main/src/DaprPubSubMinimal/DaprPubSubMinimal/Program.cs
Mostly bog-standard Dapr pub/sub; however, the subscriber sits in the same process as the publisher, and Redis is the pub/sub component. Now here is the part I don't get: when PublishEventAsync is called, it blocks while the subscriber is running (even with retries!). This is illustrated by the breakpoints I added (they really execute serially in that order).
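A condensed sketch of that setup (identifiers and routes are mine, not necessarily those of the linked Program.cs; it assumes a Dapr sidecar with a Redis pub/sub component named "pubsub") would be:

```csharp
using Dapr;
using Dapr.Client;

// Minimal Dapr pub/sub app: publisher and subscriber in one process.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddDaprClient();
var app = builder.Build();

app.UseCloudEvents();      // unwrap CloudEvents envelopes on incoming deliveries
app.MapSubscribeHandler(); // expose /dapr/subscribe so the sidecar finds the topics

// Publisher endpoint.
app.MapPost("/publish", async (DaprClient dapr) =>
{
    await dapr.PublishEventAsync("pubsub", "orders", new Order(42)); // breakpoint #1
    return Results.Ok();                                            // breakpoint #3 (hit last)
});

// Subscriber for the same topic, hosted in the same process.
app.MapPost("/orders", [Topic("pubsub", "orders")] (Order order) =>
{
    return Results.Ok();                                            // breakpoint #2 (hit before #3)
});

app.Run();

record Order(int Id);
```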
My expectation was that PublishEventAsync sends the message and returns immediately, and that processing then starts with maybe a small delay - but definitely without blocking all the way up the call chain. Am I doing something wrong, is my view of how it should work wrong, ...?
As-is, this breaks the implementation of an outbox (a custom one, where sensitive data is retained in the database and only the id is published) in which a row's State is transitioned from Pending to Delivered right after the call to PublishEventAsync, because the consumer always runs before there is any chance to write the state change to the database.
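Schematically, the outbox flow that races looks like this (a sketch assuming an EF Core-style context; the `Outbox` set, `OutboxState` enum, and topic names are hypothetical):

```csharp
// Publish all pending outbox rows; only the id leaves the database.
foreach (var row in await db.Outbox
    .Where(r => r.State == OutboxState.Pending)
    .ToListAsync())
{
    await dapr.PublishEventAsync("pubsub", "orders", new { row.Id });

    // Race: with the Redis component, the subscriber has typically already
    // executed (and re-read the row) before this status change is persisted,
    // so it still observes State == Pending.
    row.State = OutboxState.Delivered;
    await db.SaveChangesAsync();
}
```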
Should that go into dapr/dapr? Because it looks like underlying runtime behavior.