RequestAsync doesn't resume after broker restart (2.x) #272

Closed
cocowalla opened this issue Sep 24, 2017 · 7 comments

Comments

@cocowalla

Sorry, another issue relating to the broker going down!

Using RawRabbit 2.x with request/response, I have an IBusClient that calls RequestAsync. When the broker goes down after at least one request has already been sent, I see these messages logged from RawRabbit:

The existing connection is not open.
Connection is recoverable. Waiting for 'Recovery' event to be triggered.

When the broker comes back up, I see:

Connection has been recovered!

...which sounds very promising, but actually the call to RequestAsync appears hung, as it never returns.

@pardahlman
Owner

Hello, hi! No worries - the issues you raise are relevant for the client, so don't stop 👍

Is your scenario that the broker goes down after a request is sent, but before the response is received? And does this occur after an RPC call with the same message types has been successfully performed?

@cocowalla
Author

It's after an RPC call of the same message type has been successfully performed (both sent and received)

@pardahlman
Owner

I took a look at this, and it turns out that the issue was mitigated in the consumer factory in a recent commit (7b9a12242).

I tried to reproduce the issue you described by:

  1. completing an RPC
  2. restarting the broker
  3. performing an RPC with the same messages as in step 1

Doing this with the current code in branch 2.0 executed as expected.

@cocowalla
Author

I still have the same issue. Slightly different steps to reproduce:

  1. Complete an RPC
  2. Take down the broker
  3. Start an RPC
  4. Bring the broker back up

The call to RequestAsync from step 3 never completes.

pardahlman added a commit that referenced this issue Sep 29, 2017
This fix was originally intended for RPC calls that do not
return a response if the broker goes down before
the response is sent. However, when the broker goes
down and then recovers, we need to ensure two things:

1. wait until the recovery has taken place
2. sanity check whether the channel has been reset, and if so, do not ack

@pardahlman
Owner

I've got some good news and some bad news 😉

The good news is that I've identified the problem. It was also related to the publish sequence that was being reset when the broker was restarted. RawRabbit didn't check to see if the delivery tag made sense for the channel, and could try to ack on a delivery tag that the broker did not recognize. With a fix for this in place I was able to restart the broker mid-RPC and get the response.
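
To illustrate the kind of check described above, here is a minimal sketch, not RawRabbit's actual fix. Delivery tags are scoped to a channel, so a tag received before a restart means nothing on the recovered channel. The `AckGuard` class, its "epoch" counter and the `ack` callback are all hypothetical names introduced for this example; in real code the callback would wrap the channel's ack call.

```csharp
using System;
using System.Threading;

// Hypothetical guard (an assumption for illustration, not part of RawRabbit).
public sealed class AckGuard
{
    // Incremented every time the channel is reset or recreated.
    private long _epoch;

    public long CurrentEpoch => Interlocked.Read(ref _epoch);

    // Call this from whatever handler observes the channel being reset.
    public void OnChannelReset() => Interlocked.Increment(ref _epoch);

    // Acks only when the delivery arrived on the channel incarnation that is
    // still current. Returns false (and does not ack) for a stale delivery tag.
    public bool TryAck(long epochAtDelivery, ulong deliveryTag, Action<ulong> ack)
    {
        if (ack == null) throw new ArgumentNullException(nameof(ack));
        if (epochAtDelivery != CurrentEpoch) return false;
        ack(deliveryTag);
        return true;
    }
}
```

The consumer would record `CurrentEpoch` when a message is delivered and pass it back to `TryAck` once handling is done, so a reset in between turns the ack into a no-op.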

The bad news is that it does not seem to work with direct RPC (which is what RawRabbit uses by default). I don't know if the "pseudo queue" used for the response is configured with auto-delete (and thus removed when the consumer disconnects).

Here's an example of how the request was configured when I got it to work:

await requester.RequestAsync<BasicRequest, BasicResponse>(new BasicRequest(), ctx => ctx
  .UseRequestConfiguration(cfg => cfg
    .ConsumeResponse(r => r
      .Consume(c => c
        .WithRoutingKey("response_key"))
      .FromDeclaredQueue(q => q
        .WithName("response_queue"))
      .OnDeclaredExchange(e => e
        .WithName("response_exchange")
      )
    )
  ), ct: cs.Token);

Two things to note:

  1. It is a good idea to provide a cancellation token if you expect broker disconnects: recovery takes some time, and a cancellation token overrides the default request timeout.
  2. The response queue needs to have AutoDelete set to false, otherwise it may be removed before the response is received. In a real-life scenario, I would probably use a GUID to create a unique routing key, to guarantee that the response is routed to the correct application (for example with multiple instances or threads).
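
The GUID idea in the second note could look roughly like the sketch below. `ResponseNames.Unique` is a hypothetical helper, not a RawRabbit API; it only builds the string.

```csharp
using System;

// Hypothetical helper (an assumption for illustration, not part of RawRabbit).
public static class ResponseNames
{
    // Appends a fresh GUID in "N" format (32 hex digits, no dashes) to the prefix,
    // so every application instance gets its own routing key.
    public static string Unique(string prefix)
    {
        if (string.IsNullOrEmpty(prefix)) throw new ArgumentException("prefix is required", nameof(prefix));
        return $"{prefix}.{Guid.NewGuid():N}";
    }
}
```

The result would be created once per application instance and passed to `.WithRoutingKey(...)` in the configuration above. For the AutoDelete part, the queue builder is assumed to expose a setting along the lines of `.WithAutoDelete(false)`; check the name against the RawRabbit version in use.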

@cocowalla
Author

Thanks for looking into this, I'll give it a try tomorrow!

Regarding a cancellation token, yes, I think what I'll do is use one to time out requests after a while, then retry. That way, even if the request hangs while the broker is down, it will eventually time out regardless.
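
A minimal sketch of that timeout-then-retry approach, using only the .NET base library. `RequestRetry` and its parameters are hypothetical names, and the sketch assumes that a cancelled request surfaces as an `OperationCanceledException` and that the request actually honours the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (an assumption for illustration, not part of RawRabbit).
public static class RequestRetry
{
    // Runs `request` with a fresh cancellation token per attempt. An attempt that
    // is cancelled by its timeout is retried until `maxAttempts` is reached; the
    // cancellation from the final attempt is propagated to the caller.
    public static async Task<T> WithTimeoutAndRetryAsync<T>(
        Func<CancellationToken, Task<T>> request,
        TimeSpan attemptTimeout,
        int maxAttempts)
    {
        if (request == null) throw new ArgumentNullException(nameof(request));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            using (var cts = new CancellationTokenSource(attemptTimeout))
            {
                try
                {
                    return await request(cts.Token).ConfigureAwait(false);
                }
                catch (OperationCanceledException) when (attempt < maxAttempts)
                {
                    // Timed out (for example while the broker was down); try again.
                }
            }
        }
    }
}
```

With the request shown earlier in the thread, the delegate would be something like `ct => requester.RequestAsync<BasicRequest, BasicResponse>(new BasicRequest(), ct: ct)`. A fixed number of attempts keeps the caller from waiting forever if the broker never comes back.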

@cocowalla
Author

Excellent, the config you provided works as described!
