Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DefaultCloseTimeoutMilliseconds configurable #1318

Closed
ajbeaven opened this issue Jun 22, 2021 · 9 comments
Closed

Make DefaultCloseTimeoutMilliseconds configurable #1318

ajbeaven opened this issue Jun 22, 2021 · 9 comments

Comments

@ajbeaven
Copy link

ajbeaven commented Jun 22, 2021

I operate a chat app that involves maintaining a list of active users. When a user connects they are added to this list of active users, and when they disconnect they are removed. All clients are notified when a user is added to or removed from this list.

Here's some simplified code showing the disconnect:

public override async Task OnDisconnectedAsync(Exception exception) {
    var currentUserId = GetCurrentUserId();
    activeUsers.RemoveAll(u => u.UserId = currentUserId);

    await Task.Delay(10000);
    if(!activeUsers.Any(u => u.UserId == currentUserId)) {
        await ChatService.MarkUserOffline(currentUserId);
        await Clients.All.SendAsync("UserOffline", currentUserId);
    }

    await base.OnDisconnectedAsync(exception);
}

It's a bit more involved than this as the server side needs to deal with many complexities surrounding handling of online users (eg. chat requests sent to users that aren't active when they're caught moving between pages), but this should give you a good idea of the general gist.

You'll note that there is a Task.Delay here. Before this delay was added, when a user refreshed the page or moved to a different page, clients would receive a UserOffline message so the user was immediately removed from the list of online users. They reappeared shortly after though when the new page was loaded and the client reconnected. To avoid this "flickering" the delay was added to only notify clients that the user left if they hadn't reconnected after a 10 second delay. A 10 second delay should be enough to load even the slowest pages on the site and have the client reconnect whereas a delay of less than 5 seconds would in some cases be too short.

Unfortunately ASRS seems to have a problem with this delay because it throws the following exception

Microsoft.Azure.SignalR.Common.AzureSignalRException: Cancelled running application task, probably caused by time out.
  ?, in async Task ServiceConnection.ProcessIncomingMessageAsync(ClientConnectionContext connection)
  ?, in async Task ServiceConnection.ProcessClientConnectionAsync(ClientConnectionContext connection)

It looks like this occurs because ServiceConnection.cs forcibly closes the disconnect task after 5 seconds:

namespace Microsoft.Azure.SignalR
{
    internal partial class ServiceConnection : ServiceConnectionBase
    {
        private const int DefaultCloseTimeoutMilliseconds = 5000;
        ….

        private async Task ProcessIncomingMessageAsync(ClientConnectionContext connection)
        {
            // Wait for the application task to complete
            // application task can end when exception, or Context.Abort() from hub
            var app = ProcessApplicationTaskAsyncCore(connection);
 
            var cancelTask = connection.ApplicationAborted.AsTask();  // < ========= DefaultCloseTimeoutMilliseconds
            var task = await Task.WhenAny(app, cancelTask);
 
            if (task == app)
            {
                await task;
            }
            else
            {
                // cancel the application task, to end the outgoing task
                connection.Application.Input.CancelPendingRead();
                throw new AzureSignalRException("Cancelled running application task, probably caused by time out.");
            }
        }

This exception occurs each time a client disconnects, so there are a lot of exceptions being thrown.

As an aside, I'm porting this application over from SignalR for ASP.NET to SignalR for .NET Core. The ASP.NET version acts the same way, but doesn't appear to have a problem with a 10 second timeout in the disconnect (or at least I haven't captured any exceptions thrown as a result).

Describe the solution you'd like

I was hoping that ASRS could introduce a way to configure the DefaultCloseTimeoutMilliseconds service, so I could increase it to something like 12000ms to avoid the timeout exception.

@vicancy
Copy link
Member

vicancy commented Jun 25, 2021

With a rethink, I think leaving applicationTask running makes more sense.

@ajbeaven
Copy link
Author

Thanks a lot for working on this @vicancy, it's really appreciated. So you're thinking you won't cancel the ProcessApplicationTaskAsyncCore() task at all? Do you think you'll still end up exposing the clientTimeout?

@vicancy
Copy link
Member

vicancy commented Jun 25, 2021

Yeah, I think we need to be consistent with the self-host SignalR which does not cancel the application task. So I'd like to fix the issue by not canceling the application task instead.

@ajbeaven
Copy link
Author

Either one works for me!

@vicancy
Copy link
Member

vicancy commented Jun 25, 2021

Just realized why we have this timeout is because client connection lifetime impacts the auto-reconnect of the server-connection. When a server-connection drops, it first clean up all the existing client connections, waiting for the client connections to complete their lifetime, and then restart another server-connection. If there is some malformed code inside one client hub, it can impact the whole stability of the server-connection if there is no timeout.

For your case, I think the solution is to start a new Task for this instead of blocking the disconnect method:

    public class NotificationService
    {
        private readonly IHubContext<ChatHub> _context;

        public NotificationService(IHubContext<ChatHub> context)
        {
            _context = context;
        }

        public async Task NotifyUserOffline(Func<string, bool> exists, string currentUserId)
        {
            await Task.Delay(10000);
            if (!exists(currentUserId))
            {
                await _context.Clients.All.SendAsync("UserOffline", currentUserId);
            }
        }
    }
    public class ChatHub : Hub
    {
        private readonly NotificationService _service;
        public ChatHub(NotificationService service)
        {
            _service = service;
        }
        public override async Task OnDisconnectedAsync()
        {
            _ = _service.NotifyUserOffline(user => ..your logic here..., currentUserId);
            await base.OnDisconnectedAsync();
        }
  }

@ajbeaven
Copy link
Author

I suppose that makes sense. The production code actually uses IHostApplicationLifetime to supply a cancellation token for Task.Delay; I'd have expected something like that would be used to avoid the sorts of issues you mentioned but no trouble.

Thanks for that code :) This approach will mean some funky exception handling but I should be able to make it work.

@ajbeaven
Copy link
Author

The fire and forget nature of this solution has made things pretty messy in my non-simplified code. If you ever do end up exposing the clientTimeout, please update this issue as I think that's probably preferable to me at this stage.

Re: not exposing the clientTimeout - is there a risk of instability with ASRS if this were to be make configurable or is this just a use case you don't want to support? (totally fine if that's the case as I know this is probably uncommon. I'm just interested to know the rationale).

@vicancy
Copy link
Member

vicancy commented Jul 7, 2021

Hi @ajbeaven, please try the latest package 1.9.1 which no longer cancel the application task

@ajbeaven
Copy link
Author

ajbeaven commented Jul 7, 2021

@vicancy, that seems to have fixed it. Thanks heaps!

@ajbeaven ajbeaven closed this as completed Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants