Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reliable timeouts to socket operations #17711

Open
GSPP opened this issue Jun 27, 2016 · 10 comments
Open

Add reliable timeouts to socket operations #17711

GSPP opened this issue Jun 27, 2016 · 10 comments
Labels
area-System.Net.Sockets enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@GSPP
Copy link

GSPP commented Jun 27, 2016

Socket operations do not support timeouts sufficiently in the following cases:

  1. All async operations do not respect the configured timeout.
  2. Connect cannot be made to observe a timeout. I think there is no timeout for Accept either but that's usually an operation that is never supposed to time out.

The workarounds usually are quite nasty. For (1) people build their own (tedious and often flawed) timeouts. For (2) people often use an event plus an async IO. In case the timeout fires they close the socket. (This actually triggers this unavoidable race condition resulting in an access violation!)

Note the amount of confusion that these problems cause: https://www.google.com/webhp?complete=1&hl=en&gws_rd=cr,ssl&ei=#complete=1&hl=en&q=socket+beginreceive+timeout (I sometimes like to point to Google and Stack Overflow to show a nice sample of real-world user complaints. I have opened this ticket because I'm repeatedly facing this issue as well.)

I understand that the underlying Winsock (and now Linux) APIs make some of this hard. Since these issues are so widespread I believe it's worth the work to fix this at the managed code level (by implementing the timeouts and aborts in .NET code as opposed to making the native APIs do that).

As an open concern for (1) I see the question what happens to an IO that is cancelled at the managed level due to timeout or cancellation but still running at the native level. This can cause data to be discarded. The only clean way to solve this that I could find is to terminate the connection when a read or write ends up being cancelled. In my experience this matches perfectly with real world requirements. In case of a timeout usually the calling code backs out and just wants to shut everything down.

To summarize the ticket: Please add timeouts to all possibly blocking socket operations. Related ticket: Allow the user to cancel anything at will (this is needed in addition to the timeout).. Also related: Add Task-based async methods which should take a CancellationToken and observe the configured timeout.

@GSPP
Copy link
Author

GSPP commented Jun 27, 2016

I just found out that operating systems limit the maximum timeout that is effectively supported. My Windows 7 uses at most 21 seconds and gives up. Web searches show that Linux has a different policy.

Therefore, the connect timeout as per this ticket is to be understood as a maximum timeout. This needs to be documented should the feature be added.

@davidsh
Copy link
Contributor

davidsh commented Jun 27, 2016

cc: @CIPop @stephentoub

@karelz
Copy link
Member

karelz commented Feb 23, 2017

It may lead to new API proposal.

Next steps:

  • Fully understand the ask above (it needs deeper reading),
  • Understand OS capabilities

@springy76
Copy link

Just my 2ct: When there will be finally a CancellationToken parameter, will this not automatically allow setting timeouts by using a self-cancelling CancellationTokenSource as such?:

new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds)).Token

Is this bad practice or problematic in other ways?

@GSPP
Copy link
Author

GSPP commented Aug 12, 2018

Once we have cancellation based on CancellationToken as proposed in the first ticket (https://github.com/dotnet/corefx/issues/5749 dd a clean way to cancel a long-running Socket operation (e.g. accept or connect)) implementing timeouts should be easier.

There is still a need to have timeouts directly supported. Pretty much any application needs to put a timeout on socket operations. I repeat here that all async operations do not respect any configured timeout at the moment.

Unfortunately, there are already properties int Socket.ReceiveTimeout and int Socket.SendTimeout. They internally use SetSocketOption to make the OS implement the timeout. I read that sync socket IO is actually implemented in user mode around async kernel IO (on Windows). Then I assume that these timeouts are implemented there as well.

For compatibility reasons we cannot change the fact that async operations do not respect these properties. This means that we need a second way to specify a timeout.

It is also problematic, that these properties can only specify an integer number of seconds. Since we are touching these APIs anyway we should allow for more granular timeouts. In my opinion the framework should use TimeSpan for timeouts everywhere. But for consistency we might need use use int milliseconds. I'm unsure about this. Below, I used TimeSpan to avoid units being ambiguous.

Shutdown never blocks (https://docs.microsoft.com/en-us/windows/desktop/api/winsock/nf-winsock-shutdown) so there is no need to modify it.


API proposal:

We need these timeouts:

  • ReceiveTimeout
  • SendTimeout
  • ConnectTimeout
  • AcceptTimeout
  • DisconnectTimeout (also affects closing in my proposal)

I propose adding these on an options object:

class SocketTimeouts
{ 
    public TimeSpan? ReceiveTimeout { get; set; }
    public TimeSpan? SendTimeout { get; set; }
    public TimeSpan? ConnectTimeout { get; set; }
    public TimeSpan? AcceptTimeout { get; set; }
    public TimeSpan? DiconnectTimeout { get; set; }
}

null values are the default. In this case the behavior is exactly as before.

Socket class:

public SocketTimeouts Timeouts
{
 get { return this.timeouts ?? (this.timeouts = new SocketTimeouts()); }
 set
 {
  if (value == null) throw new ArgumentNullException();
  this.timeouts = value;
 }
}

This object can be accessed and set as a whole as Socket.Timeouts. It is just a behaviorless DTO object. It is not owned by any socket so that this object can be allocated once by the application and reused. This makes this viable for low allocation scenarios.

I would not touch NetworkStream, TcpClient, TcpListener, UdpClient and UdpListener. Timeouts can be set directly on the underlying socket.

Alternatives:

  1. Add new timeout properties to Socket directly.
  2. Keep the existing int timeouts and add a property bool UseTimeoutsForAsync as a compatibility switch.
  3. Don't add properties but overloads on operations accepting int and TimeSpan. This does not work for receive and send, though, because specifying timeouts on each call to these functions is not realistic. Reads might exist inside of library code or happen through NetworkStream.

@GSPP
Copy link
Author

GSPP commented Oct 24, 2018

Cancelling a Connect call by disposing the socket results in the following really scary message:

An operation was attempted on something that is not a socket

With SocketError.NotSocket. This sounds like it might cause handle confusion problems destabilizing the entire process. If the Socket class had timeouts built-in this kind of nastiness would go away.

@wfurt
Copy link
Member

wfurt commented Nov 30, 2022

triage: we should investigate option for Connect

@mjanulaitis
Copy link

Wow this is still a bug in .Net 7. I would have though MS would have removed the API in the .Net 7 release since it STILL doesn't work. I just wasted my morning re-writing my blocking socket code to use the async methods only to find out it's busted after 7 years since the first report of it. Ok I'll just delete the branch I wasted my morning creating. I am specifically trying to shut down a TCP server that is stuck in an AcceptAsync.

@stephentoub
Copy link
Member

I am specifically trying to shut down a TCP server that is stuck in an AcceptAsync.

Can you share a repro? The cancellation token passed to AcceptAsync should enable it to be canceled.

@mjanulaitis
Copy link

@stephentoub I think possibly this has to do with me pressing Ctrl-C. I have a handler for the event then cancel the tasks, however currently it seems like the TPL just dumps tasks that are running rather than allowing the awaited methods to throw a cancellation event. The debugger then hangs the app. When I run without the debugger the process stops as expected. I'm still debugging now.

@antonfirsov antonfirsov removed their assignment Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Net.Sockets enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests