
HttpClient.GetStringAsync Suppressed Exceptions Cause Deadlock #34164

Closed
joelrevans opened this issue Mar 26, 2020 · 10 comments · Fixed by #42346

@joelrevans

joelrevans commented Mar 26, 2020

HttpClient.GetStringAsync Suppressed Exceptions Cause Deadlock

.NET Core 3.1 running on Windows 10; console application created from the default template. The entire application code is included below.

using System.Net.Http;
using System.Threading.Tasks;
namespace tmp
{
    class Program
    {
        static async Task Main(string[] args)
        {
            string test_site = "https://beautymatter.com/";
            HttpClient client = new HttpClient();
            await client.GetStringAsync(test_site);    //permanent deadlock 10% of the time
            await client.GetAsync(test_site);          //throws exception 10% of the time
        }
    }
}

I found a website, beautymatter.com, that causes an unusual deadlock when using HttpClient. The site has some odd configuration that results in random I/O connection failures (~10% of the time), which incidentally makes it an excellent platform for testing edge cases.

What I discovered is that HttpClient.GetStringAsync appears to suppress exceptions and enter a permanent deadlock when the transport stream fails to close properly. HttpClient.GetAsync correctly throws exceptions for these transport errors.
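
In the meantime, here's a rough sketch of the workaround I'm leaning on (just a sketch; it assumes GetAsync's default buffered mode keeps surfacing these transport errors as exceptions, which is what I see above):

using System;
using System.Net.Http;
using System.Threading.Tasks;
namespace tmp
{
    class Program
    {
        static async Task Main(string[] args)
        {
            string test_site = "https://beautymatter.com/";
            // GetAsync buffers the body inside SendAsync, where HttpClient.Timeout applies,
            // so a bad transport ends in an exception instead of a silent hang.
            HttpClient client = new HttpClient { Timeout = TimeSpan.FromSeconds(15) };
            using HttpResponseMessage response = await client.GetAsync(test_site);
            response.EnsureSuccessStatusCode();
            string body = await response.Content.ReadAsStringAsync();
        }
    }
}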

@scalablecory
Contributor

When you say "deadlock" you mean the request simply never returns?

What happens when you decrease HttpClient.Timeout property? It defaults to 100 seconds.
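
For example (a fragment only; the value is arbitrary):

// Drop the overall request timeout from the 100-second default so a stuck
// request fails fast instead of looking like it never returns.
HttpClient client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };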

@scalablecory transferred this issue from dotnet/core Mar 26, 2020
@Dotnet-GitSync-Bot added the area-System.Net.Http and untriaged labels Mar 26, 2020
@davidsh
Contributor

davidsh commented Mar 26, 2020

https://www.ssllabs.com/ssltest/analyze.html?d=beautymatter.com

reports some unusual errors in the TLS server certificate chain. Not sure if this might be related to the reported "hang" problem with HttpClient:

[screenshot: SSL Labs analysis showing certificate chain issues]

including an additional certificate being returned from the server that doesn't appear to be in the proper chain of trust for the server certificate:

[screenshot: the additional certificate returned by the server, outside the server certificate's chain of trust]

@davidsh
Contributor

davidsh commented Mar 26, 2020

@wfurt @bartonjs

@bartonjs
Member

bartonjs commented Mar 26, 2020

I believe that "Certificate #2" is "the certificate we get if we don't include the SNI header", which matches what I see in openssl s_client -connect beautymatter.com:443 (no SNI) vs openssl s_client -connect beautymatter.com:443 -servername beautymatter.com (with SNI)

@joelrevans
Author

When you say "deadlock" you mean the request simply never returns?

What happens when you decrease HttpClient.Timeout property? It defaults to 100 seconds.

Decreasing the timeout has no discernible effect.

@joelrevans
Author

joelrevans commented Mar 27, 2020

The GetAsync call throws the following exceptions. I'm not experienced enough to read much into them.

The issue is not so much the exception or the hang itself; it's that GetStringAsync swallows the exception entirely. A permanent deadlock is the worst possible outcome.

Unhandled exception. System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.FillAsync()
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionResponseContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at tmp.Program.Main(String[] args) in C:\Users\JOEL\desktop\tmp\Program.cs:line 13
   at tmp.Program.<Main>(String[] args)

and this...

Unhandled exception. System.Net.Http.HttpRequestException: Error while copying content to a stream.
 ---> System.IO.IOException:  Received an unexpected EOF or 0 bytes from the transport stream.
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.FillAsync()
   at System.Net.Http.HttpConnection.ChunkedEncodingReadStream.CopyToAsyncCore(Stream destination, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionResponseContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at tmp.Program.Main(String[] args) in C:\Users\JOEL\desktop\tmp\Program.cs:line 13
   at tmp.Program.<Main>(String[] args)

@Bio2hazard

Bio2hazard commented Mar 27, 2020

I reported this issue before at #31259 👍

Here is the relevant code:

GetStringAsyncCore(GetAsync(requestUri, HttpCompletionOption.ResponseHeadersRead, cancellationToken), cancellationToken);

In .NET Core 3.1 these methods do not yet have cancellation token overloads, but they still pass HttpCompletionOption.ResponseHeadersRead internally.

The fact is that any use of HttpCompletionOption.ResponseHeadersRead without a timed cancellation token has the potential to deadlock your app permanently.

Here is a live repro of the issue that shows the dangerous behavior: https://dotnetfiddle.net/UpFwOD. I had to set the timeout to 3 seconds due to dotnetfiddle runtime limitations, but if you copy the code and run it locally you will see that it deadlocks indefinitely.
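
In sketch form, the dangerous shape and a safer variant look roughly like this (placeholder URL and timeouts; the safe variant assumes the response content stream honors the token it's given, which is what the stack traces above suggest):

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class ResponseHeadersReadSketch
{
    static readonly HttpClient client = new HttpClient();

    // Dangerous: once the headers arrive, nothing bounds the body read.
    static async Task<string> UnboundedAsync(string url)
    {
        using HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
        return await response.Content.ReadAsStringAsync();   // can wait forever if the server goes silent
    }

    // Safer: thread a timed token through both the send and the body read.
    static async Task<string> BoundedAsync(string url)
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        using HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, cts.Token);
        using Stream content = await response.Content.ReadAsStreamAsync();
        using var buffer = new MemoryStream();
        await content.CopyToAsync(buffer, cts.Token);         // token is observed during the body read
        return Encoding.UTF8.GetString(buffer.ToArray());     // assumes UTF-8; adjust to the response's charset
    }
}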

Full framework had a 300 second default timeout for idle sockets, but for some reason that was never added to dotnet core.

I strongly maintain that, in order to make HttpClient safe for customers to use, either a way to set a global idle socket timeout must be provided, or all calls that use HttpCompletionOption.ResponseHeadersRead must be forced to specify a timed cancellation token (which at this point would obviously be a huge breaking change).

Let me give an example of how many of your customers this current behavior is endangering:

Did you know that the AWS SDK, with 52,965,649 total downloads and close to 30k downloads per day, forces HttpCompletionOption.ResponseHeadersRead on all outbound .NET Core requests?

And how (until 8 days ago, when they finally added their first async overload) their MD5 wrapper stream that wraps all their responses did not provide async method overloads?

And how the base Stream.cs async methods that are then used evaluate the cancellation token at the time of the call and then completely ignore it, meaning it can never do its job?

And how this all results in customers thinking their usage of a Stream is safe against deadlocks because they passed in a cancellation token, which silently and unbeknownst to them actually doesn't do anything because some stream implementation in the chain didn't overload one of the async methods?

AWS services are fairly reliable, but I've experienced multiple times - on production services, mind you - that S3 got into a bad state where it never closes the connection, deadlocking our production service. Worse yet, it seems to take a few minutes for AWS to realize something is wrong and take the faulty S3 service out of rotation. Since calls to S3 are behind a round-robin load balancer, it means that literally every service in the entire AZ gets deadlocked. Oh yeah, restarting the services right away doesn't work either, since the faulty S3 service is still in rotation.

The point I am trying to make is that if AWS makes a single mistake with the configuration of their servers, all dotnet core apps using AWS will deadlock. Period. And will remain deadlocked until they are restarted - even if AWS fixes the issue on their end right away.

Just ... let us configure a timeout for idle sockets & give us an IAsyncStream interface so we can start building modern, fully-async streams 😢 Or, at the very least, for non-overloaded async methods that take a cancellation token, register a callback that closes the stream if the cancellation token is cancelled prior to completion, like so: cancellationToken.Register(src => ((Stream)src).Close(), this)
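
A rough caller-side sketch of that last fallback (assumption: disposing the underlying stream makes a stuck read fail with an exception rather than wait forever):

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class StreamCancellationFallback
{
    // If a stream implementation ignores the token internally, closing the stream
    // when the token fires at least turns a permanent hang into an exception.
    public static async Task<int> ReadOrCloseOnCancelAsync(
        this Stream stream, byte[] buffer, int offset, int count, CancellationToken token)
    {
        using (token.Register(src => ((Stream)src).Close(), stream))
        {
            return await stream.ReadAsync(buffer, offset, count, token);
        }
    }
}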

Edit: this was also reported over a year ago at #28404 (comment), but the root issues have yet to be addressed.

@alnikola removed the untriaged label Apr 24, 2020
@alnikola
Contributor

Triage:

  1. We'll fix GetStringAsync and GetByteArrayAsync and make them respect HttpClient's Timeout and CancelPendingRequests to address the initial issue (see the sketch below)
  2. Fine-grained socket timeout control (including an option to set an idle timeout) requires a separate discussion
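
For item 1, a sketch of what the caller-side behavior should look like once that lands (the CancellationToken overload of GetStringAsync used here is an assumption about the eventual API shape):

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class AfterTheFix
{
    static async Task Main()
    {
        // Once GetStringAsync respects HttpClient.Timeout, a server that stops
        // responding mid-body should surface as a timeout exception, not a hang.
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

        // Per-call bound via a CancellationToken overload (assumed API shape).
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
        string body = await client.GetStringAsync("https://beautymatter.com/", cts.Token);
        Console.WriteLine(body.Length);
    }
}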

@alnikola removed their assignment Apr 24, 2020
@alnikola added this to the 5.0 milestone Apr 24, 2020
@karelz added the bug label Apr 24, 2020
@karelz added the help wanted (up-for-grabs) label May 29, 2020
@karelz modified the milestones: 5.0, Future May 29, 2020
@Spongman

Spongman commented Sep 5, 2020

It's surprising to me that this is 'up for grabs' and not higher priority. This bug (and the other related missing timeout/token API issues) renders HttpClient unfit for purpose in real-world network conditions.

@handlespeed
Contributor

Just to confirm: it seems the current issue has been fixed in a PR. Is that correct? Or is there still something we can participate in here? Thanks :)

@stephentoub self-assigned this Sep 16, 2020
@stephentoub modified the milestones: Future, 6.0.0 Sep 16, 2020
@stephentoub removed the help wanted (up-for-grabs) label Sep 16, 2020
@ghost locked as resolved and limited conversation to collaborators Dec 10, 2020