[HTTP Connection Pool] Lack of timeout on SSL connection establishment caused high number of pending connections in pool #110598
Comments
Your analysis looks correct. Are you running on .NET 6? We are aware of this issue, and .NET 7 (and newer) includes a fix (#71785) that will cancel such stuck connection attempts even if you haven't set a We've considered changing the default to a non-infinite value in the past, but backed the change out due to concerns of breaking existing scenarios that expect a specific exception type (#66297 (comment)). It's also less important now given the aforementioned change. |
Hello @MihaZupan, thanks for your response! We are actually using .NET 8, so it sounds like this should be fixed. Reading through the fix you referenced in your response, I think I understand how the cancellation token is cancelled when the originating request is served/cancelled, but would you mind pointing out how |
Yes, this shouldn't be happening on .NET 8.
If a connection attempt fails, it'll call into Line 321 in 39111b0
|
Note that TCP behavior is applicable to TLS as well as the |
We are not using ConnectCallback. Any chance that the cancellationToken is not fully honored by EstablishSslConnectionAsync and the method that it calls? I've read a few posts about network stream calls hanging even with cancellationToken cancelled. For example, this stackoverflow post |
@MihaZupan @wfurt , any insights on the question above? We are concerned about whether the cancellationToken is fully honored in the EstablishSslConnectionAsync and all the method calls that it generates. If it's not fully honored (e.g., it could be that the cancellation token is opportunistically honored, by checking only at the beginning of a method), then there is no guarantee that the task will throw/finish when the cancellation token is cancelled. In that case, both the fix referenced by @MihaZupan above and the recommendation of setting the ConnectTimeout will not work, since both of them rely on cancellationToken |
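For readers following along, the ConnectTimeout recommendation mentioned above could look roughly like the sketch below. The 15-second and 5-minute values are hypothetical and not taken from this thread. As discussed above, the timeout works by cancelling the token passed to the connection establishment code, so it only helps if that token is honored.

```csharp
using System;
using System.Net.Http;

// Minimal sketch: bound connection establishment at the application level.
// Both values below are hypothetical; tune them for your environment.
var handler = new SocketsHttpHandler
{
    // Default is Timeout.InfiniteTimeSpan, i.e. no application-level limit.
    ConnectTimeout = TimeSpan.FromSeconds(15),

    // Recycle healthy pooled connections periodically.
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
};

using var client = new HttpClient(handler);
Console.WriteLine($"ConnectTimeout: {handler.ConnectTimeout}");
```

The handler should be shared for the lifetime of the application rather than created per request, so that the connection pool is actually reused.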
Cancellation should be honored at any point during the call, I'm not aware of cases where it wouldn't be in current versions. @antonfirsov @liveans does that ring any bells? |
I agree with @MihaZupan. That article is 12 years old and the behavior improved a lot since. The problematic cases I've seen recently are related to firewall or load balancer cutting communication in the middle. The TCP stack would never receive FIN or RST so it would see the connection as established and And BTW what platform are you running on? You may try to collect some packet captures as well as enable internal tracing. |
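One way to enable the internal tracing mentioned above from inside the process is an EventListener. This is only a sketch: the NetTraceListener name and the example.com URL are hypothetical, and the "Private.InternalDiagnostics.System.Net." prefix is my understanding of how the networking event sources are named, so verify it against your runtime version.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Net.Http;

// Keep the listener alive for as long as events should be captured.
using var listener = new NetTraceListener();

using var client = new HttpClient();
try
{
    // Hypothetical endpoint, used only to generate some events.
    using var response = await client.GetAsync("https://example.com/");
    Console.WriteLine($"Status: {(int)response.StatusCode}");
}
catch (Exception ex)
{
    Console.WriteLine($"Request failed: {ex.Message}");
}

// Writes networking diagnostics events to the console.
sealed class NetTraceListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Assumed naming convention for the internal networking event sources.
        if (eventSource.Name.StartsWith("Private.InternalDiagnostics.System.Net.", StringComparison.Ordinal))
        {
            EnableEvents(eventSource, EventLevel.LogAlways);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        string payload = eventData.Payload is null ? "" : string.Join(", ", eventData.Payload);
        Console.WriteLine($"{eventData.TimeStamp:O} [{eventData.EventSource.Name}] {eventData.EventName}: {payload}");
    }
}
```

The output is verbose, so in a production service it is probably better to write it to a file and enable it only while reproducing the problem.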
I'm not aware of anything besides DNS on Linux, but the discussion seems to point to TLS. |
@wfurt we are on Windows Server. I am working with @MihaZupan offline to securely transfer the dump file so the .NET team can analyze it further. In the meantime, our team is considering moving to HttpClientFactory, which, from our understanding, will periodically tear down handlers and their underlying connection pools. Out of curiosity, @wfurt, how do we set TCP keep-alive from HttpClient/SocketsHttpHandler? I was not able to find such an option exposed at the HttpClient/SocketsHttpHandler level. |
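A rough sketch of the HttpClientFactory setup being considered above follows. It assumes the Microsoft.Extensions.Http package is referenced; the client name "downstream" and both time values are hypothetical. It shows only how handler rotation is configured, not whether rotation clears connections that are already stuck.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

services.AddHttpClient("downstream") // hypothetical client name
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        // Hypothetical value; bounds connection establishment.
        ConnectTimeout = TimeSpan.FromSeconds(15),
    })
    // Hypothetical value; the factory creates a fresh handler (and pool) after this interval.
    .SetHandlerLifetime(TimeSpan.FromMinutes(5));

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// Clients are cheap to create; the factory manages the underlying handlers.
HttpClient client = factory.CreateClient("downstream");
Console.WriteLine($"Client created, timeout: {client.Timeout}");
```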
Yes, you can set TCP keep-alive options in the ConnectCallback:
var handler = new SocketsHttpHandler
{
    ConnectCallback = async (context, cancellationToken) =>
    {
        var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
        try
        {
            // Enable TCP keep-alive: first probe after 60 seconds idle, then one probe per second.
            socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, 60);
            socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, 1);
            await socket.ConnectAsync(context.DnsEndPoint, cancellationToken);
            // The stream owns the socket, so disposing the stream closes it.
            return new NetworkStream(socket, ownsSocket: true);
        }
        catch
        {
            // Do not leak the socket if connecting fails or is cancelled.
            socket.Dispose();
            throw;
        }
    }
};
|
There's a race condition here where if the initiating request completes right away, before we're able to set the Lines 256 to 263 in c969265
It's easy to reproduce if you add a delay between the Yield and assigning the cts , while failing the request.
|
Description
Our service uses HttpClient to send requests to downstream services, and we observed that,
We took a dump and, based on what we found, formed a hypothesis that explains the above. We would like the .NET team to check whether the hypothesis is reasonable.
Observations from dump
The HttpConnectionPool that serves the destination has 88 associated connections, all of them pending, which implies that connection establishment is hanging.
Judging by the AsyncTaskMethodBuilder instances for various methods on the heap, SSL connection establishment seems to be the culprit, not the TCP connection:
System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.ValueTuple<System.IO.Stream, System.Net.TransportContext, System.Net.IPEndPoint>>+AsyncStateMachineBox<System.Net.Http.HttpConnectionPool+<ConnectAsync>d__103>
System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.IO.Stream>+AsyncStateMachineBox<System.Net.Http.HttpConnectionPool+<ConnectToTcpHostAsync>d__104>
System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Net.Security.SslStream>+AsyncStateMachineBox<System.Net.Http.ConnectHelper+<EstablishSslConnectionAsync>d__2>
System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>+AsyncStateMachineBox<System.Net.Security.SslStream+<ForceAuthenticationAsync>d__150<System.Net.Security.AsyncReadWriteAdapter>>
System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Int32>+AsyncStateMachineBox<System.Net.Security.SslStream+<ReceiveHandshakeFrameAsync>d__151<System.Net.Security.AsyncReadWriteAdapter>>
Analysis
Upon checking the code in HttpConnectionPool, it seems that, under the default settings, there is no cancellation for ConnectToTcpHostAsync and EstablishSslConnectionAsync (a cancellation token is passed, but with InfiniteTimeSpan). That kind of makes sense for the TCP connection, as the OS has a timeout at the OS level, but for the SSL connection I am not aware of any OS-level timeout. With no OS-level or application-level timeout, the SSL connection can hang indefinitely.
Hypothesis
HttpConnectionPool started to get contaminated with connections that hang in the SSL connection phase.
With PooledConnectionLifetime set in our application, healthy connections start to die off when their lifetime is up, so there are fewer and fewer healthy connections in the connection pool. Pending connections do not seem to honor PooledConnectionLifetime.
The hanging connections keep _pendingHttp11ConnectionCount in the connection pool high. A high _pendingHttp11ConnectionCount makes it harder to start new connections (the connection pool has logic that only starts a new connection if the request queue length is larger than _pendingHttp11ConnectionCount).
Asks to .NET team
Reproduction Steps
No manual repro. As described above, our service sees timeouts and failures when sending HTTP requests, even after the network outage is resolved on the infrastructure that hosts the destination of the requests.
Expected behavior
HttpClient should be able to send requests to destination, after the network outage impacting the destination is resolved, without restarting sender application/machine.
Actual behavior
HttpClient reports failures and timeouts when sending requests to the destination, even after the network outage impacting the destination is resolved. The issue is fixed only by restarting the application/machine.
Regression?
n/a
Known Workarounds
No response
Configuration
n/a
Other information
n/a