-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional trace to HttpConnectionPool #66605
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsThe additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection. Related to #66297
|
@@ -478,6 +478,13 @@ private void CheckForHttp11ConnectionInjection() | |||
return; | |||
} | |||
|
|||
if (NetEventSource.Log.IsEnabled()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add some telemetry counters? So that you wouldn't need to turn on the whole debugging logs to diagnose it.
cc @MihaZupan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of counters did you have in mind?
We could expose many things, but they may not be too useful in diagnosing an issue like this since they are all aggregated for the process.
I see System.Net.Sockets
doesn't have a "current" connection attempt counter - I don't see a reason not to add that. We have "current" counters on all the other sources.
Something like PendingHttp11ConnectionCount
on Http would then be sort of redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to point out is that in order to get this diagnostics log, you need to turn on the internal logging. It's very verbose and chaotic and not designed to be used in production. I see it mostly as our internal logging helping us diagnose issues. In general, we don't recommend customers to turn this on. And there were even attempts to get completely rid of this.
So the question is whether this trace should help us or the customer diagnose the issue? If it's for customers, shouldn't it be part of the official HTTP telemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EventSource does have the LogLevel for a reason, so we could certainly expose a lot more info under debug/trace.
The "official" telemetry surface is an API we effectively promise not to break.
As such, these logs shouldn't be unique events with a bunch of parameters since they are likely to change/break in the future.
We can, however, have something like
ConnectionPoolDebug(long poolId, string message);
Where you can still easily collect and aggregate all the information around the pool behavior, it just wouldn't be ideal for automatic processing.
Alternatively, we can start using the Keywords feature of EventSource (filtering by topic). Or the connection pool can have its own EventSource System.Net.Http.ConnectionPool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this will be useful for future debugging, thanks.
All of this info is sort of available if you have access to all the events since the start of the connection pool. But if someone just observes moment-in-time behavior (turning on tracing when there is an issue happening), they would be lacking this info, making the trace much less useful.
Something that may also be helpful is including the exception in the trace on connection failure here (same for H2, H3):
runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs
Line 1622 in 68af49d
if (NetEventSource.Log.IsEnabled()) Trace("HTTP/1.1 connection failed"); |
/backport to release/6.0 |
Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/2000094673 |
@CarnaViire backporting to release/6.0 failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Add additional trace to HttpConnectionPool
Applying: Add exception to trace in case of failed connect
error: sha1 information is lacking or useless (src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Add exception to trace in case of failed connect
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
The additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in HttpConnectionPool. The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection. Related to dotnet#66297
The additional trace should help troubleshoot hanged "pending" connection issues (i.e. the reasons why new connections were not created) in
HttpConnectionPool
.The issue manifests as growing number of timeouted requests. It is not easy to link a timeouted request (esp. with a small timeout) to a connection that was created 3 minutes ago but still is not connected. There are no traces of the state of HttpConnectionPool, e.g. number of pending connections, so it is unclear from existing traces why the pool decided not to create a connection.
Related to #66297