[Tracing] Fix WebRequest bug with existing distributed tracing headers #4770
Conversation
agent.SpanFilters.Add(s => s.Type == SpanTypes.Http); // keep only HTTP-type spans
var spans = agent.WaitForSpans(expectedSpanCount).OrderBy(s => s.Start);
var allSpans = agent.WaitForSpans(expectedAllSpansCount).OrderBy(s => s.Start);
// No two spans may share the same (SpanId, TraceId) pair.
allSpans.Should().OnlyHaveUniqueItems(s => new { s.SpanId, s.TraceId });
Before the product change (and after we started collecting all spans from the application), we would see this error message from the test. Notice how the parent span and the WebRequest spans have the same SpanId.
Expected allSpans to only have unique items on new <>f__AnonymousType11`2(SpanId == s.SpanId, TraceId == s.TraceId), but these items are not unique:
- TraceId: 3286320956883792616, SpanId: 7849642159726264990, Name: GetResponseWithDistributedTracingHeaders, Resource: GetResponseWithDistributedTracingHeaders, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 7849642159726264990, Name: http.client.request, Resource: GET localhost:49189/?/, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 12757857473955457543, Name: GetResponseAsyncWithDistributedTracingHeaders, Resource: GetResponseAsyncWithDistributedTracingHeaders, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 12757857473955457543, Name: http.client.request, Resource: GET localhost:49189/?/, Service: Samples.WebRequest
Datadog Report
Branch report: ❄️ New Flaky Tests (1)
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing the following branches/commits. Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Each result below shows the mean of the test run and the p99 interval derived from its mean and StdDev.
Execution time (ms), shown as mean (p99 interval):

| Sample | Scenario | This PR (4770) | master |
|---|---|---|---|
| FakeDbCommand (.NET Framework 4.6.2) | Baseline | 72 (62–83) | 71 (63–78) |
| FakeDbCommand (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 995 (976–1014) | 991 (970–1012) |
| FakeDbCommand (.NET Core 3.1) | Baseline | 106 (101–110) | 106 (101–110) |
| FakeDbCommand (.NET Core 3.1) | CallTarget+Inlining+NGEN | 683 (665–701) | 683 (663–702) |
| FakeDbCommand (.NET 6) | Baseline | 90 (87–94) | 90 (88–91) |
| FakeDbCommand (.NET 6) | CallTarget+Inlining+NGEN | 652 (629–675) | 650 (631–669) |
| HttpMessageHandler (.NET Framework 4.6.2) | Baseline | 188 (185–191) | 188 (185–190) |
| HttpMessageHandler (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 1,134 (1,111–1,157) | 1,137 (1,115–1,160) |
| HttpMessageHandler (.NET Core 3.1) | Baseline | 272 (267–276) | 271 (267–275) |
| HttpMessageHandler (.NET Core 3.1) | CallTarget+Inlining+NGEN | 1,099 (1,073–1,126) | 1,103 (1,076–1,129) |
| HttpMessageHandler (.NET 6) | Baseline | 263 (260–266) | 262 (258–267) |
| HttpMessageHandler (.NET 6) | CallTarget+Inlining+NGEN | 1,066 (1,043–1,088) | 1,068 (1,044–1,091) |
Throughput/Crank Report ⚡
Throughput results for AspNetCoreSimpleController comparing the following branches/commits. Cases where the throughput results for the PR are worse than the latest master (a 5% drop or greater) are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!
Throughput Linux x64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 10.861M | 10.293M | 11.867M | 10.958M |
| Automatic | 7.846M | 7.626M | 8.176M | |
| Trace stats | 8.038M | 7.981M | 8.450M | |
| Manual | 9.620M | 9.696M | 10.334M | |
| Manual + Automatic | 7.243M | 7.342M | 7.750M | |
| Version Conflict | 6.564M | 6.511M | 7.137M | |

(Automatic also has a benchmarks/2.9.0 result of 7.908M.)

Throughput Linux arm64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 9.515M | 9.373M | 9.548M | 9.589M |
| Automatic | 6.628M | 6.736M | 6.747M | |
| Trace stats | 7.005M | 6.838M | 6.815M | |
| Manual | 8.446M | 8.359M | 8.263M | |
| Manual + Automatic | 6.237M | 6.341M | 6.275M | |
| Version Conflict | 5.625M | 5.798M | 5.670M | |

Throughput Windows x64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 9.602M | 9.505M | 9.007M | 9.595M |
| Automatic | 6.686M | 6.739M | 6.423M | 7.049M |
| Trace stats | 7.050M | 7.014M | 6.721M | |
| Manual | 8.427M | 8.381M | 7.977M | |
| Manual + Automatic | 6.452M | 6.515M | 6.349M | |
| Version Conflict | 5.949M | 6.043M | 5.682M | |

Throughput Linux x64 (ASM) (Total requests)

| Scenario | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|
| Baseline | 7.912M | 7.578M | 7.477M |
| No attack | 2.269M | 2.184M | 3.151M |
| Attack | 1.782M | 1.711M | 2.361M |
| Blocking | 3.615M | 3.486M | |
| IAST default | 7.079M | | |
| IAST full | 6.347M | | |
| Base vuln | 0.957M | | |
| IAST vuln | 0.931M | | |
Benchmarks Report 🐌
Benchmarks for #4770 compared to master:
The following thresholds were used for comparing the benchmark speeds. Allocation changes below 0.5% are ignored.
Benchmark details
- Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Asm.AppSecBodyBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Asm.AppSecWafBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.RedisBenchmark - Slower

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.RedisBenchmark.SendReceive‑net6.0 | 1.123 | 1,247.38 | 1,400.52 | |

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive | net6.0 | 1.25μs | 0.731ns | 2.83ns | 0.0156 | 0 | 0 | 1.1 KB |
| master | SendReceive | netcoreapp3.1 | 1.64μs | 1.17ns | 4.04ns | 0.0147 | 0 | 0 | 1.1 KB |
| master | SendReceive | net472 | 2.01μs | 0.967ns | 3.75ns | 0.177 | 0 | 0 | 1.12 KB |
| #4770 | SendReceive | net6.0 | 1.4μs | 0.438ns | 1.58ns | 0.0154 | 0 | 0 | 1.1 KB |
| #4770 | SendReceive | netcoreapp3.1 | 1.65μs | 0.524ns | 2.03ns | 0.0149 | 0 | 0 | 1.1 KB |
| #4770 | SendReceive | net472 | 1.94μs | 1.13ns | 4.36ns | 0.176 | 0 | 0 | 1.12 KB |
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 2.64μs | 1.42ns | 5.52ns | 0.0211 | 0 | 0 | 1.53 KB |
| master | EnrichedLog | netcoreapp3.1 | 3.6μs | 0.896ns | 3.35ns | 0.0215 | 0 | 0 | 1.58 KB |
| master | EnrichedLog | net472 | 4.01μs | 1.6ns | 6.21ns | 0.311 | 0 | 0 | 1.96 KB |
| #4770 | EnrichedLog | net6.0 | 2.71μs | 0.785ns | 3.04ns | 0.0217 | 0 | 0 | 1.53 KB |
| #4770 | EnrichedLog | netcoreapp3.1 | 3.59μs | 1.6ns | 6.21ns | 0.0216 | 0 | 0 | 1.58 KB |
| #4770 | EnrichedLog | net472 | 4.05μs | 0.915ns | 3.3ns | 0.31 | 0 | 0 | 1.96 KB |
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4770:

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.179 | 460.37 | 390.53 | |

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 460ns | 0.344ns | 1.33ns | 0.00741 | 0 | 0 | 536 B |
| master | StartFinishSpan | netcoreapp3.1 | 577ns | 0.276ns | 1.07ns | 0.00726 | 0 | 0 | 536 B |
| master | StartFinishSpan | net472 | 650ns | 0.335ns | 1.3ns | 0.0853 | 0 | 0 | 538 B |
| master | StartFinishScope | net6.0 | 492ns | 0.146ns | 0.565ns | 0.00925 | 0 | 0 | 656 B |
| master | StartFinishScope | netcoreapp3.1 | 774ns | 0.3ns | 1.12ns | 0.00892 | 0 | 0 | 656 B |
| master | StartFinishScope | net472 | 809ns | 0.287ns | 1.11ns | 0.0978 | 0 | 0 | 618 B |
| #4770 | StartFinishSpan | net6.0 | 391ns | 0.121ns | 0.47ns | 0.00762 | 0 | 0 | 536 B |
| #4770 | StartFinishSpan | netcoreapp3.1 | 606ns | 0.158ns | 0.592ns | 0.00728 | 0 | 0 | 536 B |
| #4770 | StartFinishSpan | net472 | 601ns | 0.274ns | 0.988ns | 0.0853 | 0 | 0 | 538 B |
| #4770 | StartFinishScope | net6.0 | 541ns | 0.198ns | 0.768ns | 0.00913 | 0 | 0 | 656 B |
| #4770 | StartFinishScope | netcoreapp3.1 | 760ns | 0.31ns | 1.2ns | 0.00903 | 0 | 0 | 656 B |
| #4770 | StartFinishScope | net472 | 872ns | 0.42ns | 1.63ns | 0.098 | 0 | 0 | 618 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 608ns | 0.233ns | 0.901ns | 0.00936 | 0 | 0 | 656 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 756ns | 0.504ns | 1.95ns | 0.00882 | 0 | 0 | 656 B |
| master | RunOnMethodBegin | net472 | 945ns | 0.218ns | 0.786ns | 0.0979 | 0 | 0 | 618 B |
| #4770 | RunOnMethodBegin | net6.0 | 634ns | 0.441ns | 1.59ns | 0.00923 | 0 | 0 | 656 B |
| #4770 | RunOnMethodBegin | netcoreapp3.1 | 818ns | 1.44ns | 5.56ns | 0.0089 | 0 | 0 | 656 B |
| #4770 | RunOnMethodBegin | net472 | 945ns | 1.03ns | 3.98ns | 0.0978 | 0 | 0 | 618 B |
I think I'm slightly confused by this:
- AFAICT, we always overwrite any manually-added headers by injecting our own.
- In that case, shouldn't it always be safe to use the headers we find?
I'm assuming there's some edge case where we don't inject the headers, but I'm not sure what that is?
Also, this is a "bad practice" by the customer, right? It feels like in an OTel world this would be a great way to get some really weird traces too.
internal static class HeadersInjectedCache
{
    // Sentinel stored for every header collection we injected into.
    private static readonly object InjectedValue = new();

    // Weak cache: an entry disappears automatically once its
    // WebHeaderCollection key is no longer referenced anywhere.
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();
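To round that out, here is a minimal sketch of how such a cache could expose mark/check operations; the method names are assumptions for illustration, not necessarily the PR's actual API:

```csharp
using System.Net;
using System.Runtime.CompilerServices;

internal static class HeadersInjectedCache
{
    private static readonly object InjectedValue = new();
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();

    // Mark this header collection as one our instrumentation wrote into.
    public static void SetInjectedHeaders(WebHeaderCollection headers)
    {
        Cache.Remove(headers); // tolerate repeated injection on the same request
        Cache.Add(headers, InjectedValue);
    }

    // True only if our instrumentation previously marked these headers.
    public static bool TryGetInjectedHeaders(WebHeaderCollection headers)
        => Cache.TryGetValue(headers, out _);
}
```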
I'm starting to think you're a shill for "Big ConditionalWeakTable" 🕵️‍♂️
It's too late for me, I can't turn back now 😂
Unfortunately, we do not always overwrite manually-added distributed tracing headers. So the error case looks like this: the application copies its incoming distributed tracing headers onto the WebRequest, we never inject our own, and the GetResponse integration re-uses the context it finds in the headers.
In the end, two spans are reported to Datadog with a matching TraceId and SpanId, causing ✨ problems ✨
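For illustration, a hypothetical repro of that problematic pattern (the URL, method, and variable names are assumptions; the two header names are Datadog's standard propagation headers):

```csharp
using System.Net;

internal static class ProxyingRepro
{
    // The application copies its incoming trace context verbatim onto an
    // outbound request, then calls GetResponse() without ever opening the
    // request stream, so the tracer's pre-emptive header injection never runs.
    public static void ForwardRequest(string incomingTraceId, string incomingParentId)
    {
        var request = (HttpWebRequest)WebRequest.Create("http://localhost:5000/downstream");

        request.Headers["x-datadog-trace-id"] = incomingTraceId;
        request.Headers["x-datadog-parent-id"] = incomingParentId;

        // Before this fix, the GetResponse integration would adopt the copied
        // context and emit a span with the same trace-id and span-id as the
        // server span that produced those headers.
        using var response = (HttpWebResponse)request.GetResponse();
    }
}
```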
OK, I had totally missed that you don't have to call that first. So just to confirm, the only problematic scenario is when the application sets the distributed tracing headers itself and we never overwrite them?
Thanks, rubber ducking over! 😄
Nice work + nice tests!
…ted tracing headers to their WebRequest headers. This will cause the following WebRequest integrations to incorrectly use the existing distributed tracing headers:
- GetResponse
- GetResponseAsync
…a static, self-cleansing cache to know when we injected distributed tracing headers into the WebRequest (not the application), so WebRequest.GetResponse and WebRequest.GetResponseAsync can use the correct trace context.
Force-pushed from d2fea00 to 6fe933f
Summary of changes
When we pre-emptively add distributed tracing headers in `WebRequest.BeginGetRequestStream`, `WebRequest.BeginGetResponse`, and `WebRequest.GetRequestStream`, add the headers to a cache to signal to the later integrations that we last updated the headers and are responsible for the distributed tracing headers in the WebRequest object.
When we check for the pre-emptively added distributed tracing headers in `WebRequest.EndGetResponse`, `WebRequest.GetResponse`, and `WebRequest.GetResponseAsync`, first check the cache to know whether we added the distributed tracing headers to the WebRequest object. If present, continue to use the injected trace context. Otherwise, create a new context. A rough sketch of these two halves follows below.
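As an illustration of those two halves (re-using the hypothetical HeadersInjectedCache sketch from the review thread above; none of these names are the PR's actual integration code):

```csharp
using System.Net;

internal static class WebRequestIntegrationSketch
{
    // Injection side (BeginGetRequestStream / BeginGetResponse / GetRequestStream):
    // after writing our distributed tracing headers, mark the collection as ours.
    public static void OnHeadersInjected(HttpWebRequest request)
        => HeadersInjectedCache.SetInjectedHeaders(request.Headers);

    // Consumption side (EndGetResponse / GetResponse / GetResponseAsync):
    // only re-use the context found in the headers if we put it there.
    public static bool ShouldReuseHeaderContext(HttpWebRequest request)
        => request.Headers["x-datadog-trace-id"] is not null
           && HeadersInjectedCache.TryGetInjectedHeaders(request.Headers);
}
```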
Reason for change
In our `GetResponse` integrations, we check the WebRequest headers to see if we had to pre-emptively add distributed tracing headers before the headers became locked; if they are present, we use that context for the newly created WebRequest span.
The problem with this, as seen in a user issue, is that some applications may proxy incoming headers to their outbound HTTP requests. Before this change, our handling would cause the integrations for `WebRequest.EndGetResponse`, `WebRequest.GetResponse`, and `WebRequest.GetResponseAsync` to re-use the trace context in the headers for the WebRequest span, which results in the server span and the outgoing HTTP span having the same trace-id and span-id. After the fix, we only re-use the trace context when we know that we last updated the headers.
Implementation details
For our cache, we use (surprise!) a ConditionalWeakTable to keep track of request headers that we injected a trace context into. We use this data structure because it removes stale entries once the keys are no longer referenced. In the `GetResponse` integrations, we only use the trace context from the request headers if our automatic instrumentation injected the headers.
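To see why this structure is self-cleansing, here is a small standalone snippet (not from the PR) demonstrating that an entry lives exactly as long as its key:

```csharp
using System;
using System.Net;
using System.Runtime.CompilerServices;

internal static class ConditionalWeakTableDemo
{
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();

    public static void Main()
    {
        var headers = new WebHeaderCollection();
        Cache.Add(headers, new object());
        Console.WriteLine(Cache.TryGetValue(headers, out _)); // True

        headers = null;                // drop the only strong reference
        GC.Collect();                  // the entry is now eligible for removal,
        GC.WaitForPendingFinalizers(); // with no manual eviction required
    }
}
```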
Test coverage
Update the WebRequest test application to manually add distributed tracing headers, and rewrite the integration tests to collect all spans from the application and assert that there are no duplicate trace-id/span-id combinations. Before the product change, we would see duplicates from the `GetResponse` and `GetResponseAsync` calls.
Other details
N/A