[Tracing] Fix WebRequest bug with existing distributed tracing headers #4770
Conversation
agent.SpanFilters.Add(s => s.Type == SpanTypes.Http); // keep only HTTP-type spans
var spans = agent.WaitForSpans(expectedSpanCount).OrderBy(s => s.Start);
var allSpans = agent.WaitForSpans(expectedAllSpansCount).OrderBy(s => s.Start);
// No two spans may share the same (SpanId, TraceId) pair.
allSpans.Should().OnlyHaveUniqueItems(s => new { s.SpanId, s.TraceId });
Before the product change (and after we started collecting all spans from the application), we would see this error message from the test. Notice how the parent span and the WebRequest spans have the same SpanId.
Expected allSpans to only have unique items on new <>f__AnonymousType11`2(SpanId == s.SpanId, TraceId == s.TraceId), but these items are not unique:
- TraceId: 3286320956883792616, SpanId: 7849642159726264990, Name: GetResponseWithDistributedTracingHeaders, Resource: GetResponseWithDistributedTracingHeaders, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 7849642159726264990, Name: http.client.request, Resource: GET localhost:49189/?/, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 12757857473955457543, Name: GetResponseAsyncWithDistributedTracingHeaders, Resource: GetResponseAsyncWithDistributedTracingHeaders, Service: Samples.WebRequest
- TraceId: 3286320956883792616, SpanId: 12757857473955457543, Name: http.client.request, Resource: GET localhost:49189/?/, Service: Samples.WebRequest
Datadog Report
Branch report: ❄️ New Flaky Tests (1)
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing the following branches/commits. Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Each result below shows the mean of the test run and the p99 interval derived from its mean and StdDev.
Execution time (ms), shown as mean (p99 interval):

| Sample | Scenario | This PR (4770) | master |
|---|---|---|---|
| FakeDbCommand (.NET Framework 4.6.2) | Baseline | 72 (62–83) | 71 (63–78) |
| FakeDbCommand (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 995 (976–1014) | 991 (970–1012) |
| FakeDbCommand (.NET Core 3.1) | Baseline | 106 (101–110) | 106 (101–110) |
| FakeDbCommand (.NET Core 3.1) | CallTarget+Inlining+NGEN | 683 (665–701) | 683 (663–702) |
| FakeDbCommand (.NET 6) | Baseline | 90 (87–94) | 90 (88–91) |
| FakeDbCommand (.NET 6) | CallTarget+Inlining+NGEN | 652 (629–675) | 650 (631–669) |
| HttpMessageHandler (.NET Framework 4.6.2) | Baseline | 188 (185–191) | 188 (185–190) |
| HttpMessageHandler (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 1,134 (1,111–1,157) | 1,137 (1,115–1,160) |
| HttpMessageHandler (.NET Core 3.1) | Baseline | 272 (267–276) | 271 (267–275) |
| HttpMessageHandler (.NET Core 3.1) | CallTarget+Inlining+NGEN | 1,099 (1,073–1,126) | 1,103 (1,076–1,129) |
| HttpMessageHandler (.NET 6) | Baseline | 263 (260–266) | 262 (258–267) |
| HttpMessageHandler (.NET 6) | CallTarget+Inlining+NGEN | 1,066 (1,043–1,088) | 1,068 (1,044–1,091) |
Throughput/Crank Report ⚡
Throughput results for AspNetCoreSimpleController comparing the following branches/commits. Cases where the throughput results for the PR are worse than the latest master (a 5% drop or greater) are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!
Throughput Linux x64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 10.861M | 10.293M | 11.867M | 10.958M |
| Automatic | 7.846M | 7.626M | 8.176M | |
| Trace stats | 8.038M | 7.981M | 8.450M | |
| Manual | 9.620M | 9.696M | 10.334M | |
| Manual + Automatic | 7.243M | 7.342M | 7.750M | |
| Version Conflict | 6.564M | 6.511M | 7.137M | |

(Automatic also has a benchmarks/2.9.0 result of 7.908M.)

Throughput Linux arm64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 9.515M | 9.373M | 9.548M | 9.589M |
| Automatic | 6.628M | 6.736M | 6.747M | |
| Trace stats | 7.005M | 6.838M | 6.815M | |
| Manual | 8.446M | 8.359M | 8.263M | |
| Manual + Automatic | 6.237M | 6.341M | 6.275M | |
| Version Conflict | 5.625M | 5.798M | 5.670M | |

Throughput Windows x64 (Total requests)

| Scenario | This PR (4770) | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|---|
| Baseline | 9.602M | 9.505M | 9.007M | 9.595M |
| Automatic | 6.686M | 6.739M | 6.423M | 7.049M |
| Trace stats | 7.050M | 7.014M | 6.721M | |
| Manual | 8.427M | 8.381M | 7.977M | |
| Manual + Automatic | 6.452M | 6.515M | 6.349M | |
| Version Conflict | 5.949M | 6.043M | 5.682M | |

Throughput Linux x64 (ASM) (Total requests)

| Scenario | master | benchmarks/2.38.0 | benchmarks/2.9.0 |
|---|---|---|---|
| Baseline | 7.912M | 7.578M | 7.477M |
| No attack | 2.269M | 2.184M | 3.151M |
| Attack | 1.782M | 1.711M | 2.361M |
| Blocking | 3.615M | 3.486M | |
| IAST default | 7.079M | | |
| IAST full | 6.347M | | |
| Base vuln | 0.957M | | |
| IAST vuln | 0.931M | | |
Benchmarks Report 🐌
Benchmarks for #4770 compared to master:
The following thresholds were used for comparing the benchmark speeds. Allocation changes below 0.5% are ignored.
Benchmark details
- Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Asm.AppSecBodyBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Asm.AppSecWafBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.RedisBenchmark - Slower

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.RedisBenchmark.SendReceive‑net6.0 | 1.123 | 1,247.38 | 1,400.52 | |

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | SendReceive | net6.0 | 1.25μs | 0.731ns | 2.83ns | 0.0156 | 0 | 0 | 1.1 KB |
| master | SendReceive | netcoreapp3.1 | 1.64μs | 1.17ns | 4.04ns | 0.0147 | 0 | 0 | 1.1 KB |
| master | SendReceive | net472 | 2.01μs | 0.967ns | 3.75ns | 0.177 | 0 | 0 | 1.12 KB |
| #4770 | SendReceive | net6.0 | 1.4μs | 0.438ns | 1.58ns | 0.0154 | 0 | 0 | 1.1 KB |
| #4770 | SendReceive | netcoreapp3.1 | 1.65μs | 0.524ns | 2.03ns | 0.0149 | 0 | 0 | 1.1 KB |
| #4770 | SendReceive | net472 | 1.94μs | 1.13ns | 4.36ns | 0.176 | 0 | 0 | 1.12 KB |
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | EnrichedLog | net6.0 | 2.64μs | 1.42ns | 5.52ns | 0.0211 | 0 | 0 | 1.53 KB |
| master | EnrichedLog | netcoreapp3.1 | 3.6μs | 0.896ns | 3.35ns | 0.0215 | 0 | 0 | 1.58 KB |
| master | EnrichedLog | net472 | 4.01μs | 1.6ns | 6.21ns | 0.311 | 0 | 0 | 1.96 KB |
| #4770 | EnrichedLog | net6.0 | 2.71μs | 0.785ns | 3.04ns | 0.0217 | 0 | 0 | 1.53 KB |
| #4770 | EnrichedLog | netcoreapp3.1 | 3.59μs | 1.6ns | 6.21ns | 0.0216 | 0 | 0 | 1.58 KB |
| #4770 | EnrichedLog | net472 | 4.05μs | 0.915ns | 3.3ns | 0.31 | 0 | 0 | 1.96 KB |
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4770:

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net6.0 | 1.179 | 460.37 | 390.53 | |

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 460ns | 0.344ns | 1.33ns | 0.00741 | 0 | 0 | 536 B |
| master | StartFinishSpan | netcoreapp3.1 | 577ns | 0.276ns | 1.07ns | 0.00726 | 0 | 0 | 536 B |
| master | StartFinishSpan | net472 | 650ns | 0.335ns | 1.3ns | 0.0853 | 0 | 0 | 538 B |
| master | StartFinishScope | net6.0 | 492ns | 0.146ns | 0.565ns | 0.00925 | 0 | 0 | 656 B |
| master | StartFinishScope | netcoreapp3.1 | 774ns | 0.3ns | 1.12ns | 0.00892 | 0 | 0 | 656 B |
| master | StartFinishScope | net472 | 809ns | 0.287ns | 1.11ns | 0.0978 | 0 | 0 | 618 B |
| #4770 | StartFinishSpan | net6.0 | 391ns | 0.121ns | 0.47ns | 0.00762 | 0 | 0 | 536 B |
| #4770 | StartFinishSpan | netcoreapp3.1 | 606ns | 0.158ns | 0.592ns | 0.00728 | 0 | 0 | 536 B |
| #4770 | StartFinishSpan | net472 | 601ns | 0.274ns | 0.988ns | 0.0853 | 0 | 0 | 538 B |
| #4770 | StartFinishScope | net6.0 | 541ns | 0.198ns | 0.768ns | 0.00913 | 0 | 0 | 656 B |
| #4770 | StartFinishScope | netcoreapp3.1 | 760ns | 0.31ns | 1.2ns | 0.00903 | 0 | 0 | 656 B |
| #4770 | StartFinishScope | net472 | 872ns | 0.42ns | 1.63ns | 0.098 | 0 | 0 | 618 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results:

| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 608ns | 0.233ns | 0.901ns | 0.00936 | 0 | 0 | 656 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 756ns | 0.504ns | 1.95ns | 0.00882 | 0 | 0 | 656 B |
| master | RunOnMethodBegin | net472 | 945ns | 0.218ns | 0.786ns | 0.0979 | 0 | 0 | 618 B |
| #4770 | RunOnMethodBegin | net6.0 | 634ns | 0.441ns | 1.59ns | 0.00923 | 0 | 0 | 656 B |
| #4770 | RunOnMethodBegin | netcoreapp3.1 | 818ns | 1.44ns | 5.56ns | 0.0089 | 0 | 0 | 656 B |
| #4770 | RunOnMethodBegin | net472 | 945ns | 1.03ns | 3.98ns | 0.0978 | 0 | 0 | 618 B |
I think I'm slightly confused by this:
- AFAICT, we always overwrite any manually-added headers by injecting our own.
- In that case, shouldn't it always be safe to use the headers we find?
I'm assuming there's some edge case where we don't inject the headers, but I'm not sure what that is?
Also, this is a "bad practice" by the customer, right? It feels like in an OTel world this would be a great way to get some really weird traces too.
internal static class HeadersInjectedCache
{
    // Sentinel stored for every header collection we injected into.
    private static readonly object InjectedValue = new();

    // Weak cache: an entry disappears automatically once its
    // WebHeaderCollection key is no longer referenced anywhere.
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();
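To round that out, here is a minimal sketch of how such a cache could expose mark/check operations; the method names are assumptions for illustration, not necessarily the PR's actual API:

```csharp
using System.Net;
using System.Runtime.CompilerServices;

internal static class HeadersInjectedCache
{
    private static readonly object InjectedValue = new();
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();

    // Mark this header collection as one our instrumentation wrote into.
    public static void SetInjectedHeaders(WebHeaderCollection headers)
    {
        Cache.Remove(headers); // tolerate repeated injection on the same request
        Cache.Add(headers, InjectedValue);
    }

    // True only if our instrumentation previously marked these headers.
    public static bool TryGetInjectedHeaders(WebHeaderCollection headers)
        => Cache.TryGetValue(headers, out _);
}
```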
I'm starting to think you're a shill for "Big ConditionalWeakTable" 🕵️‍♂️
It's too late for me, I can't turn back now 😂
Unfortunately, we do not always overwrite manually-added distributed tracing headers. So the error case looks like this: the application copies its incoming distributed tracing headers onto the WebRequest, we never inject our own, and the GetResponse integration re-uses the context it finds in the headers.
In the end, two spans are reported to Datadog with a matching TraceId and SpanId, causing ✨ problems ✨
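For illustration, a hypothetical repro of that problematic pattern (the URL, method, and variable names are assumptions; the two header names are Datadog's standard propagation headers):

```csharp
using System.Net;

internal static class ProxyingRepro
{
    // The application copies its incoming trace context verbatim onto an
    // outbound request, then calls GetResponse() without ever opening the
    // request stream, so the tracer's pre-emptive header injection never runs.
    public static void ForwardRequest(string incomingTraceId, string incomingParentId)
    {
        var request = (HttpWebRequest)WebRequest.Create("http://localhost:5000/downstream");

        request.Headers["x-datadog-trace-id"] = incomingTraceId;
        request.Headers["x-datadog-parent-id"] = incomingParentId;

        // Before this fix, the GetResponse integration would adopt the copied
        // context and emit a span with the same trace-id and span-id as the
        // server span that produced those headers.
        using var response = (HttpWebResponse)request.GetResponse();
    }
}
```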
OK, I had totally missed that you don't have to call that first. So just to confirm, the only problematic scenario is when the application sets the distributed tracing headers itself and we never overwrite them?
Thanks, rubber ducking over! 😄
Nice work + nice tests!
…ted tracing headers to their WebRequest headers. This will cause the following WebRequest integrations to incorrectly use the existing distributed tracing headers:
- GetResponse
- GetResponseAsync
…a static, self-cleansing cache to know when we injected distributed tracing headers into the WebRequest (not the application), so WebRequest.GetResponse and WebRequest.GetResponseAsync can use the correct trace context.
Force-pushed from d2fea00 to 6fe933f
Summary of changes
When we pre-emptively add distributed tracing headers in `WebRequest.BeginGetRequestStream`, `WebRequest.BeginGetResponse`, and `WebRequest.GetRequestStream`, add the headers to a cache to signal to the later integrations that we last updated the headers and are responsible for the distributed tracing headers in the WebRequest object.
When we check for the pre-emptively added distributed tracing headers in `WebRequest.EndGetResponse`, `WebRequest.GetResponse`, and `WebRequest.GetResponseAsync`, first check the cache to know whether we added the distributed tracing headers to the WebRequest object. If present, continue to use the injected trace context. Otherwise, create a new context. A rough sketch of these two halves follows below.
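As an illustration of those two halves (re-using the hypothetical HeadersInjectedCache sketch from the review thread above; none of these names are the PR's actual integration code):

```csharp
using System.Net;

internal static class WebRequestIntegrationSketch
{
    // Injection side (BeginGetRequestStream / BeginGetResponse / GetRequestStream):
    // after writing our distributed tracing headers, mark the collection as ours.
    public static void OnHeadersInjected(HttpWebRequest request)
        => HeadersInjectedCache.SetInjectedHeaders(request.Headers);

    // Consumption side (EndGetResponse / GetResponse / GetResponseAsync):
    // only re-use the context found in the headers if we put it there.
    public static bool ShouldReuseHeaderContext(HttpWebRequest request)
        => request.Headers["x-datadog-trace-id"] is not null
           && HeadersInjectedCache.TryGetInjectedHeaders(request.Headers);
}
```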
Reason for change
In our `GetResponse` integrations, we check the WebRequest headers to see if we had to pre-emptively add distributed tracing headers before the headers became locked; if they are present, we use that context for the newly created WebRequest span.
The problem with this, as seen in a user issue, is that some applications may proxy incoming headers to their outbound HTTP requests. Before this change, our handling would cause the integrations for `WebRequest.EndGetResponse`, `WebRequest.GetResponse`, and `WebRequest.GetResponseAsync` to re-use the trace context in the headers for the WebRequest span, which results in the server span and the outgoing HTTP span having the same trace-id and span-id. After the fix, we only re-use the trace context when we know that we last updated the headers.
Implementation details
For our cache, we use (surprise!) a ConditionalWeakTable to keep track of request headers that we injected a trace context into. We use this data structure because it removes stale entries once the keys are no longer referenced. In the `GetResponse` integrations, we only use the trace context from the request headers if our automatic instrumentation injected the headers.
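To see why this structure is self-cleansing, here is a small standalone snippet (not from the PR) demonstrating that an entry lives exactly as long as its key:

```csharp
using System;
using System.Net;
using System.Runtime.CompilerServices;

internal static class ConditionalWeakTableDemo
{
    private static readonly ConditionalWeakTable<WebHeaderCollection, object> Cache = new();

    public static void Main()
    {
        var headers = new WebHeaderCollection();
        Cache.Add(headers, new object());
        Console.WriteLine(Cache.TryGetValue(headers, out _)); // True

        headers = null;                // drop the only strong reference
        GC.Collect();                  // the entry is now eligible for removal,
        GC.WaitForPendingFinalizers(); // with no manual eviction required
    }
}
```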
Test coverage
Update the WebRequest test application to manually add distributed tracing headers, and rewrite the integration tests to collect all spans from the application and assert that there are no duplicate trace-id/span-id combinations. Before the product change, we would see duplicates from the `GetResponse` and `GetResponseAsync` calls.
Other details
N/A