Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in manual instrumentation #6330

Merged
merged 6 commits into from
Nov 22, 2024

Conversation

andrewlock
Copy link
Member

Summary of changes

Fixes the Tracer.Settings getter in manual instrumentation

Reason for change

While investigating something else, I noticed that the Tracer.Settings getter doesn't work correctly if the tracer has been reconfigured. The previous implementation assumed that the automatic Tracer instance is tied to a given ImmutableTracerSettings object, but that's not the case. It's the TracerManager that's tied to a settings object, but in "normal" usage, the Tracer instance isn't tied to a single TracerManager, and hence isn't tied to a settings object. e.g

Tracer.Instance.ActiveScope;// Trigger the client-side creation of the `Tracer` object with default settings

// customer manually re-configures tracer
var settings = TracerSettings.FromDefaultSources();
settings.ServiceName = "updated-name";
settings.Environment = "updated-env";
settings.GlobalTags = new Dictionary<string, string> { { "Updated-key", "Updated Value" } };
Tracer.Configure(settings); // update Tracer.Instance

// Fetch the settings back
var immutable = Tracer.Instance.Settings;
Debug.Assert(immutable.ServiceName == settings.ServiceName); // fails
Debug.Assert(immutable.Environment == settings.Environment ); // fails
// The settings have not changed from the _original_ values

Implementation details

Instead of populating a settings object when creating the manual Tracer instance, create it when the customer explicitly calls Tracer.Settings. This is likely to be rare (will add telemetry in a followup PR) but we also don't want to create the dictionary transport every time they call it. To work around this, we track the "current" (automatic) ImmutableTracerSettings in a local field and pass that to the automatic side, so that it can check whether the settings need to be updated or not.

Test coverage

Added additional assertions to the manual instrumentation tests, which fail prior to these changes

Other details

This removes an instrumentation point (and adds another) but we keep the instrumentation around for back-compat reasons

@andrewlock andrewlock added type:bug area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) labels Nov 21, 2024
@andrewlock andrewlock requested review from a team as code owners November 21, 2024 13:56
@andrewlock
Copy link
Member Author

andrewlock commented Nov 21, 2024

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch test with statistical test for significance of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (72ms)  : 63, 81
     .   : milestone, 72,
    master - mean (71ms)  : 64, 78
     .   : milestone, 71,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (1,111ms)  : 1091, 1131
     .   : milestone, 1111,
    master - mean (1,109ms)  : 1083, 1135
     .   : milestone, 1109,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (109ms)  : 106, 111
     .   : milestone, 109,
    master - mean (109ms)  : 106, 112
     .   : milestone, 109,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (769ms)  : 753, 785
     .   : milestone, 769,
    master - mean (772ms)  : 752, 791
     .   : milestone, 772,

Loading
gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (92ms)  : 90, 93
     .   : milestone, 92,
    master - mean (92ms)  : 91, 94
     .   : milestone, 92,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (719ms)  : 704, 733
     .   : milestone, 719,
    master - mean (726ms)  : 711, 741
     .   : milestone, 726,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (191ms)  : 187, 194
     .   : milestone, 191,
    master - mean (192ms)  : 186, 198
     .   : milestone, 192,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (1,214ms)  : 1194, 1233
     .   : milestone, 1214,
    master - mean (1,214ms)  : 1192, 1237
     .   : milestone, 1214,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (277ms)  : 271, 283
     .   : milestone, 277,
    master - mean (278ms)  : 272, 284
     .   : milestone, 278,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (943ms)  : 926, 959
     .   : milestone, 943,
    master - mean (949ms)  : 933, 965
     .   : milestone, 949,

Loading
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6330) - mean (266ms)  : 262, 269
     .   : milestone, 266,
    master - mean (266ms)  : 262, 270
     .   : milestone, 266,

    section CallTarget+Inlining+NGEN
    This PR (6330) - mean (927ms)  : 907, 946
     .   : milestone, 927,
    master - mean (931ms)  : 914, 948
     .   : milestone, 931,

Loading

@datadog-ddstaging
Copy link

datadog-ddstaging bot commented Nov 21, 2024

Datadog Report

Branch report: andrew/ci/manual-instrumentation-flake
Commit report: ad3394e
Test service: dd-trace-dotnet

✅ 0 Failed, 449261 Passed, 2723 Skipped, 18h 50m 34.21s Total Time

@andrewlock andrewlock force-pushed the andrew/ci/manual-instrumentation-flake branch from ee4d15f to ad3394e Compare November 22, 2024 13:32
@andrewlock andrewlock changed the title Fix manual instrumentation instrumentation Fix bug in manual instrumentation Nov 22, 2024
// seeing as the whole _settings object is going to be thrown away and
// repopulated if the customer calls Settings. Kind of annoying...
// so to avoid the extra work/allocation we check whether the initial values contains the
// agent key, as an easy way of detecting if we're running in this version-conflict mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why AgentUriKey? Is there something special about that settings? Should you include that in this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's nothing special about it 😄 It's just one that we already support and probably will continue to support for all future versions

Copy link
Contributor

@bouwkast bouwkast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

@andrewlock andrewlock merged commit 9618050 into master Nov 22, 2024
75 of 79 checks passed
@andrewlock andrewlock deleted the andrew/ci/manual-instrumentation-flake branch November 22, 2024 17:22
@github-actions github-actions bot added this to the vNext-v3 milestone Nov 22, 2024
andrewlock added a commit that referenced this pull request Nov 25, 2024
## Summary of changes

Add a mutex to try to avoid flake

## Reason for change

We see occasional flake in the manual + automatic instrumentation tests.
I don't entirely understand why this happens _other than_ it shows that
the httplistener span is getting serialized _after_ we have reconfigured
the tracer (because traces _aren't_ tied to a settings instance). This
is arguably a bug, but it's "known" to an extent. The problem is that
it's causing flake when you change the base service name.

> FWIW, we already "fixed" this same symptom once:
#4960


## Implementation details

Add a mutex to try to make sure the httplistener span is closed before
we close the request span

## Test coverage

Enabled debug logs so that we can get more details if this _does_ recur.

## Other details

Stacked on:
- #6330
veerbia pushed a commit that referenced this pull request Dec 16, 2024
## Summary of changes

Fixes the `Tracer.Settings` getter in manual instrumentation

## Reason for change

While investigating something else, I noticed that the `Tracer.Settings`
getter doesn't work correctly if the tracer has been reconfigured. The
previous implementation assumed that the automatic `Tracer` instance is
tied to a given `ImmutableTracerSettings` object, but that's not the
case. It's the `TracerManager` that's tied to a settings object, but in
"normal" usage, the `Tracer` instance _isn't_ tied to a single
`TracerManager`, and hence _isn't_ tied to a settings object. e.g

```csharp
Tracer.Instance.ActiveScope;// Trigger the client-side creation of the `Tracer` object with default settings

// customer manually re-configures tracer
var settings = TracerSettings.FromDefaultSources();
settings.ServiceName = "updated-name";
settings.Environment = "updated-env";
settings.GlobalTags = new Dictionary<string, string> { { "Updated-key", "Updated Value" } };
Tracer.Configure(settings); // update Tracer.Instance

// Fetch the settings back
var immutable = Tracer.Instance.Settings;
Debug.Assert(immutable.ServiceName == settings.ServiceName); // fails
Debug.Assert(immutable.Environment == settings.Environment ); // fails
// The settings have not changed from the _original_ values
```

## Implementation details

Instead of populating a settings object when creating the manual
`Tracer` instance, create it when the customer explicitly calls
`Tracer.Settings`. This is likely to be rare (will add telemetry in a
followup PR) but we also don't want to create the dictionary transport
every time they call it. To work around this, we track the "current"
(automatic) `ImmutableTracerSettings` in a local field and pass that to
the automatic side, so that it can check whether the settings need to be
updated or not.

## Test coverage

Added additional assertions to the manual instrumentation tests, which
fail prior to these changes

## Other details

This _removes_ an instrumentation point (and adds another) but we keep
the instrumentation around for back-compat reasons
veerbia pushed a commit that referenced this pull request Dec 16, 2024
## Summary of changes

Add a mutex to try to avoid flake

## Reason for change

We see occasional flake in the manual + automatic instrumentation tests.
I don't entirely understand why this happens _other than_ it shows that
the httplistener span is getting serialized _after_ we have reconfigured
the tracer (because traces _aren't_ tied to a settings instance). This
is arguably a bug, but it's "known" to an extent. The problem is that
it's causing flake when you change the base service name.

> FWIW, we already "fixed" this same symptom once:
#4960


## Implementation details

Add a mutex to try to make sure the httplistener span is closed before
we close the request span

## Test coverage

Enabled debug logs so that we can get more details if this _does_ recur.

## Other details

Stacked on:
- #6330
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) type:bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants