Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-2165: Add trace sampling for instrumented network requests #934

Merged
merged 3 commits into from
May 23, 2022

Conversation

0xnm
Copy link
Member

@0xnm 0xnm commented May 19, 2022

What does this PR do?

This change adds the trace sampling for auto-instrumented network requests (instrumentation via TracingInterceptor, RumInterceptor and DatadogInterceptor) by adding x-datadog-sampling-priority header to the outgoing requests which may have a value of 1 (keep) or 0 (drop). This header will later be parsed on the backend side and sampling decision will be respected for the services downstream.

Sampler will make a decision if request should be traced (by injecting tracing headers) or not. This will allow customer to control amount of traces created directly (via TracingInterceptor) or indirectly (via RUM2APM integration in case of DatadogInterceptor or RumInterceptor), which is necessary to align with a new billing model. Default trace sampling is set to 20%.

The sampling is applied only to the tracing part of the tracking, meaning that if, for example, trace sampling rate is 75% and 100 requests were made, in case of DatadogInterceptor or RumInterceptor it will be 100 RUM resources created, but only 75 spans roughly.

Also this change removes x-datadog-sampled header, which is not really used.

Generally, we can add the sampling priority information to the span in the following way:

  • If span is MutableSpan, we can use MutableSpan#setSamplingPriority
  • We can also use Span#setTag with values Tags.MANUAL_KEEP (or MANUAL_DROP)

The problem is that in the former case MutableSpan exists only for spans created by Datadog tracer, but we need also support non-Datadog tracers, so we cannot rely on this interface. In the latter case there is no way to read a tag from the span and it is also not available in the SpanContext interface.

So in this implementation we have to access x-datadog-sampling-priority header directly for reading/writing sampling decision instead of relying on the Span (and we need to add it explicitly anyway even if non-Datadog tracer is used, because in the end data is sent to the Datadog servers anyway).

Also this PR includes commit 2a08eae which is optional and can be discussed - it just makes chained interceptors respect sampling decision for the request which was made in the upstream interceptor.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@0xnm 0xnm requested review from a team as code owners May 19, 2022 09:56
@codecov-commenter
Copy link

codecov-commenter commented May 19, 2022

Codecov Report

Merging #934 (2a08eae) into release/1.13.0 (cd47e77) will decrease coverage by 0.09%.
The diff coverage is 98.46%.

@@                Coverage Diff                 @@
##           release/1.13.0     #934      +/-   ##
==================================================
- Coverage           83.04%   82.95%   -0.09%     
==================================================
  Files                 267      267              
  Lines                9060     9091      +31     
  Branches             1457     1467      +10     
==================================================
+ Hits                 7523     7541      +18     
- Misses               1140     1148       +8     
- Partials              397      402       +5     
Impacted Files Coverage Δ
...adog/opentracing/propagation/DatadogHttpCodec.java 5.77% <ø> (+0.11%) ⬆️
.../com/datadog/android/tracing/TracingInterceptor.kt 97.78% <98.00%> (-0.44%) ⬇️
...in/com/datadog/android/glide/DatadogGlideModule.kt 94.74% <100.00%> (+0.62%) ⬆️
...n/kotlin/com/datadog/android/DatadogInterceptor.kt 93.67% <100.00%> (+0.43%) ⬆️
...n/kotlin/com/datadog/android/rum/RumInterceptor.kt 87.50% <100.00%> (+87.50%) ⬆️
...id/webview/internal/rum/WebViewRumEventConsumer.kt 76.74% <0.00%> (-9.30%) ⬇️
...ore/internal/constraints/DatadogDataConstraints.kt 86.49% <0.00%> (-8.11%) ⬇️
...droid/rum/tracking/FragmentViewTrackingStrategy.kt 82.61% <0.00%> (-4.35%) ⬇️
...android/rum/internal/ndk/DatadogNdkCrashHandler.kt 83.91% <0.00%> (-3.45%) ⬇️
... and 2 more

@xgouchet xgouchet added the size-large This PR is large sized label May 19, 2022
Copy link
Contributor

@alai97 alai97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for docs!

)
tracedRequestBuilder.addHeader(SAMPLING_PRIORITY_HEADER, DROP_SAMPLING_DECISION)
} else {
tracer.inject(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you add the SAMPLING_PRIORITY_HEADER here with 1 ? Otherwise when you check for it in the TracingInterceptor the way I've seen you are going to apply a sampling there which is not wanted if the request was not sampled out here.

Copy link
Member Author

@0xnm 0xnm May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question :) I was thinking about it, but let it as it is now. So normally the expectation is that Injector will add it, because this is what DatadogHttpCodec is doing here

We could check if header is there after the injection and if it is not, we could add it, but the thing is that Request.Builder has no method to check if header is already there, and calling removeHeader + addHeader (because simple addHeader just adds it and not replacing if there is an existing one) can set a header with a value different from injector. But anyway, with DatadogHttpCodec everything should be fine.

Now the question is: what if customer is using non-Datadog tracer (generally speaking it is possible, because we accept high-level io.opentracing.Tracer interface, not necessary Datadog tracer). So it won't be DatadogHttpCodec there and so maybe x-datadog-sampling-priority won't be set in this branch. If it is not set, then simply on the backend side service will make its own decision about the sampling, so not a big deal.

And since we are accepting io.opentracing.Tracer and io.opentracing.Span interface which generally have a way to set priority sampling, but there is no way to read it, we have to rely on the headers with such kind of hacky decision.

We could assume that Datadog tracer is always used (and then use DDSpan, which have an opportunity to read back the sampling priority) and not rely on the headers, but then traceSamplingRate parameter would be useless if customer is using non-Datadog tracer and so they will face the new billing model in a not very nice way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done automatically by the inject method

)
tracedRequestBuilder.addHeader(SAMPLING_PRIORITY_HEADER, DROP_SAMPLING_DECISION)
} else {
tracer.inject(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done automatically by the inject method

@0xnm 0xnm merged commit 978d633 into release/1.13.0 May 23, 2022
@0xnm 0xnm deleted the nogorodnikov/rumm-2165/add-apm-sampling branch May 23, 2022 11:38
@xgouchet xgouchet added this to the 1.13.0 milestone Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size-large This PR is large sized
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants