-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ddtrace/tracer: adding support for single span matching #1357
Conversation
5abe908
to
d056ab1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Mostly just some nit-picky comments here and there :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Diana! I didn't get a chance to go very deep into the review, and it's very possible that some of my comments won't be very clear, so I'm happy to pair with you on any of it this week if you'd like. That may make it easier for me to explain what I'm going for here.
Comments about the PR description:
"a warning is emitted" can you make it clear that it is only emitted if both the rules file and rules env var are provided, thus dropped the file?
ddtrace/tracer/sampler.go
Outdated
// DD_SPAN_SAMPLING_RULES_FILE environment variables. | ||
func spanSamplingRulesFromEnv() ([]SamplingRule, error) { | ||
errs := []string{} | ||
rules, err := processSamplingRulesFromEnv([]byte(os.Getenv("DD_SPAN_SAMPLING_RULES")), SingleSpanSamplingType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do the same check here as we do for trace sampling rules? e.g.
rulesFromEnv := os.Getenv("DD_SPAN_SAMPLING_RULES")
if rulesFromEnv == "" {
return nil, nil
}
return processSamplingRulesFromEnv([]byte(rulesFromEnv), SingleSpanSamplingType)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments, but there are a lot already. Once they get addressed, I can look again.
ddtrace/ext/tags.go
Outdated
SpanSamplingMechanism = "_dd.span_sampling.mechanism" | ||
|
||
// SingleSpanSamplingMechanism is a constant reserved to indicate that a span was kept on account of a span sampling rule. | ||
SingleSpanSamplingMechanism = 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any of these constants meant to be used by users? Why are they public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is good point, movedt to tracer/sampler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's only resolve the comment once the change is committed please.
ddtrace/tracer/tracer.go
Outdated
// rules for applying a sampling rate to spans that match the designated service | ||
// or operation name. Such spans are sampled if trace sampling decision is 'drop' and | ||
// may be sent separately | ||
singleSpanRulesSampling *singleSpanRulesSampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many samplers. We should have just one: rulesSampler
and that one should have two methods (instead of apply
), called:
sampleTrace
sampleSpan
// sampleTraces reports whether an entire trace should be kept based solely on the given span,
// which is usually the first in the chunk.
and
// sampleSpan reports whether this individual span should be kept.
I understand that the RFC was called "single span sampling" but it may be much clearer to say "individual span sampling" and it's fine to do that especially in internal methods IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might actually be more tedious to combine them into one. I'll leave this open for now
ddtrace/tracer/option.go
Outdated
|
||
// spanSamplingRules contains user-defined rules determine the sampling rate to apply | ||
// to single spans, regardless of the trace sampling decision. | ||
spanSamplingRules []SamplingRule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these defined and why are they here? There is no option (With*
) which sets them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are supplied through env variable, like I've mentioned above I think of adding analogous option, without renaming old one (WithSamplingRules)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just have WithSamplingRules
which defines all of them? Both for spans and for traces. You already added a Type
field to SamplingRule
, that should be enough. You can pick them apart into two sets inside the rulesSampler
implementation.
…pler to a new struct; moved traceSampler code to a separate file; updated sampler to include rules from WithSamplingRules option where both types of rules can be set;
Can't reproduce test failure with 2k runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! 👏
just some small comments but overall approved once those changes are looked at / made!!
…op level tag anymore
ddtrace/tracer/log_test.go
Outdated
@@ -55,7 +55,7 @@ func TestStartupLog(t *testing.T) { | |||
tp.Reset() | |||
logStartup(tracer) | |||
assert.Len(tp.Lines(), 2) | |||
assert.Regexp(`Datadog Tracer v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)? INFO: DATADOG TRACER CONFIGURATION {"date":"[^"]*","os_name":"[^"]*","os_version":"[^"]*","version":"[^"]*","lang":"Go","lang_version":"[^"]*","env":"configuredEnv","service":"configured.service","agent_url":"http://localhost:9/v0.4/traces","agent_error":"Post .*","debug":true,"analytics_enabled":true,"sample_rate":"0\.123000","sample_rate_limit":"100","sampling_rules":\[{"service":"mysql","name":"","sample_rate":0\.75}\],"sampling_rules_error":"","service_mappings":{"initial_service":"new_service"},"tags":{"runtime-id":"[^"]*","tag":"value","tag2":"NaN"},"runtime_metrics_enabled":true,"health_metrics_enabled":true,"profiler_code_hotspots_enabled":((false)|(true)),"profiler_endpoints_enabled":((false)|(true)),"dd_version":"2.3.4","architecture":"[^"]*","global_service":"configured.service","lambda_mode":"false","appsec":((true)|(false)),"agent_features":{"DropP0s":false,"Stats":false,"StatsdPort":0}}`, tp.Lines()[1]) | |||
assert.Regexp(`Datadog Tracer v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)? INFO: DATADOG TRACER CONFIGURATION {"date":"[^"]*","os_name":"[^"]*","os_version":"[^"]*","version":"[^"]*","lang":"Go","lang_version":"[^"]*","env":"configuredEnv","service":"configured.service","agent_url":"http://localhost:9/v0.4/traces","agent_error":"Post .*","debug":true,"analytics_enabled":true,"sample_rate":"0\.123000","sample_rate_limit":"100","sampling_rules":\[{"service":"mysql","name":"","sample_rate":0\.75,"type":0}\],"sampling_rules_error":"","service_mappings":{"initial_service":"new_service"},"tags":{"runtime-id":"[^"]*","tag":"value","tag2":"NaN"},"runtime_metrics_enabled":true,"health_metrics_enabled":true,"profiler_code_hotspots_enabled":((false)|(true)),"profiler_endpoints_enabled":((false)|(true)),"dd_version":"2.3.4","architecture":"[^"]*","global_service":"configured.service","lambda_mode":"false","appsec":((true)|(false)),"agent_features":{"DropP0s":false,"Stats":false,"StatsdPort":0}}`, tp.Lines()[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is meant for debugging and confirming configuration, having a meaningful human-readable value for type would be better. Can we can convert that int to its enum/const name easily?
…/DataDog/dd-trace-go into shevchenko/single-span-ingestion
a9c49ce
to
46b0fe6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. I just have some comments regarding naming but overall LGTM!
ddtrace/tracer/rules_sampler.go
Outdated
return false | ||
} | ||
|
||
func (rs *singleSpanRulesSampler) applyRate(span *span, rule SamplingRule, rate float64, now time.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you forgot this comment. If you feel strongly on having this additional method, it's ok, but I don't see the reason for it. If you keep it, maybe rename it like this:
func (rs *singleSpanRulesSampler) applyRate(span *span, rule SamplingRule, rate float64, now time.Time) { | |
func (rs *singleSpanRulesSampler) applyRule(span *span, rule SamplingRule, now time.Time) { |
There is no point to have the rate
argument since it's derived from rule.Rate
ddtrace/tracer/tracer.go
Outdated
func (t *tracer) pushTrace(trace []*span) { | ||
// traceInfo contains span list and sampling decision of the trace | ||
// needed to run single span sampling. | ||
type traceInfo struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally when you find yourself using words like "data", "info", "process" or "util" in programming, I recommend questioning that choice and perhaps trying to find something better. These are words that generally apply in any situation and are not very specific. How about this?
type traceInfo struct { | |
type finishedTrace struct { |
traceInfo
sounds a bit as if it is information about a trace, whereas this is actually a finished trace with context about sampling decision.
ddtrace/tracer/tracer.go
Outdated
atomic.AddUint64(&t.partialTraces, 1) | ||
} | ||
|
||
func (t *tracer) pushTraceInfo(info *traceInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite you having changed the signature of this function by adding a new argument. I would not change the name. The "info" suffix is not needed imho.
ddtrace/tracer/tracer.go
Outdated
|
||
// processTraceInfo runs single span sampling on the spans pushed to the worker thread. | ||
// Sampling spans on the worker thread keeps any associated latency off the application thread. | ||
func (t *tracer) processTraceInfo(info *traceInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of my previous comment, here we find both the word "process" and "info" as part of the same method. Double whammy 🙂 How about sampleFinishedTrace
? This makes it clear that it is sampling which occurs at the end.
ddtrace/tracer/tracer.go
Outdated
// needed to run single span sampling. | ||
type traceInfo struct { | ||
spans []*span | ||
samplingDecision samplingDecision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
samplingDecision samplingDecision | |
decision samplingDecision |
Perhaps this is enough?
ddtrace/tracer/tracer.go
Outdated
// traceInfo contains span list and sampling decision of the trace | ||
// needed to run single span sampling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to document something as what it is or what it stands for, rather than what its fields are and where it is used. Considering you follow my suggestions, I'd do something like:
// finishedTrace holds information about a trace that has finished, including its spans.
I would not mention that it's needed for sampling. When I read "this is needed for ..." it sounds a bit like it's an excuse for having it. As if it would not be here otherwise. Let's make it clear that it is useful instead.
ddtrace/tracer/tracer.go
Outdated
@@ -295,8 +302,11 @@ func (t *tracer) worker(tick <-chan time.Time) { | |||
// before the final flush to ensure no traces are lost (see #526) | |||
for { | |||
select { | |||
case trace := <-t.out: | |||
t.traceWriter.add(trace) | |||
case info := <-t.out: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case info := <-t.out: | |
case trace := <-t.out: |
Then, trace.decision
and trace.spans
makes complete sense.
ddtrace/tracer/tracer.go
Outdated
samplingDecision samplingDecision | ||
} | ||
|
||
// processTraceInfo runs single span sampling on the spans pushed to the worker thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is very technical and assumes that the reader has a lot of context. If possible (it's not always possible) we should try to document things in a more independent way, for example:
// sampleFinishedTrace applies single-span sampling to the provided trace, which is considered to be finished.
ddtrace/tracer/tracer.go
Outdated
} | ||
|
||
// processTraceInfo runs single span sampling on the spans pushed to the worker thread. | ||
// Sampling spans on the worker thread keeps any associated latency off the application thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems out of place and arguably not needed. A function's documentation should not assume that it is called from anywhere. It's just a function at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, lastly, I think we should update the docs and explain how to use single span sampling with environment variables and files. WDYT?
If you want to do that as a separate PR, that's fine!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👏🏻 🚀 🎉
Motivation: Single Span Ingestion Control RFC
Recap: