Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddtrace/tracer: adding support for single span matching #1357

Merged
merged 35 commits into from
Aug 2, 2022

Conversation

dianashevchenko
Copy link
Contributor

Motivation: Single Span Ingestion Control RFC

Recap:

  • glob matcher regex matches full strings, only '?' and '*' are treated as metacharacters
  • 2 mandatory single span tags(mechanism and rule rate) should be set on matched spans, max_per_second as well if available
  • span rules loaded from the env variable take precedence over a file with rules and a warning is emitted

@dianashevchenko dianashevchenko added this to the 1.39.0 milestone Jun 27, 2022
@dianashevchenko dianashevchenko changed the title pkg/tracer: adding support for single span matching WIP: pkg/tracer: adding support for single span matching Jun 27, 2022
@dianashevchenko dianashevchenko changed the title WIP: pkg/tracer: adding support for single span matching pkg/tracer: adding support for single span matching Jun 29, 2022
@dianashevchenko dianashevchenko force-pushed the shevchenko/single-span-ingestion branch from 5abe908 to d056ab1 Compare June 29, 2022 15:19
Copy link
Contributor

@ajgajg1134 ajgajg1134 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Mostly just some nit-picky comments here and there :)

ddtrace/ext/tags.go Outdated Show resolved Hide resolved
ddtrace/tracer/option.go Outdated Show resolved Hide resolved
ddtrace/tracer/option.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/single_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/single_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/single_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/single_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/single_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/tracer_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@katiehockman katiehockman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Diana! I didn't get a chance to go very deep into the review, and it's very possible that some of my comments won't be very clear, so I'm happy to pair with you on any of it this week if you'd like. That may make it easier for me to explain what I'm going for here.

Comments about the PR description:
"a warning is emitted" can you make it clear that it is only emitted if both the rules file and rules env var are provided, thus dropped the file?

ddtrace/ext/tags.go Outdated Show resolved Hide resolved
ddtrace/tracer/option.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
// DD_SPAN_SAMPLING_RULES_FILE environment variables.
func spanSamplingRulesFromEnv() ([]SamplingRule, error) {
errs := []string{}
rules, err := processSamplingRulesFromEnv([]byte(os.Getenv("DD_SPAN_SAMPLING_RULES")), SingleSpanSamplingType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same check here as we do for trace sampling rules? e.g.

	rulesFromEnv := os.Getenv("DD_SPAN_SAMPLING_RULES")
	if rulesFromEnv == "" {
		return nil, nil
	}
	return processSamplingRulesFromEnv([]byte(rulesFromEnv), SingleSpanSamplingType)

ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
@gbbr gbbr changed the title pkg/tracer: adding support for single span matching ddtrace/tracer: adding support for single span matching Jun 30, 2022
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, but there are a lot already. Once they get addressed, I can look again.

ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved
SpanSamplingMechanism = "_dd.span_sampling.mechanism"

// SingleSpanSamplingMechanism is a constant reserved to indicate that a span was kept on account of a span sampling rule.
SingleSpanSamplingMechanism = 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of these constants meant to be used by users? Why are they public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is good point, movedt to tracer/sampler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's only resolve the comment once the change is committed please.

ddtrace/tracer/option.go Outdated Show resolved Hide resolved
ddtrace/tracer/option.go Outdated Show resolved Hide resolved
ddtrace/tracer/util.go Outdated Show resolved Hide resolved
ddtrace/tracer/spancontext.go Outdated Show resolved Hide resolved
ddtrace/tracer/spancontext.go Outdated Show resolved Hide resolved
// rules for applying a sampling rate to spans that match the designated service
// or operation name. Such spans are sampled if trace sampling decision is 'drop' and
// may be sent separately
singleSpanRulesSampling *singleSpanRulesSampler
Copy link
Contributor

@gbbr gbbr Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many samplers. We should have just one: rulesSampler and that one should have two methods (instead of apply), called:

  • sampleTrace
  • sampleSpan
// sampleTraces reports whether an entire trace should be kept based solely on the given span,
// which is usually the first in the chunk.

and

// sampleSpan reports whether this individual span should be kept.

I understand that the RFC was called "single span sampling" but it may be much clearer to say "individual span sampling" and it's fine to do that especially in internal methods IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might actually be more tedious to combine them into one. I'll leave this open for now

ddtrace/tracer/sampler.go Outdated Show resolved Hide resolved

// spanSamplingRules contains user-defined rules determine the sampling rate to apply
// to single spans, regardless of the trace sampling decision.
spanSamplingRules []SamplingRule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these defined and why are they here? There is no option (With*) which sets them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are supplied through env variable, like I've mentioned above I think of adding analogous option, without renaming old one (WithSamplingRules)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just have WithSamplingRules which defines all of them? Both for spans and for traces. You already added a Type field to SamplingRule, that should be enough. You can pick them apart into two sets inside the rulesSampler implementation.

@Hellzy Hellzy modified the milestones: 1.39.0, 1.40.0 Jul 1, 2022
…pler to a new struct; moved traceSampler code to a separate file; updated sampler to include rules from WithSamplingRules option where both types of rules can be set;
@dianashevchenko
Copy link
Contributor Author

Can't reproduce test failure with 2k runs

Copy link
Contributor

@ajgajg1134 ajgajg1134 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! 👏
just some small comments but overall approved once those changes are looked at / made!!

ddtrace/tracer/rules_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/rules_sampler.go Outdated Show resolved Hide resolved
ddtrace/tracer/tracer_test.go Outdated Show resolved Hide resolved
@@ -55,7 +55,7 @@ func TestStartupLog(t *testing.T) {
tp.Reset()
logStartup(tracer)
assert.Len(tp.Lines(), 2)
assert.Regexp(`Datadog Tracer v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)? INFO: DATADOG TRACER CONFIGURATION {"date":"[^"]*","os_name":"[^"]*","os_version":"[^"]*","version":"[^"]*","lang":"Go","lang_version":"[^"]*","env":"configuredEnv","service":"configured.service","agent_url":"http://localhost:9/v0.4/traces","agent_error":"Post .*","debug":true,"analytics_enabled":true,"sample_rate":"0\.123000","sample_rate_limit":"100","sampling_rules":\[{"service":"mysql","name":"","sample_rate":0\.75}\],"sampling_rules_error":"","service_mappings":{"initial_service":"new_service"},"tags":{"runtime-id":"[^"]*","tag":"value","tag2":"NaN"},"runtime_metrics_enabled":true,"health_metrics_enabled":true,"profiler_code_hotspots_enabled":((false)|(true)),"profiler_endpoints_enabled":((false)|(true)),"dd_version":"2.3.4","architecture":"[^"]*","global_service":"configured.service","lambda_mode":"false","appsec":((true)|(false)),"agent_features":{"DropP0s":false,"Stats":false,"StatsdPort":0}}`, tp.Lines()[1])
assert.Regexp(`Datadog Tracer v[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)? INFO: DATADOG TRACER CONFIGURATION {"date":"[^"]*","os_name":"[^"]*","os_version":"[^"]*","version":"[^"]*","lang":"Go","lang_version":"[^"]*","env":"configuredEnv","service":"configured.service","agent_url":"http://localhost:9/v0.4/traces","agent_error":"Post .*","debug":true,"analytics_enabled":true,"sample_rate":"0\.123000","sample_rate_limit":"100","sampling_rules":\[{"service":"mysql","name":"","sample_rate":0\.75,"type":0}\],"sampling_rules_error":"","service_mappings":{"initial_service":"new_service"},"tags":{"runtime-id":"[^"]*","tag":"value","tag2":"NaN"},"runtime_metrics_enabled":true,"health_metrics_enabled":true,"profiler_code_hotspots_enabled":((false)|(true)),"profiler_endpoints_enabled":((false)|(true)),"dd_version":"2.3.4","architecture":"[^"]*","global_service":"configured.service","lambda_mode":"false","appsec":((true)|(false)),"agent_features":{"DropP0s":false,"Stats":false,"StatsdPort":0}}`, tp.Lines()[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is meant for debugging and confirming configuration, having a meaningful human-readable value for type would be better. Can we can convert that int to its enum/const name easily?

@dianashevchenko dianashevchenko force-pushed the shevchenko/single-span-ingestion branch from a9c49ce to 46b0fe6 Compare August 1, 2022 17:00
knusbaum
knusbaum previously approved these changes Aug 1, 2022
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I just have some comments regarding naming but overall LGTM!

return false
}

func (rs *singleSpanRulesSampler) applyRate(span *span, rule SamplingRule, rate float64, now time.Time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you forgot this comment. If you feel strongly on having this additional method, it's ok, but I don't see the reason for it. If you keep it, maybe rename it like this:

Suggested change
func (rs *singleSpanRulesSampler) applyRate(span *span, rule SamplingRule, rate float64, now time.Time) {
func (rs *singleSpanRulesSampler) applyRule(span *span, rule SamplingRule, now time.Time) {

There is no point to have the rate argument since it's derived from rule.Rate

func (t *tracer) pushTrace(trace []*span) {
// traceInfo contains span list and sampling decision of the trace
// needed to run single span sampling.
type traceInfo struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally when you find yourself using words like "data", "info", "process" or "util" in programming, I recommend questioning that choice and perhaps trying to find something better. These are words that generally apply in any situation and are not very specific. How about this?

Suggested change
type traceInfo struct {
type finishedTrace struct {

traceInfo sounds a bit as if it is information about a trace, whereas this is actually a finished trace with context about sampling decision.

atomic.AddUint64(&t.partialTraces, 1)
}

func (t *tracer) pushTraceInfo(info *traceInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite you having changed the signature of this function by adding a new argument. I would not change the name. The "info" suffix is not needed imho.


// processTraceInfo runs single span sampling on the spans pushed to the worker thread.
// Sampling spans on the worker thread keeps any associated latency off the application thread.
func (t *tracer) processTraceInfo(info *traceInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of my previous comment, here we find both the word "process" and "info" as part of the same method. Double whammy 🙂 How about sampleFinishedTrace? This makes it clear that it is sampling which occurs at the end.

// needed to run single span sampling.
type traceInfo struct {
spans []*span
samplingDecision samplingDecision
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
samplingDecision samplingDecision
decision samplingDecision

Perhaps this is enough?

Comment on lines 319 to 320
// traceInfo contains span list and sampling decision of the trace
// needed to run single span sampling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to document something as what it is or what it stands for, rather than what its fields are and where it is used. Considering you follow my suggestions, I'd do something like:

// finishedTrace holds information about a trace that has finished, including its spans.

I would not mention that it's needed for sampling. When I read "this is needed for ..." it sounds a bit like it's an excuse for having it. As if it would not be here otherwise. Let's make it clear that it is useful instead.

@@ -295,8 +302,11 @@ func (t *tracer) worker(tick <-chan time.Time) {
// before the final flush to ensure no traces are lost (see #526)
for {
select {
case trace := <-t.out:
t.traceWriter.add(trace)
case info := <-t.out:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case info := <-t.out:
case trace := <-t.out:

Then, trace.decision and trace.spans makes complete sense.

samplingDecision samplingDecision
}

// processTraceInfo runs single span sampling on the spans pushed to the worker thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is very technical and assumes that the reader has a lot of context. If possible (it's not always possible) we should try to document things in a more independent way, for example:

// sampleFinishedTrace applies single-span sampling to the provided trace, which is considered to be finished.

}

// processTraceInfo runs single span sampling on the spans pushed to the worker thread.
// Sampling spans on the worker thread keeps any associated latency off the application thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems out of place and arguably not needed. A function's documentation should not assume that it is called from anywhere. It's just a function at this point.

Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, lastly, I think we should update the docs and explain how to use single span sampling with environment variables and files. WDYT?

If you want to do that as a separate PR, that's fine!

gbbr
gbbr previously approved these changes Aug 2, 2022
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👏🏻 🚀 🎉

@dianashevchenko dianashevchenko merged commit d3776a9 into main Aug 2, 2022
@dianashevchenko dianashevchenko deleted the shevchenko/single-span-ingestion branch August 2, 2022 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants