
ddtrace/tracer: only finish execution trace task, restore pprof labels once #2708

Merged 3 commits into main on May 31, 2024

Conversation

nsrip-dd (Contributor) commented May 23, 2024

What does this PR do?

Only finish the execution trace task and restore pprof labels for a span
the first time Finish is called.

Motivation

It's possible for users to call Finish multiple times on a span, but we
should only record the span finishing in the execution tracer and via
pprof labels once. Otherwise we 1) waste space in the execution trace
and 2) risk overwriting pprof labels with stale values. Move the task
ending and label setting inside span.finish, after the check for
whether the span is already finished.
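The shape of the fix can be sketched like this. Note this is a minimal stand-in, not the real tracer code: the actual span type in ddtrace/tracer carries far more state, and the `cleanups` counter exists only for illustration. The point is the ordering: the finished check now guards the cleanup, so a second Finish is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a hypothetical stand-in for the real tracer span.
type span struct {
	mu       sync.Mutex
	finished bool
	cleanups int // counts how often cleanup ran, for illustration only
}

func (s *span) Finish() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.finished {
		return // second and later calls do nothing
	}
	s.finished = true
	// In the real tracer, this is where the execution trace task is
	// ended and the goroutine's pprof labels are restored.
	s.cleanups++
}

func main() {
	s := &span{}
	s.Finish()
	s.Finish()
	fmt.Println(s.cleanups) // cleanup ran exactly once
}
```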

#2709 shows an example regression test for this issue. We could
do just the pprof label check in this package, but if we want to check
execution traces as well I think we should move to golang.org/x/exp/trace,
which IMO warrants moving to a separate module to isolate the golang.org/x/exp
dependency. Open to doing a smaller thing in this PR, or we can
go with #2709.

Here's an example program exercising the pprof label behavior:

package main

import (
	"context"
	"os"
	"runtime/pprof"
	"time"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func waste() {
	now := time.Now()
	for time.Since(now) < 100*time.Millisecond {
	}
}

func main() {
	pprof.StartCPUProfile(os.Stdout)
	defer pprof.StopCPUProfile()
	tracer.Start()
	defer tracer.Stop()

	foo, ctx := tracer.StartSpanFromContext(context.Background(), "foo")
	bar, _ := tracer.StartSpanFromContext(ctx, "bar")
	bar.Finish()
	foo.Finish()
	bar.Finish()
	// If we don't handle double finish right, we will see CPU profile samples
	// for waste tagged with the trace/span IDs from foo
	waste()
}

On current main, we see this:

% go tool pprof -focus=waste -tags ./cpu-before.pprof 
 local root span id: Total 80.0ms
                     80.0ms (  100%): 1474084951106970338

 span id: Total 80.0ms
          80.0ms (  100%): 1474084951106970338

 trace endpoint: Total 80.0ms
                 80.0ms (  100%): foo

But all the spans are finished, so there shouldn't be any labels associated with waste. With this PR, we see there are no tags associated with that time:

% go tool pprof -focus=waste -tags ./cpu-after.pprof
%

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

pr-commenter bot commented May 23, 2024

Benchmarks

Benchmark execution time: 2024-05-31 13:52:19

Comparing candidate commit f70a3cd in PR branch nick.ripley/only-finish-span-once with baseline commit e0b2c6d in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics; 1 metric is unstable.

nsrip-dd force-pushed the nick.ripley/only-finish-span-once branch from 297fed4 to 3c50bd6 on May 24, 2024 13:34
nsrip-dd changed the title from "WIP: ddtrace/tracer: only finish execution trace task, restore pprof labels once" to "ddtrace/tracer: only finish execution trace task, restore pprof labels once" on May 24, 2024
nsrip-dd (Author) commented:

For tracer reviewers, this is doing some cleanup under the span lock that wasn't under the lock before. The cleanup should be cheap: just taking a stack trace with frame pointers, writing some bytes to a buffer, and updating a pointer in a goroutine structure. Our benchmarks don't show any regression, but I guess we also aren't specifically measuring contended access to a span while a program is calling Finish on it... Any thoughts on whether this will be a real performance issue?
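The contended case this comment worries about isn't covered by the existing benchmarks, but could be probed with a micro-benchmark along these lines. This is a hypothetical sketch using the same minimal span stand-in as above, not a benchmark that exists in the repo; after the first Finish wins, every other call still pays for acquiring the lock, which is the contention being measured.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// span is a hypothetical stand-in for the real tracer span.
type span struct {
	mu       sync.Mutex
	finished bool
}

func (s *span) Finish() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.finished {
		return
	}
	s.finished = true
	// The real tracer would end the trace task and restore labels here,
	// now under the lock.
}

func main() {
	// testing.Benchmark lets us run this without `go test`. RunParallel
	// hammers Finish on a shared span from GOMAXPROCS goroutines.
	res := testing.Benchmark(func(b *testing.B) {
		s := &span{}
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				s.Finish()
			}
		})
	})
	fmt.Println(res.N > 0) // true: the benchmark ran at least once
}
```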

@nsrip-dd nsrip-dd marked this pull request as ready for review May 24, 2024 13:38
@nsrip-dd nsrip-dd requested a review from a team as a code owner May 24, 2024 13:38
@nsrip-dd nsrip-dd requested a review from felixge May 24, 2024 13:39
felixge (Member) left a comment:
LGTM 🙇

Edit: On second thought, I'm less sure. See my comment on the other PR.

@felixge felixge requested a review from a team as a code owner May 31, 2024 13:28
felixge (Member) left a comment:

LGTM 🙇

@felixge felixge enabled auto-merge (squash) May 31, 2024 13:29
auto-merge was automatically disabled May 31, 2024 13:32

Pull Request is not mergeable

@nsrip-dd nsrip-dd merged commit a43e166 into main May 31, 2024
200 checks passed
@nsrip-dd nsrip-dd deleted the nick.ripley/only-finish-span-once branch May 31, 2024 14:11