Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONTINT-3554] Send the Datadog-Entity-ID header, containing either the container-id or the cgroup inode if available #2402

Merged
merged 12 commits into from
Dec 19, 2023

Conversation

AliDatadog
Copy link
Contributor

@AliDatadog AliDatadog commented Nov 30, 2023

What does this PR do?

This PR sends the Datadog-Entity-ID header when the Datadog-Container-Id is usually sent, containing either the container-id or the cgroup inode if available.

Motivation

Using the cgroup inode will allow the trace-agent to retrieve the container-id, typically on cgroupv2 without UDS.

Drawbacks

Datadog-Entity-ID will be sent even if the app is not running in a container because there is no reliable way to ensure that the app is running in a container.

Test

This PR should be tested in several setups without UDS:

  • Only cgroupv2 is available: Traces should contain the in-<inode> with the inode obtain by kubectl exec <pod> -- stat /sys/fs/cgroup
  • Hybrid setup: If the container-id was available before this PR, it should still be available. Otherwise it should use the cgroup memory controller inode (kubectl exec <pod> -- stat /sys/fs/cgroup/memory).

With UDS, we should never use the inode logic.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: dummy-apm-app
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: dummy-apm-app
    template:
      metadata:
        labels:
          app: dummy-apm-app
          admission.datadoghq.com/enabled: "false"
      spec:
        containers:
        - name: dummy-apm-app
          image: docker.io/alidatadog/dummy-apm-app:0.2.0
          env:
          - name: DD_AGENT_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
  • Otherwise I made a dummy app that logs span headers. It should be deployed on kind 1.28 as well. The docker vm should use cgroupv2 by default.
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: dummy-apm-app
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: dummy-apm-app
    template:
      metadata:
        labels:
          app: dummy-apm-app
          admission.datadoghq.com/enabled: "false" # Important otherwise container_id won't be shared.
      spec:
        containers:
        - name: dummy-apm-app
          image: docker.io/alidatadog/dummy-apm-app:0.1.0

Example:

2023/11/30 11:10:50 map[Accept-Encoding:[gzip] Content-Type:[application/msgpack] Datadog-Client-Computed-Top-Level:[yes] Datadog-Client-Dropped-P0-Spans:[0] Datadog-Client-Dropped-P0-Traces:[0] Datadog-Entity-Id:[in-35385] Datadog-Meta-Lang:[go] Datadog-Meta-Lang-Interpreter:[gc-arm64-linux] Datadog-Meta-Lang-Version:[1.21.4] Datadog-Meta-Tracer-Version:[v1.59.0-dev] User-Agent:[Go-http-client/1.1] X-Datadog-Trace-Count:[1]]
  • If you're interested by the app, I used the following code:
package main

import (
	"log"
	"net/http"
	"time"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	go func() {
		http.HandleFunc("/v0.4/traces", func(w http.ResponseWriter, r *http.Request) {
			log.Println(r.Header)
		})
		log.Fatal(http.ListenAndServe(":8080", nil))
	}()

	tracer.Start(
		tracer.WithAgentAddr(":8080"),
		tracer.WithService("my.service"),
	)
	defer tracer.Stop()

	ticker := time.NewTicker(10 * time.Second)
	for ; true; <-ticker.C {
		span := tracer.StartSpan("my.span")
		time.Sleep(2 * time.Second)
		span.Finish()
	}
}

and build the image with the following Dockerfile

# Use the official Golang image to create a build artifact.
# This is based on Debian and sets the GOPATH to /go.
FROM golang:latest as builder

# Set the Current Working Directory inside the container
WORKDIR /app

# Copy the source from the current directory to the Working Directory inside the container
COPY . .

# Build the Go app
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Command to run the executable
CMD ["./main"]

To build a multi-arch image, I used docker buildx build --push --platform=linux/amd64,linux/arm64 -t docker.io/alidatadog/dummy-apm-app:0.1.0 .

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.

For Datadog employees:

  • If this PR touches code that handles credentials of any kind, such as Datadog API keys, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@AliDatadog AliDatadog added enhancement quick change/addition that does not need full team approval team:apm-go labels Nov 30, 2023
@AliDatadog AliDatadog requested review from a team as code owners November 30, 2023 10:13
@AliDatadog AliDatadog requested a review from a team November 30, 2023 10:13
@pr-commenter
Copy link

pr-commenter bot commented Nov 30, 2023

Benchmarks

Benchmark execution time: 2023-12-14 16:54:03

Comparing candidate commit bd1d7cd in PR branch ali/add-entity-id-header with baseline commit 5a92f7b in branch main.

Found 5 performance improvements and 0 performance regressions! Performance is the same for 34 metrics, 2 unstable metrics.

scenario:BenchmarkPartialFlushing/Disabled-24

  • 🟩 execution_time [-9.804ms; -5.840ms] or [-3.395%; -2.022%]

scenario:BenchmarkPartialFlushing/Enabled-24

  • 🟩 execution_time [-11.606ms; -7.114ms] or [-3.998%; -2.451%]

scenario:BenchmarkSingleSpanRetention/no-rules-24

  • 🟩 execution_time [-10.086µs; -9.256µs] or [-3.964%; -3.638%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-all-24

  • 🟩 execution_time [-10.958µs; -9.832µs] or [-4.262%; -3.825%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-half-24

  • 🟩 execution_time [-10.451µs; -9.040µs] or [-4.071%; -3.521%]

@AliDatadog AliDatadog marked this pull request as draft November 30, 2023 10:33
Copy link
Contributor

@ahmed-mez ahmed-mez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! Let's get this merged once we have the changes in the agent!

internal/container.go Outdated Show resolved Hide resolved
Co-authored-by: Ahmed Mezghani <38987709+ahmed-mez@users.noreply.github.com>
internal/container.go Outdated Show resolved Hide resolved
@AliDatadog AliDatadog marked this pull request as ready for review December 1, 2023 14:30
Copy link
Collaborator

@piochelepiotr piochelepiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for data streams

Copy link
Contributor

@nsrip-dd nsrip-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! One small request regarding the profiler unit tests. Otherwise LGTM.

profiler/upload_test.go Show resolved Hide resolved
Copy link
Contributor

@nsrip-dd nsrip-dd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for profiling, thanks!

profiler/upload_test.go Show resolved Hide resolved
@AliDatadog AliDatadog changed the title Send the Datadog-Entity-ID header, containing either the container-id or the cgroupv2 inode if available Send the Datadog-Entity-ID header, containing either the container-id or the cgroup inode if available Dec 4, 2023
@AliDatadog AliDatadog force-pushed the ali/add-entity-id-header branch 3 times, most recently from 2df3dd0 to 4c2e2bb Compare December 4, 2023 18:36
if containerID != "" {
return "cid-" + containerID
}
return getCgroupInode(mountsPath, cgroupPath)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think code is incorrect here, mountsPath is /proc/mounts, which is not read as we defaulted to always reading /sys/fs/cgroup, it should be defaultCgroupMountPath and mountsPath should be removed.

Note that you have no unit test on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I fixed it and added a unit test

@AliDatadog AliDatadog changed the title Send the Datadog-Entity-ID header, containing either the container-id or the cgroup inode if available [CONTINT-3554] Send the Datadog-Entity-ID header, containing either the container-id or the cgroup inode if available Dec 11, 2023
@ahmed-mez
Copy link
Contributor

@AliDatadog @vboulineau is this ready to merge?

@AliDatadog
Copy link
Contributor Author

@AliDatadog @vboulineau is this ready to merge?

Waiting for the last approval

)

const (
// cgroupPath is the path to the cgroup file where we can find the container id if one exists.
cgroupPath = "/proc/self/cgroup"

// mountsPath is the path to the mounts file where we can find the cgroup v2 mount point.
mountsPath = "/proc/mounts"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used, should be removed

@darccio darccio merged commit 337ded8 into main Dec 19, 2023
151 of 153 checks passed
@darccio darccio deleted the ali/add-entity-id-header branch December 19, 2023 09:38
@darccio darccio added this to the Triage milestone Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement quick change/addition that does not need full team approval team:apm-go
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants