[exporter/datadog] Datadog exporter panics while exporting metrics pushed from Kuma dataplane proxy #32103

Closed
Automaat opened this issue Apr 2, 2024 · 6 comments

Automaat commented Apr 2, 2024

Component(s)

exporter/datadog

What happened?

Description

We recently added support in the Kuma service mesh for pushing metrics to the OpenTelemetry Collector, and we found an issue with the Datadog exporter: a couple of minutes after we start pushing metrics to the collector, the exporter panics. The Kuma issue has more information, including logs from the debug exporter: kumahq/kuma#9336

Steps to Reproduce

Install Kuma (see the Kuma installation guide):

helm repo add kuma https://kumahq.github.io/charts
helm repo update
helm install --create-namespace --namespace kuma-system kuma kuma/kuma

Install the demo app:

kumactl install demo | kubectl apply -f -

Install the OpenTelemetry Collector with this config:

kubectl --context $CTX_CLUSTER3 create namespace observability

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# otel collector config via helm
cat > otel-config-datadog.yaml <<EOF
mode: deployment
config:
  exporters:
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
EOF

helm upgrade --install \
  --kube-context ${CTX_CLUSTER3} \
  -n observability \
  --set mode=deployment \
  -f otel-config-datadog.yaml \
  opentelemetry-collector open-telemetry/opentelemetry-collector

# enable Metrics
kumactl apply -f - <<EOF
type: MeshMetric
name: metrics-default
mesh: default
spec:
  targetRef:
    kind: Mesh
  default:
    backends:
    - type: OpenTelemetry
      openTelemetry: 
        endpoint: "opentelemetry-collector.observability.svc:4317"
EOF

Expected Result

The exporter does not panic.

Actual Result

The exporter panics:

panic: runtime error: index out of range [0] with length 0

goroutine 450 [running]:
github.com/DataDog/opentelemetry-mapping-go/pkg/quantile.(*Agent).InsertInterpolate(0xc001deaf58, 0x414b774000000000, 0x3fe0000000000000, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/quantile@v0.13.2/agent.go:94 +0x4b4
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).getSketchBuckets(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x7dc81df15470, 0xc001d2e540}, 0xc0020af5c0, {0xc003420c60?, 0xc00206a240?}, {0x0, 0x0, ...}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:351 +0xaf5
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapHistogramMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x90fc310, 0xc001d2e540}, 0x5b3a2273746e696f?, {0xc002149580?, 0xc00206a240?}, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:515 +0x7c7
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapToDDFormat(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0024b2640?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?}, {0xc001bc6580, 0x1, 0x4}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:847 +0xabe
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).MapMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?})
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@v0.13.2/metrics_translator.go:797 +0xd27
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsData(0xc002afea20, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.94.0/metrics_exporter.go:212 +0x21d
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsDataScrubbed(0xc002afea20, {0x911ee78?, 0xc002e9d7a0?}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@v0.94.0/metrics_exporter.go:185 +0x2c
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x0?, {0x911ee78?, 0xc002e9d7a0?})
	go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/metrics.go:59 +0x31
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc001bdd980?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/timeout_sender.go:43 +0x48
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc00280e8c0?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/common.go:35 +0x30
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc002d8c690, {0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/metrics.go:171 +0x7e
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1({0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@v0.94.1/exporterhelper/queue_sender.go:95 +0x84
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume(0x912a020, 0xc002d8c6f0)
	go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/bounded_memory_queue.go:57 +0xc7
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1()
	go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/consumers.go:43 +0x79
created by go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start in goroutine 1
	go.opentelemetry.io/collector/exporter@v0.94.1/internal/queue/consumers.go:39 +0x7d

Collector version

0.92.0

Environment information

Environment

OpenTelemetry Collector configuration

mode: deployment
config:
  exporters:
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog

Additional context

No response

Automaat added the bug and needs triage labels on Apr 2, 2024
github-actions bot added the exporter/datadog label on Apr 2, 2024

github-actions bot commented Apr 2, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mx-psi added the data:metrics and priority:p1 labels and removed the needs triage label on Apr 2, 2024

mx-psi commented Apr 2, 2024

@Automaat If it is easy to reproduce, would you be able to use the file exporter or the debug exporter to get some sample metrics?

It sounds like the problematic metric is an OTLP Histogram, but I don't have enough data to reproduce just yet.

Automaat commented Apr 2, 2024

@mx-psi we have logs from the debug exporter here: kumahq/kuma#9336 (comment). If that is not enough, I can collect more.

mx-psi commented Apr 2, 2024

@Automaat These logs don't contain any sample data. Using one of the exporters mentioned in #32103 (comment) should let us see the actual payload. If you have not used them before, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/troubleshooting.md#local-exporters for a brief explanation.
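
For reference, here is a minimal sketch of a Helm values file that swaps the Datadog exporter for the debug exporter in the metrics pipeline, so the collector stays up and prints each metric in full. It mirrors the reproduction config and, like it, assumes the chart's default receivers; the file name otel-config-debug.yaml is made up for illustration.

```shell
# Sketch only: the file name is an assumption, not part of the original report.
cat > otel-config-debug.yaml <<EOF
mode: deployment
config:
  exporters:
    debug:
      # detailed verbosity prints each data point in full instead of a summary
      verbosity: detailed
  service:
    pipelines:
      metrics:
        exporters:
          - debug
EOF
```

This could be applied with the same helm upgrade --install command as in the reproduction steps, passing -f otel-config-debug.yaml instead, after which the histogram data points should show up in the collector pod logs.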

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Jun 17, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned on Aug 16, 2024