[exporter/prometheusremotewrite] Collector crashes after metrics expiry #7149

Closed
albertteoh opened this issue Jan 12, 2022 · 1 comment · Fixed by #7306

albertteoh commented Jan 12, 2022

Relates to #6935

Describe the bug
When metrics expire, the opentelemetry-collector crashes.

Steps to reproduce

  1. Start the otel collector with the config provided below.
  2. Start HotROD example: docker run --rm --network="host" --env JAEGER_AGENT_HOST=localhost --env JAEGER_AGENT_PORT=6835 -p8080-8083:8080-8083 jaegertracing/example-hotrod:latest all
    1. This will emit spans to the otel collector on port 6835.
  3. Click a button in the HotROD UI to send a trace.
  4. Check that the latency_bucket metrics are correct for the first minute of data, and make a note of which bucket has a count > 0, e.g. where the metric has the label le = "250". (A small check script is sketched after these steps.)
    1. Example query: latency_bucket{service_name = "driver", le="250"}
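
To make step 4 repeatable, here is a minimal sketch (my own helper, not part of the collector) that assumes the prometheus exporter from the config below is serving on localhost:8889 and prints the driver service's latency_bucket series:

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Scrape the collector's prometheus exporter endpoint (localhost:8889 per the config below).
	resp, err := http.Get("http://localhost:8889/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print only the latency histogram bucket series for the "driver" service.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "latency_bucket") && strings.Contains(line, `service_name="driver"`) {
			fmt.Println(line)
		}
	}
}

Run it once within the first minute to note which le buckets have a count > 0.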

What did you expect to see?

After a minute (on metric expiry), the metric with le = "250" (example query: latency_bucket{service_name = "driver", le="250"}) should no longer be queryable (i.e., it should return a null value).

What did you see instead?

After a minute (on metric expiry), the opentelemetry-collector crashes with the following logs:

2022-01-12T12:53:46.460Z	debug	prometheusexporter@v0.42.0/accumulator.go:262	metric expired: latency	{"kind": "exporter", "name": "prometheus"}
2022-01-12T12:53:46.460Z	debug	prometheusexporter@v0.42.0/accumulator.go:262	metric expired: latency	{"kind": "exporter", "name": "prometheus"}
2022-01-12T12:53:46.460Z	debug	prometheusexporter@v0.42.0/accumulator.go:262	metric expired: latency	{"kind": "exporter", "name": "prometheus"}
2022-01-12T12:53:46.466Z	INFO	loggingexporter/logging_exporter.go:54	MetricsExporter	{"#metrics": 7}
2022-01-12T12:53:46.466Z	DEBUG	loggingexporter/logging_exporter.go:64	ResourceMetrics #0

...

panic: runtime error: index out of range [-1]

goroutine 111 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter.addSingleHistogramDataPoint({0x8}, {0x0}, {0x0}, {0x0, 0x0}, 0xc00226aaa0, 0x8)
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter@v0.42.0/helper.go:397 +0x8bd
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter.(*PRWExporter).PushMetrics(0xc00061ba40, {0x4bb0408, 0xc00225d800}, {0x4bb0440})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter@v0.42.0/exporter.go:151 +0xa25
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).export(0x4bb0440, {0x4bb0408, 0xc00225d800})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/metrics.go:67 +0x34
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc000a06028, {0x4c18830, 0xc002264ae0})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/common.go:232 +0x96
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send(0xc00031d200, {0x4c18830, 0xc002264ae0})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry.go:176 +0x5eb
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc00012e168, {0x4c18830, 0xc002264ae0})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/metrics.go:134 +0x88
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1({0x3f9a020, 0xc002264ae0})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry_inmemory.go:105 +0x5c
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume(0xc00064ffa8, {0x3f9a020, 0xc002264ae0})
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:99 +0x2c
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2()
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:78 +0xd6
created by go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers
	go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:68 +0xa5
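
For context, here is a minimal sketch (hypothetical types and field names, not the actual helper.go code) of how an expired histogram series can produce the index out of range [-1] above: once the prometheus exporter drops the series after metric_expiration, a re-scraped data point can carry no bucket counts, and any logic that indexes the last bucket via len()-1 underflows to -1:

package main

import "fmt"

// histogramPoint is a hypothetical stand-in for a pdata histogram data point.
type histogramPoint struct {
	explicitBounds []float64
	bucketCounts   []uint64
}

// lastBucketCount mimics logic that assumes at least one bucket is present.
func lastBucketCount(pt histogramPoint) uint64 {
	// For an expired/empty series, bucketCounts has length 0, so this index is -1.
	return pt.bucketCounts[len(pt.bucketCounts)-1]
}

func main() {
	expired := histogramPoint{} // no bounds, no counts
	// Panics with "index out of range [-1]", matching the stack trace above.
	fmt.Println(lastBucketCount(expired))
}

A length guard before indexing (skipping or zero-filling empty data points) would avoid the panic; whether that is how #7306 fixes it is for the PR to confirm.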

What version did you use?
Version: main branch

What config did you use?
Config:

receivers:
  jaeger:
    protocols:
      thrift_binary:
        endpoint: "localhost:6836"
      thrift_compact:
        endpoint: "localhost:6835"
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: :12345
  otlp:
    protocols:
      grpc:
        endpoint: :4317
  prometheus:
    config:
      global:
        external_labels:
          p8s_logzio_name: spm-demo-otel
      scrape_configs: 
      - job_name: 'atm'
        scrape_interval: 15s
        static_configs:
        - targets: [ "0.0.0.0:8889" ]
exporters:
  jaeger:
    endpoint: "localhost:14250"
    tls:
      insecure: true
  prometheus:
    endpoint: "localhost:8889"
    metric_expiration: 1m
  logging:
    loglevel: debug
  prometheusremotewrite:
    endpoint: https://myremotewriteendpoint
    headers:
      Authorization: Bearer mybearertoken
processors:
  batch:
  spanmetrics:
    metrics_exporter: prometheus
    latency_histogram_buckets: [2ms, 6ms, 10ms, 100ms, 250ms, 500ms, 1000ms, 10000ms, 100000ms, 1000000ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code    
extensions:
  pprof:
    endpoint: :1777
  zpages:
    endpoint: :55679
  health_check:
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [spanmetrics,batch]
      exporters: [jaeger, logging]
    metrics/spanmetrics:
      # This receiver is just a dummy and never used.
      # Added to pass validation requiring at least one receiver in a pipeline.
      receivers: [otlp/spanmetrics]
      exporters: [prometheus]
    metrics:
      receivers: [otlp,prometheus]
      exporters: [logging,prometheusremotewrite]      
  telemetry:
    logs:
      level: "debug"

Environment
OS: "Ubuntu 20.04"
Compiler (if manually compiled): "go 1.17.5"

cc @Aneurysm9

@albertteoh albertteoh added the bug Something isn't working label Jan 12, 2022
@Aneurysm9 Aneurysm9 self-assigned this Jan 12, 2022
albertteoh (Contributor, Author) commented:

Sorry, I'd forgotten to include the prometheusremotewrite exporter in the description's config; this is fixed now.
