
Prometheus remote write exporter is failing with "out of order samples" under load #2315

Closed
rakyll opened this issue Dec 21, 2020 · 6 comments · Fixed by #2941
rakyll (Contributor) commented Dec 21, 2020

The Prometheus remote write exporter is failing with the following errors under load:

  • out of order samples
  • duplicate sample for timestamp

These issues are potentially caused by the concurrent writes at https://github.com/open-telemetry/opentelemetry-collector/blob/a16c599d26b6db7a95489568ca890b06380463cc/exporter/prometheusremotewriteexporter/exporter.go, and/or by samples produced for the same metric type by the same target replica without distinguishing pod labels.

I'm consistently able to reproduce the problem in a Kubernetes cluster. To reproduce, run a single replica of the collector, scrape 50-60 targets or more, and produce 40-50 metrics at each target.

@rakyll rakyll added the bug label Dec 21, 2020
@rakyll rakyll changed the title prometheusremotewriter is failing with "out of order samples" under load Prometheus remote write exporter is failing with "out of order samples" under load Dec 21, 2020
bogdandrutu (Member) commented Jan 4, 2021

Could this be a backend error? Does Prometheus remote write require samples to arrive in a specific order?

rakyll (Contributor, Author) commented Jan 4, 2021

This is a backend error, but we need to make sure our fan-out logic in the exporter isn't what's causing it. Prometheus is an append-only system, so you can't ingest older data points into a series after appending newer ones.

The second error is potentially caused by a failure to label data points correctly. Assume you have N replicas of the same program exporting the same metric. If the scraper fails to label the data points with each source program's unique identifier, we might be sending duplicate samples for the same metric at the same timestamp.
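
To make the two failure modes concrete, here is a minimal sketch (not the exporter's actual code) using the `github.com/prometheus/prometheus/prompb` types; the metric name, values, and helper function are made up for illustration:

```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/prompb"
)

// series builds a single-sample time series for the given labels.
func series(ts int64, value float64, labels ...prompb.Label) prompb.TimeSeries {
	return prompb.TimeSeries{
		Labels:  labels,
		Samples: []prompb.Sample{{Value: value, Timestamp: ts}},
	}
}

func main() {
	name := prompb.Label{Name: "__name__", Value: "test_counter0_total"}

	// "out of order samples": the same series is written with a timestamp
	// older than one the backend has already accepted.
	first := prompb.WriteRequest{Timeseries: []prompb.TimeSeries{series(1618297785608, 10, name)}}
	second := prompb.WriteRequest{Timeseries: []prompb.TimeSeries{series(1618297785602, 9, name)}} // 6 ms older: rejected

	// "duplicate sample for timestamp": two replicas export the same metric,
	// but without a replica-identifying label the backend sees one series
	// with two samples at the same timestamp.
	dup := prompb.WriteRequest{Timeseries: []prompb.TimeSeries{
		series(1618297785608, 10, name), // from replica A
		series(1618297785608, 12, name), // from replica B, label set identical
	}}

	fmt.Println(first, second, dup)
}
```

Sending `second` after `first`, or sending `dup` at all, is the kind of write the backend rejects with the errors above.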

andrewhsu (Member) commented

Talked about this at the collector SIG meeting today. @rakyll, can you provide more info on what you expect to see?

jmacd (Contributor) commented Jan 28, 2021

There's something here about how metrics are meant to be "single source". It should not be possible to get out-of-order samples in a correctly configured system, since each metric should come from a single source of truth. I wrote about how "single-homed" metrics are always easier in this document: https://docs.google.com/document/d/1hed5J_Lk_l1qz5rei4V7kDSzcIlr4b0qKG4m7VfbS0I/edit?usp=sharing

rakyll (Contributor, Author) commented Apr 13, 2021

We produce remote write requests out of chronological order:

first req ============>
timeseries:<labels:<name:"name" value:"test_histogram0_count" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.1" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_summary0_count" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.1" > samples:<value:0.4774478099834713 timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.5" > samples:<value:0.8713202045000787 timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_gauge0" > samples:<value:5.723645357596498 timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_histogram0_sum" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.005" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"1" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"+Inf" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_summary0_sum" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.99" > samples:<value:0.9977474185974964 timestamp:1618297785608 > >
timeseries:<labels:<name:"name" value:"test_counter0_total" > samples:<timestamp:1618297785608 > >

second req ============>
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.005" > samples:<timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"1" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"+Inf" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_summary0_sum" > samples:<value:16.256622262869485 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.5" > samples:<value:0.920758717698257 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_gauge0" > samples:<value:5.41446628663609 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_histogram0_sum" > samples:<value:16.256622262869485 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.1" > samples:<timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_summary0_count" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.1" > samples:<value:0.5881942475189074 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_summary0" > labels:<name:"quantile" value:"0.99" > samples:<value:0.9984683987641793 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_counter0_total" > samples:<value:0.06368703615839877 timestamp:1618297785602 > >
timeseries:<labels:<name:"name" value:"test_histogram0_count" > samples:<value:17 timestamp:1618297785602 > >

2021-04-13T07:09:45.676Z error exporterhelper/queued_retry.go:279 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: server returned HTTP status 400 Bad Request: user=516699956539_ws-ce1f837b-8384-42dc-8c55-4e3834673ad5: err: out of order sample. timestamp=2021-04-13T07:09:45.602Z, series={name="test_summary0_sum"} ", "dropped_items": 4}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
/Users/jbd/opentelemetry-collector/exporter/exporterhelper/queued_retry.go:279
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
/Users/jbd/opentelemetry-collector/exporter/exporterhelper/metricshelper.go:120
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
/Users/jbd/opentelemetry-collector/exporter/exporterhelper/queued_retry.go:152
github.com/jaegertracing/jaeger/pkg/queue.ConsumerFunc.Consume
/Users/jbd/go/pkg/mod/github.com/jaegertracing/jaeger@v1.22.0/pkg/queue/bounded_queue.go:104
github.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumersWithFactory.func1
/Users/jbd/go/pkg/mod/github.com/jaegertracing/jaeger@v1.22.0/pkg/queue/bounded_queue.go:83

The second request's proto has an earlier timestamp than the first one: its samples are at 1618297785602 ms versus 1618297785608 ms, so the second write is 6 ms older than data the backend has already accepted, and Prometheus rejects it as out of order.

rakyll (Contributor, Author) commented Apr 13, 2021

In the exporter, the internal type we use for the incoming metric data, map[string]*prompb.TimeSeries, is a map, and Go's map iteration shuffles the order of the data. We need to preserve the order.
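
For illustration, a minimal sketch of the direction the eventual fix takes (the function name and shape are made up, not the exporter's actual code): sort each series' samples chronologically before building the prompb.WriteRequest, so that map iteration order cannot put an older sample behind a newer one:

```go
package main

import (
	"sort"

	"github.com/prometheus/prometheus/prompb"
)

// buildWriteRequest converts the exporter's internal map into a
// prompb.WriteRequest. Because map iteration order is random in Go, the
// samples inside each series are sorted by timestamp before the request is
// built; otherwise an older sample can end up behind a newer one and the
// backend rejects it as out of order.
func buildWriteRequest(tsMap map[string]*prompb.TimeSeries) *prompb.WriteRequest {
	req := &prompb.WriteRequest{
		Timeseries: make([]prompb.TimeSeries, 0, len(tsMap)),
	}
	for _, ts := range tsMap {
		sort.Slice(ts.Samples, func(i, j int) bool {
			return ts.Samples[i].Timestamp < ts.Samples[j].Timestamp
		})
		req.Timeseries = append(req.Timeseries, *ts)
	}
	return req
}
```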

bogdandrutu pushed a commit that referenced this issue Apr 15, 2021
…t of order errors (#2941)

Ensure that, before a prompb.WriteRequest is created, the TimeSeries it
contains have their Sample values sorted chronologically, to avoid the
out-of-order errors reported by Prometheus.

Thanks to @rakyll for a reproducer and for diagnosing the problem, which
helped distill the issue from a very complex setup that required expensive
Kubernetes clusters with many replicas; essentially, the problem became
more apparent as the number of TimeSeries grew.

It is important to note that the presence of such a bug means that when a
large number of replicas are being scraped, scraping stalls and takes a
long time, so targets scraped in round-robin fashion experience staleness
when there are many of them. This might be one more reason for setups to
adopt a push model rather than scraping endpoints.

Fixes #2315
Fixes open-telemetry/prometheus-interoperability-spec#10