Prometheus remote write exporter is failing with "out of order samples" under load #2315
Could this be a backend error? Does Prometheus remote write require samples to arrive in a specific order?
This is a backend error, but we need to verify whether our fan-out logic in the exporter is what triggers it. Prometheus is an append-only system, so you can't ingest older data points into a series after appending newer ones. The second error might be caused by a failure to label data points correctly: assume you have N replicas of the same program exporting the same metric; if the scraper fails to label the data points with the source program's unique identifier, we may be sending duplicate samples for the same metric at the same timestamp (see the sketch below).
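For illustration, here is a minimal Go sketch (not collector code; the seriesKey helper and the label values are hypothetical) of why samples from N replicas collapse onto one series when no per-replica label is attached:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a series identity from a label set; Prometheus treats
// samples carrying identical label sets as belonging to the same series.
func seriesKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Two replicas exporting the same metric without a per-replica label:
	// both samples land on the same series, so the backend sees duplicate
	// (and potentially out-of-order) appends.
	a := map[string]string{"__name__": "test_counter0_total"}
	b := map[string]string{"__name__": "test_counter0_total"}
	fmt.Println(seriesKey(a) == seriesKey(b)) // true: same series

	// A unique identifier (e.g. a pod label) separates the series.
	a["pod"] = "replica-0"
	b["pod"] = "replica-1"
	fmt.Println(seriesKey(a) == seriesKey(b)) // false: distinct series
}
```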
We talked about this at the collector SIG meeting today. @rakyll, can you provide more info on what you expect to see?
There's something here about how metrics are meant to be "single source". It should not be possible to get out-of-order samples in a correctly configured system, since each metric should come from a single source of truth. I wrote about how "single-homed" metrics are always easier in this document: https://docs.google.com/document/d/1hed5J_Lk_l1qz5rei4V7kDSzcIlr4b0qKG4m7VfbS0I/edit?usp=sharing
We produce remote write requests out of chronological order.

first req ============>

timeseries:<labels:<name:"__name__" value:"test_histogram0_count" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.1" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_summary0_count" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.1" > samples:<value:0.4774478099834713 timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.5" > samples:<value:0.8713202045000787 timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_gauge0" > samples:<value:5.723645357596498 timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_sum" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.005" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"1" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"+Inf" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_summary0_sum" > samples:<timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.99" > samples:<value:0.9977474185974964 timestamp:1618297785608 > >
timeseries:<labels:<name:"__name__" value:"test_counter0_total" > samples:<timestamp:1618297785608 > >

second req ============>

timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.005" > samples:<timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"1" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"+Inf" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_summary0_sum" > samples:<value:16.256622262869485 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.5" > samples:<value:0.920758717698257 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_gauge0" > samples:<value:5.41446628663609 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_sum" > samples:<value:16.256622262869485 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_bucket" > labels:<name:"le" value:"0.1" > samples:<timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_summary0_count" > samples:<value:17 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.1" > samples:<value:0.5881942475189074 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_summary0" > labels:<name:"quantile" value:"0.99" > samples:<value:0.9984683987641793 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_counter0_total" > samples:<value:0.06368703615839877 timestamp:1618297785602 > >
timeseries:<labels:<name:"__name__" value:"test_histogram0_count" > samples:<value:17 timestamp:1618297785602 > >

2021-04-13T07:09:45.676Z error exporterhelper/queued_retry.go:279 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: server returned HTTP status 400 Bad Request: user=516699956539_ws-ce1f837b-8384-42dc-8c55-4e3834673ad5: err: out of order sample. timestamp=2021-04-13T07:09:45.602Z, series={__name__="test_summary0_sum"} ", "dropped_items": 4}

The second request's proto has an earlier timestamp than the first one.
In the exporter, the internal type we use for the incoming metric data, map[string]*prompb.TimeSeries, is a map, and map iteration shuffles the order of the data. We need to preserve the order (a small demonstration follows below).
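A minimal, self-contained demonstration of the ordering issue, using a simplified stand-in type rather than the exporter's actual code:

```go
package main

import "fmt"

// timeSeries is a simplified stand-in for prompb.TimeSeries.
type timeSeries struct {
	name      string
	timestamp int64
}

func main() {
	batch := map[string]*timeSeries{
		"test_counter0_total": {name: "test_counter0_total", timestamp: 1618297785608},
		"test_summary0_sum":   {name: "test_summary0_sum", timestamp: 1618297785602},
		"test_gauge0":         {name: "test_gauge0", timestamp: 1618297785602},
	}

	// Go deliberately randomizes map iteration order, so the data handed to
	// the write request can come out in a different order on every export.
	for key, ts := range batch {
		fmt.Println(key, ts.timestamp)
	}
}
```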
…t of order errors (#2941)
Ensures that before a prompb.WriteRequest is created, the TimeSeries it contains have their Sample values sorted chronologically, to avoid the out-of-order errors reported by Prometheus. Thanks to @rakyll for a reproducer and for diagnosing the problem, which helped distill the issue from a very complex setup that required expensive Kubernetes clusters with many replicas; essentially, the problem became more apparent as the number of TimeSeries grew. It is important to note that the presence of such a bug means that when a large number of replicas are being scraped, scraping stalls and takes a long time, so targets scraped in round-robin fashion experience staleness when there are many of them. This might be one more reason for setups to adopt a push model rather than scraping endpoints. Fixes #2315 Fixes open-telemetry/prometheus-interoperability-spec#10
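A minimal sketch of the kind of sort the commit describes, assuming the prompb types from github.com/prometheus/prometheus/prompb; this is not the actual PR code, and orderedWriteRequest is a hypothetical name:

```go
package sketch

import (
	"sort"

	"github.com/prometheus/prometheus/prompb"
)

// orderedWriteRequest turns the exporter's internal map into a WriteRequest,
// sorting every sample slice chronologically so the backend never sees an
// older sample appended after a newer one within the same series.
func orderedWriteRequest(tsMap map[string]*prompb.TimeSeries) *prompb.WriteRequest {
	wr := &prompb.WriteRequest{Timeseries: make([]prompb.TimeSeries, 0, len(tsMap))}
	for _, ts := range tsMap {
		sort.Slice(ts.Samples, func(i, j int) bool {
			return ts.Samples[i].Timestamp < ts.Samples[j].Timestamp
		})
		wr.Timeseries = append(wr.Timeseries, *ts)
	}
	return wr
}
```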
The Prometheus remote write exporter is failing with the following errors under load:
These issues are potentially caused by concurrent writes at https://github.com/open-telemetry/opentelemetry-collector/blob/a16c599d26b6db7a95489568ca890b06380463cc/exporter/prometheusremotewriteexporter/exporter.go and/or by samples produced for the same metric type by replicas of the target that are not distinguished by pod labels (one possible way to guard the shared map is sketched below).
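If concurrent writes turn out to be part of the problem, one way to serialize fan-out into a shared map would be a mutex-guarded wrapper. This is a hedged sketch with hypothetical type and method names, not the collector's implementation:

```go
package sketch

import (
	"sync"

	"github.com/prometheus/prometheus/prompb"
)

// safeTSMap guards the shared series map so concurrent goroutines cannot
// interleave or corrupt writes while a batch is being assembled.
type safeTSMap struct {
	mu sync.Mutex
	m  map[string]*prompb.TimeSeries
}

func newSafeTSMap() *safeTSMap {
	return &safeTSMap{m: make(map[string]*prompb.TimeSeries)}
}

// add appends samples for the series identified by key under the lock.
func (s *safeTSMap) add(key string, labels []prompb.Label, samples ...prompb.Sample) {
	s.mu.Lock()
	defer s.mu.Unlock()
	ts, ok := s.m[key]
	if !ok {
		ts = &prompb.TimeSeries{Labels: labels}
		s.m[key] = ts
	}
	ts.Samples = append(ts.Samples, samples...)
}
```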
I'm consistently able to reproduce the problem in a Kubernetes cluster. To reproduce, run a single replica of the collector, collect from 50-60 targets or more, and produce 40-50 metrics at each target.