exporter/prometheusremotewriter: sort Sample by Timestamp to avoid out of order errors #2941
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main    #2941   +/-   ##
=======================================
  Coverage   91.66%   91.66%
=======================================
  Files         312      312
  Lines       15312    15317    +5
=======================================
+ Hits        14036    14041    +5
  Misses        870      870
  Partials      406      406
Continue to review full report at Codecov.
Force-pushed from 4c6e148 to 3f40ffc
Force-pushed from 3f40ffc to 8b97422
Thanks much for this PR! I've been testing the exporter with this change and can't reproduce the bug anymore. I'm using relabel_configs to include the pod name in the samples, to avoid the duplicates that might be coming from the replicas of my app.
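As an aside, here is a hypothetical relabel_configs sketch of that approach (the job name and service discovery setup are assumptions, not the actual configuration used above): copying the discovered Kubernetes pod name into a label keeps samples from different replicas as distinct series.

```yaml
scrape_configs:
  - job_name: my-app                 # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy the discovered pod name into a "pod" label so samples from
      # different replicas do not collide as duplicate series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
        action: replace
```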
I spoke too soon; filed #2949 as a follow-up.
Ensures that before a prompb.WriteRequest is created, the TimeSeries
it contains have their Sample values sorted chronologically by
timestamp, so that Prometheus does not report out-of-order errors.
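For illustration, a minimal sketch of the idea (not the exporter's actual code), assuming the exporter accumulates series in a map[string]*prompb.TimeSeries before assembling the request; the sortSamples helper and the map layout are hypothetical.

```go
package main

import (
	"fmt"
	"sort"

	"github.com/prometheus/prometheus/prompb"
)

// sortSamples orders each TimeSeries' samples by ascending Timestamp so
// the remote-write receiver does not reject them as out of order.
func sortSamples(tsMap map[string]*prompb.TimeSeries) {
	for _, ts := range tsMap {
		sort.Slice(ts.Samples, func(i, j int) bool {
			return ts.Samples[i].Timestamp < ts.Samples[j].Timestamp
		})
	}
}

func main() {
	tsMap := map[string]*prompb.TimeSeries{
		`up{instance="a"}`: {
			Samples: []prompb.Sample{
				{Value: 1, Timestamp: 3000},
				{Value: 1, Timestamp: 1000},
				{Value: 0, Timestamp: 2000},
			},
		},
	}
	sortSamples(tsMap)
	// The samples are now chronological and safe to place into a
	// prompb.WriteRequest.
	fmt.Println(tsMap[`up{instance="a"}`].Samples)
}
```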
Thanks to @rakyll for a reproducer and for diagnosing the problem, which
helped distill the issue from a very complex setup that required
expensive Kubernetes clusters with many replicas; essentially, the
problem became more apparent as the number of TimeSeries grew.
It is important to note that the presence of such a bug signifies that
when a large number of replicas are being scraped, scraping stalls and
takes a long time, which means that targets scraped in a round-robin
fashion experience staleness. This might be yet another reason for
setups to adopt a push model rather than scraping endpoints.
Fixes #2315
Fixes open-telemetry/prometheus-interoperability-spec#10