-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RollingSummary to prevent summary saturation #306
Conversation
In a staging environment I have ran into this error:
I'll try to find a simple way to reproduce. |
First off, thank you for submitting a solution for this problem. :) I'll need to see what the benchmarks look like, but the tracking of a timestamp per recorded value in a histogram is a bit of a red flag to me, performance-wise. There's some thoughts I have in mind for how to avoid it while still providing proper time bucketing/rolling, but I'll wait to see what the benchmarks look like first before getting into any refactoring. |
Alright, apologies for the delay. After looking at the change, and running I think what I'd want to see is localizing the change to As I try and get closer and closer to what a 1.0 release might look like, I'm striving to avoid continually changing every crate in the workspace due to changes that primarily occur in only one of them. |
That makes sense, I will try to make the suggested changes this week. |
I have updated the PR as suggested, please let me know what you think. BTW, the issue linked above with ddsketch is not yet resolved and will need to be before this could be merged. |
Taking a cursory look, this a lot more in line with what I was thinking. 👍🏻 There's probably some performance to be squeezed out if we lowered the timing accuracy by bucketing with a background task on an interval, rather than getting the timestamp at record time, but I don't think this matters a ton for users using the Prometheus exporter anyways. I'll do a more in-depth path, including looking at the DDSketch issue you mentioned, in the next few days. Thank you for making the requested changes. |
Bah, looks like |
The ddsketch issue has been fixed in a new 0.2.0 release. With it we can simplify the |
I updated the CI workflow to bump the MSRV to 1.56.1 which seems to get the MSRV-specific unit tests passing again, so I think you should be good if you merge in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was finally able to take a look through, and this looks solid to me.
Most of my thoughts/concerns are nits or can be addressed in the future (i.e. configurable summary window/bucket width, enforcing monotonicity of samples to drop a lot of the logic in add
, etc) and nothing here feels like a design pitfall.
Thanks for your contribution! <3 |
Released as |
- Add metrics::SharedString and update function signatures in describe_* methods to have desc be a SharedString. - Update metrics and friends to their latest (0.21, 0.12.1, 0.15) - Update broken test due to a fixed bug (i assume) in the summary calculation: see: metrics-rs/metrics#306 - Unify MYSQL_* env variables used in benchmarks and generator - benchmarks: Add .into() for each of the metrics descriptions - .with_push_gateway requires user/pass to be passed, passing in None for now but this should probably become configurable as an env var - Update tikv-jemallocator to 0.5.0 to remove a future incompat message from Rust Change-Id: If92d136dd76c29b90623c36e7ff5ceb892089f01 Reviewed-on: https://gerrit.readyset.name/c/readyset/+/4892 Tested-by: Buildkite CI Reviewed-by: Griffin Smith <griffin@readyset.io>
* fix prometheus metric name and label key sanitizer (metrics-rs#296) Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> * metrics-util: add ability to collect metrics on a per-thread basis via DebuggingRecorder (metrics-rs#299) Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com> * (cargo-release) version 0.12.1 * Improve handling of the global recorder instance (metrics-rs#302) This gets rid of the dangerous `static mut`, adds more comments about the code, relaxes the orderings and documents the unsoundness of the `clear` function, in addition to marking it unsafe. It implements a small once_cell-like abstraction. Co-authored-by: Toby Lawrence <toby@nuclearfurnace.com> * Fix `metrics::Cow` provenance issue (metrics-rs#303) * Quantile Remapping Fix (metrics-rs#304) * update changelogs prior to release * (cargo-release) version 0.19.0 * (cargo-release) version 0.13.0 * (cargo-release) version 0.11.0 * (cargo-release) version 0.10.0 * Update CHANGELOG.md * Update README.md * Update ci.yml * Update ci.yml * Add RollingSummary to prevent summary saturation (metrics-rs#306) * Update quanta requirement from 0.9.3 to 0.10.0 (metrics-rs#301) Updates the requirements on [quanta](https://github.com/metrics-rs/quanta) to permit the latest version. - [Release notes](https://github.com/metrics-rs/quanta/releases) - [Changelog](https://github.com/metrics-rs/quanta/blob/main/CHANGELOG.md) - [Commits](metrics-rs/quanta@v0.9.3...v0.10.0) --- updated-dependencies: - dependency-name: quanta dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Document CI enforced msrv in readme and rust-version fields (metrics-rs#311) * Change description to be SharedString (metrics-rs#312) * Update hashbrown requirement from 0.11 to 0.12 (metrics-rs#266) Updates the requirements on [hashbrown](https://github.com/rust-lang/hashbrown) to permit the latest version. - [Release notes](https://github.com/rust-lang/hashbrown/releases) - [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md) - [Commits](rust-lang/hashbrown@v0.11.0...v0.12.0) --- updated-dependencies: - dependency-name: hashbrown dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> * Shims remaining AtomicU64 usage (metrics-rs#313) * Update parking_lot requirement from 0.11 to 0.12 (metrics-rs#268) Updates the requirements on [parking_lot](https://github.com/Amanieu/parking_lot) to permit the latest version. - [Release notes](https://github.com/Amanieu/parking_lot/releases) - [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md) - [Commits](Amanieu/parking_lot@0.11.0...0.12.0) --- updated-dependencies: - dependency-name: parking_lot dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> * update changelogs * (cargo-release) version 0.13.1 * add std atomics handle support back + changelogs * (cargo-release) version 0.20.0 * (cargo-release) version 0.14.0 * (cargo-release) version 0.11.0 * changelog * (cargo-release) version 0.12.0 * (cargo-release) version 0.7.0 * update changelog * (cargo-release) version 0.6.0 * update changelog * (cargo-release) version 0.20.1 * Remove incorrect return info (metrics-rs#316) * Add a `KeyName` argument to `LabelFilter::should_include_label` (metrics-rs#342) * rewind * (cargo-release) version 0.13.0 * Use std sync primitives instead of parking_lot. (metrics-rs#344) * Use std::sync::Mutex instead of parking_lot::Mutex. * Bump MSRV to 1.60.0 for CI. Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> * clean up github CI workflow + rust-toolchain.toml to match quanta * update portable-atomic to 1.0 * bump to prost 0.11 + fix spelling issue in metrics-tracing-context * bump MSRV to 1.61 + update hashbrown/ahash deps * update termion/ordered-float and pin predicates-* to avoid stupid 1.64 MSRV * update syn * update quanta + clean up semver notation in cargo.toml * cleanup README wording for MSRV policy * Add white background to splash image (metrics-rs#348) * Install protoc in CI. * fix spelling error in CI workflow * Update ci.yml * no need to run CI against macOS/Windows specifically * Bring the metrics-observer protobufs up to date (metrics-rs#345) * Use global paths for macros (metrics-rs#358) * fix changes to fully qualified metrics crate ref change in macros * update changelog * update tui to 0.19 * bump numpy dep * impl std::error::Error for metrics_exporter_tcp::Error * changelog * tweak test to avoid unused code * rework 32 vs 64-bit arch atomics support + a lot of import consolidation/cleanup * const-ify some stuff + rewrite some comments for the global recorder init code * clean up clippy lints * bump MSRV to 1.61.0 * (cargo-release) version 0.7.0 * (cargo-release) version 0.21.0 * (cargo-release) version 0.15.0 * (cargo-release) version 0.14.0 * (cargo-release) version 0.8.0 * (cargo-release) version 0.12.0 * allow publishing * (cargo-release) version 0.2.0 * make it publishable pt 2 * push-gateway support authentication (metrics-rs#366) * update changelog + fix failing feature check test * (cargo-release) version 0.12.1 * feat(util): new helper type for recovering recorder after installing it (metrics-rs#362) * Update aho-corasick to 1.0. * Impl From<std::borrow::Cow> for KeyName (metrics-rs#378) * changelog * Add `Borrow` impl to `KeyName` (metrics-rs#381) * bump deps + clippy stuff * changelog * pin hashbrown to avoid MSRV bump * (cargo-release) version 0.15.1 * (cargo-release) version 0.21.1 * Add metadata to metrics (metrics-rs#380) * add https support in Prometheus gateway (metrics-rs#392) * migrate from procedural to declarative macros (metrics-rs#386) * Make `Unit` methods return `&'static str` where possible (metrics-rs#393) * simplify macros (metrics-rs#394) * Add support for `Arc<T>` to `metrics::Cow<'a, T>`. (metrics-rs#402) * Add support for `tracing::Span::record`ed fields in `metrics-tracing-context` (metrics-rs#408) * fix(prom): `RollingSummary` overflow panic (metrics-rs#423) * update CHANGELOG * (cargo-release) version 0.12.2 * Remove unneeded unsafe from test (metrics-rs#427) * Fix feature check in CI (metrics-rs#428) * update changelogs/release notes * fix unsafe/incorrect crossbeam-epoch usage in Block<T> * Add a clippy CI check (metrics-rs#416) (metrics-rs#417) * Try other resolved addresses if the first one fails (metrics-rs#429) * update changelog * Update quanta requirement from 0.11 to 0.12 (metrics-rs#396) Updates the requirements on [quanta](https://github.com/metrics-rs/quanta) to permit the latest version. - [Changelog](https://github.com/metrics-rs/quanta/blob/v0.12.0/CHANGELOG.md) - [Commits](metrics-rs/quanta@v0.11.0...v0.12.0) --- updated-dependencies: - dependency-name: quanta dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> * Update ordered-float requirement from 3.4 to 4.2 (metrics-rs#421) Updates the requirements on [ordered-float](https://github.com/reem/rust-ordered-float) to permit the latest version. - [Release notes](https://github.com/reem/rust-ordered-float/releases) - [Commits](reem/rust-ordered-float@v3.4.0...v4.2.0) --- updated-dependencies: - dependency-name: ordered-float dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix quantile api * missing clear * reintroduce old way to avoid big refactor --------- Signed-off-by: Toby Lawrence <toby@nuclearfurnace.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Shaoyuan CHEN <chensy20@mails.tsinghua.edu.cn> Co-authored-by: Toby Lawrence <tobz@users.noreply.github.com> Co-authored-by: Toby Lawrence <toby@nuclearfurnace.com> Co-authored-by: nils <48135649+Nilstrieb@users.noreply.github.com> Co-authored-by: Dan Wilbanks <78116928+dfwilbanks395@users.noreply.github.com> Co-authored-by: Daniel Nelson <daniel@wavesofdawn.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lucas Kent <rubickent@gmail.com> Co-authored-by: Fredrik Enestad <fredrik@enestad.com> Co-authored-by: Christopher Hunt <huntchr@gmail.com> Co-authored-by: zohnannor <35764628+zohnannor@users.noreply.github.com> Co-authored-by: Sinotov Aleksandr <bratsinot@gmail.com> Co-authored-by: Jacob Kiesel <kieseljake@live.com> Co-authored-by: C J Silverio <ceejceej@gmail.com> Co-authored-by: CinchBlue <aytasato@gmail.com> Co-authored-by: JasonLi <lijingxuan92@126.com> Co-authored-by: Mostafa Elhemali <mostafa.elhemaly@gmail.com> Co-authored-by: Harry Barber <106155934+hlbarber@users.noreply.github.com> Co-authored-by: Qingwen Zhao <864593343@qq.com> Co-authored-by: david-perez <d@vidp.dev> Co-authored-by: Lucio Franco <luciofranco14@gmail.com> Co-authored-by: Nicolas Stinus <nicolas.stinus@gmail.com> Co-authored-by: Valeriy V. Vorotyntsev <valery.vv@gmail.com>
This PR attempts to address the issue described in #269 where Prometheus
Summaries become saturated. I added a RollingSummary type that manages a list
of Summary so that old values can be retired.
There is a breaking change in
AtomicStorage
:I tested using a program that produces a test pattern, every two minutes it
switches between uniform random values between 0-100 and 100-200.
I also graph the summary
rate(sum[1m]) / rate(count[1m])
for comparison tothe 50th quantile.
Before this patch:
With this patch: