Aged summaries report as zero, but never pruned #492

babbottscott · 2022-02-10T02:09:52Z

We have a chronic issue with one of our services, where over time the metric count per instance grows to the point that it overwhelms our Prometheus scraper. I think I've tracked it down to stale label sets that have been aged out (zeroed) but are still reported on the endpoint. Is there any functionality to reap stale summaries, rather than continuing to report them as 0?

brice-morin · 2022-06-01T21:39:13Z

We have a related issue. We collect metrics from a number of devices. Each device is typically online for some hours and then offline for the rest of the day. The period of activity varies from device to device. In Grafana we have dashboards that summarize those metrics (basically computing avg, std, etc.) to give us an overview of our fleet of devices. The problem is that some devices report very extreme values just before they go offline. Those extreme values keep being exported for as long as the devices are offline, which skews the avg and std we compute in Grafana.

Ideally, when a device goes offline and its metrics are no longer updated through prom-client, we would like the metrics for that device to stop being exposed until the device is back online and fresh metrics are produced.

There is an interesting discussion on this topic in the Prometheus documentation for how to write exporters. See section "Pushes", "Firstly, when do you expire metrics?"

Do you have any plans to provide a TTL (time to live) option or some expiry delay to stop old/stale metrics from being exposed?
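
To make the request concrete, here is a rough sketch of the TTL behaviour being asked for. It is a standalone illustration, not prom-client code: the `ExpiringSeries` class, its `ttlMs` parameter, and the injectable `now` clock are all hypothetical names made up for this example.

```javascript
// Hypothetical helper, NOT part of prom-client: remembers when each label set
// was last updated and drops label sets that have not been updated within ttlMs.
class ExpiringSeries {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so expiry can be exercised deterministically
    this.entries = new Map(); // serialized labels -> { labels, value, updatedAt }
  }

  // Stable key so that { a: 1, b: 2 } and { b: 2, a: 1 } map to the same series.
  static keyFor(labels) {
    return JSON.stringify(
      Object.keys(labels).sort().map((name) => [name, labels[name]])
    );
  }

  // Record a value for a label set and refresh its last-updated timestamp.
  set(labels, value) {
    this.entries.set(ExpiringSeries.keyFor(labels), {
      labels: { ...labels },
      value,
      updatedAt: this.now(),
    });
  }

  // Delete label sets older than ttlMs, then return the ones still alive.
  collect() {
    const cutoff = this.now() - this.ttlMs;
    for (const [key, entry] of this.entries) {
      if (entry.updatedAt < cutoff) this.entries.delete(key);
    }
    return [...this.entries.values()].map(({ labels, value }) => ({ labels, value }));
  }
}

// Usage with a fake clock: device "a" stops reporting and is pruned after 60 s.
let fakeNow = 0;
const series = new ExpiringSeries(60_000, () => fakeNow);
series.set({ device: 'a' }, 42);
fakeNow = 30_000;
series.set({ device: 'b' }, 7);
fakeNow = 70_000;
const live = series.collect(); // only device "b" remains
```

The point of the sketch is that a stale label set disappears from the output entirely instead of lingering with its last (possibly extreme) value.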

rilpires · 2023-02-08T14:57:36Z

Reporting web requests with the path as a label is almost impractical, since there will always be some trash routes (probably security breach attempts on some servers) which will never expire.

Couldn't we all agree that pruning an expired summary labelset value instead of zeroing it could be at least a configuration option? I think it could even be the default behavior.

zbjornson · 2023-02-08T20:01:17Z

> There is an interesting discussion on this topic in the Prometheus documentation for how to write exporters. See section "Pushes", "Firstly, when do you expire metrics?"

I think this is only relevant when using the Pushgateway.

Nonetheless, I understand the use case/issue. Do other Prometheus client libraries have this TTL/pruning feature?

rilpires · 2023-02-10T11:55:37Z

> Nonetheless, I understand the use case/issue. Do other prometheus client libraries have this TTL/pruning feature?

Not exactly for summaries, but cAdvisor, for example, doesn't export metrics for dead containers.

zbjornson · 2023-03-08T03:36:07Z

Looks like this was fixed in #540, thanks @rilpires.
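
For anyone landing here later, a minimal sketch of how the option from #540 might be switched on. Only the option name `pruneAgedBuckets` comes from that PR's title; the metric name, help text, label, percentiles, and window sizes below are illustrative assumptions, and the snippet presumes a prom-client release that includes #540.

```javascript
// Options intended for prom-client's Summary constructor.
// Metric name, help text, labels, and window sizes are illustrative assumptions.
const summaryOptions = {
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['path'],
  percentiles: [0.5, 0.9, 0.99],
  maxAgeSeconds: 600,     // length of the sliding window
  ageBuckets: 5,          // number of buckets in the sliding window
  pruneAgedBuckets: true, // option from #540: intended to drop aged-out label sets instead of reporting them as 0
};

// With prom-client installed (a version that includes #540):
// const { Summary } = require('prom-client');
// const requestDuration = new Summary(summaryOptions);
// requestDuration.observe({ path: '/users' }, 0.12);
```

Note that pruning only makes sense together with `maxAgeSeconds` and `ageBuckets`, since without a sliding window nothing ever ages out.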

rilpires mentioned this issue Feb 8, 2023: Option to erase series without requests for a given time jochen-schweizer/express-prom-bundle#116 (Open)

rilpires mentioned this issue Feb 8, 2023: 'pruneAgedBuckets' config option for summary #540 (Merged)

zbjornson closed this as completed Mar 8, 2023

kj800x mentioned this issue Jan 18, 2024: Support TTL on Guages #607 (Open)
