Aged summaries report as zero, but never pruned #492
Comments
We have a related issue. We collect metrics from a number of devices. Each device is typically online for a few hours and then offline for the rest of the day, and the period of activity varies from device to device. In Grafana we have dashboards that summarize those metrics (basically computing avg, std, etc.) so that we can get an overview of our fleet. The problem is that some devices occasionally report very extreme values just before they go offline. Those extreme values keep being exported for as long as the devices are offline, which skews the avg and std we compute in Grafana. Ideally, when a device goes offline and its metrics are no longer updated through prom-client, we would like the metrics for that device to stop being exposed until the device is back online and produces fresh values. There is an interesting discussion of this topic in the Prometheus documentation on writing exporters; see the section "Pushes" ("Firstly, when do you expire metrics?"). Do you have any plans to provide a TTL (time to live) option or an expiry delay to stop old/stale metrics from being exposed?
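To make the request concrete, here is a minimal sketch of the kind of TTL behaviour described above. It is not an existing prom-client feature: `LabelTtlTracker`, `touch`, `sweep`, and the `onExpire` callback are hypothetical names invented for illustration, and the clock is injectable only so the example can be run deterministically.

```javascript
// Hypothetical helper, NOT part of prom-client: remembers when each label set
// was last updated and reports label sets that have been idle longer than ttlMs.
class LabelTtlTracker {
  constructor(ttlMs, onExpire, now = Date.now) {
    this.ttlMs = ttlMs;
    this.onExpire = onExpire; // called with the label values of each expired set
    this.now = now; // injectable clock (assumption, for testing)
    this.lastSeen = new Map(); // JSON-encoded label values -> last update time (ms)
  }

  // Call whenever a metric is updated for these label values.
  touch(...labelValues) {
    this.lastSeen.set(JSON.stringify(labelValues), this.now());
  }

  // Call periodically (for example from setInterval) or before each scrape.
  // Returns the label sets that were expired during this sweep.
  sweep() {
    const cutoff = this.now() - this.ttlMs;
    const expired = [];
    for (const [key, seenAt] of this.lastSeen) {
      if (seenAt < cutoff) {
        this.lastSeen.delete(key); // deleting during Map iteration is safe
        const labelValues = JSON.parse(key);
        this.onExpire(...labelValues);
        expired.push(labelValues);
      }
    }
    return expired;
  }
}

// Example run with a fake clock and a 60 second TTL.
let fakeNow = 0;
const removed = [];
const tracker = new LabelTtlTracker(
  60000,
  (...labelValues) => removed.push(labelValues),
  () => fakeNow
);

tracker.touch('device-a'); // updated at t = 0
fakeNow = 30000;
tracker.touch('device-b'); // updated at t = 30 s
fakeNow = 70000;
tracker.sweep(); // only device-a has been idle for more than 60 s
```

In a real service, `onExpire` could drop the label set from the metric itself, for instance by calling the metric's `remove(...labelValues)` method if the prom-client version in use provides one; that wiring is an assumption here and is not shown.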
Reporting web requests with the path as a label is nearly impractical, since there will always be some junk routes (probably breach attempts against some servers) that never expire. Couldn't we agree that pruning an expired summary label set, instead of zeroing it, could at least be a configuration option? I think it could even be the default behavior.
I think this is only relevant when using the Pushgateway. Nonetheless, I understand the use case/issue. Do other Prometheus client libraries have this TTL/pruning feature?
Not exactly with summaries, but cAdvisor doesn't export metrics for dead containers, i.e.
We have a chronic issue with one of our services, where over time the metric count per instance grows to the point that it overwhelms our Prometheus scraper. I think I've tracked it down to stale label sets that have been aged out (zeroed) but are still reported on the endpoint. Is there any functionality to reap stale summaries, rather than continuing to report them as 0?