Add doc about how to work around missing metric expiry. #8948

Merged
1 change: 1 addition & 0 deletions .spelling
@@ -556,6 +556,7 @@ prepending
prepends
prober
programmatically
+PromQL
proto
protobuf
protoc
18 changes: 18 additions & 0 deletions content/en/faq/metrics-and-logs/metric-expiry.md
@@ -0,0 +1,18 @@
---
title: How can I manage short-lived metrics?
weight: 20
---

Short-lived metrics can hamper the performance of Prometheus, as they are often a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of your short-lived metrics on Prometheus, you must first identify the high-cardinality metrics and labels. Prometheus provides cardinality information on its `/status` page. Additional information can be retrieved [via PromQL](https://www.robustperception.io/which-are-my-biggest-metrics).
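
As an illustration only (not part of Istio or Prometheus tooling), the Python sketch below sends instant queries to the Prometheus HTTP API (`/api/v1/query`). The first query follows the approach in the linked article: count series per metric name and keep the ten largest. The second counts distinct `destination_service` values on `istio_requests_total`. The function names are hypothetical, and the server address is an assumption you supply (for example `http://localhost:9090` after a port-forward).

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Series count per metric name, ten largest (approach from the linked article).
TOP_METRICS_QUERY = 'topk(10, count by (__name__)({__name__=~".+"}))'
# Number of distinct destination_service values on Istio's request counter.
DEST_SERVICE_CARDINALITY_QUERY = (
    "count(count by (destination_service) (istio_requests_total))"
)


def build_query_url(base_url: str, query: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(body: str) -> List[Tuple[str, int]]:
    """Turn an instant-vector response into (metric name, value) pairs, largest first.

    Results without a metric name (such as the count query) get an empty name.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    pairs = [
        (item["metric"].get("__name__", ""), int(float(item["value"][1])))
        for item in payload["data"]["result"]
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def run_query(base_url: str, query: str, timeout: float = 10.0) -> List[Tuple[str, int]]:
    """Execute `query` against a live Prometheus server and parse the result."""
    with urllib.request.urlopen(build_query_url(base_url, query), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

With a reachable server, `run_query("http://localhost:9090", TOP_METRICS_QUERY)` lists the metrics contributing the most series. Keeping URL building and parsing separate from the network call lets those parts be checked without a running Prometheus.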
There are several ways to reduce the cardinality of Istio metrics:

* Disable host header fallback.
The `destination_service` label is one potential source of high cardinality.
The value of `destination_service` defaults to the host header if the Istio proxy is not able to determine the destination service from other request metadata.
If clients use a variety of host headers, this can result in a large number of values for `destination_service`.
In such cases, follow the [metric customization](https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh-wide.
To disable host header fallback for a particular workload or namespace, copy the stats `EnvoyFilter` configuration, update it to disable host header fallback, and apply it with a more specific selector.
[This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
* Drop unnecessary labels from collection. If a high-cardinality label is not needed, you can drop it from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
* Normalize label values, either through federation or classification.
If the information provided by the label is desired, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
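
To make the arithmetic behind these options concrete, here is a toy Python model (not Istio code). Every distinct combination of label values is a separate Prometheus series, so five host-header spellings for one service, combined with two response codes, produce ten series. Dropping the label, or mapping its values onto a small fixed set, shrinks that product. `drop_label` and `normalize_label` are hypothetical helpers that only approximate what `tags_to_remove` and request classification achieve in the proxy; the host names and the `"unknown"` class are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List

Labels = Dict[str, str]


def count_series(samples: Iterable[Labels]) -> int:
    """Each distinct label set becomes a separate time series."""
    return len({tuple(sorted(labels.items())) for labels in samples})


def drop_label(samples: Iterable[Labels], name: str) -> List[Labels]:
    """Rough analogue of removing a tag from collection (cf. `tags_to_remove`)."""
    return [{k: v for k, v in labels.items() if k != name} for labels in samples]


def normalize_label(samples: Iterable[Labels], name: str,
                    classify: Callable[[str], str]) -> List[Labels]:
    """Rough analogue of collapsing a label's values into a few classes."""
    return [
        {**labels, name: classify(labels[name])} if name in labels else dict(labels)
        for labels in samples
    ]


# Hypothetical traffic to one service: the proxy fell back to the Host header
# and clients addressed the service in five different ways.
HOSTS = [
    "reviews.default.svc.cluster.local",
    "reviews",
    "reviews:9080",
    "10.0.0.12:9080",
    "reviews.example.com",
]
SAMPLES = [
    {"destination_service": host, "response_code": code}
    for host in HOSTS
    for code in ("200", "500")
]
KNOWN = {"reviews.default.svc.cluster.local"}

with_fallback = count_series(SAMPLES)  # 5 hosts x 2 codes
label_dropped = count_series(drop_label(SAMPLES, "destination_service"))  # codes only
normalized = count_series(
    normalize_label(SAMPLES, "destination_service",
                    lambda v: v if v in KNOWN else "unknown")
)  # 2 classes x 2 codes
```

Dropping the label gives the smallest count but loses the information entirely; normalizing keeps a bounded, still-useful set of values, which is the trade-off the last option describes.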
5 changes: 3 additions & 2 deletions content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
@@ -41,8 +41,9 @@ v2 which are listed below:
* **No metric expiration for short-lived metrics**
Mixer-based telemetry supported metric expiration whereby metrics which were
not generated for a configurable amount of time were de-registered for
-collection by Prometheus. This is useful in scenarios where short-lived jobs
-surface telemetry only for a short amount of time, and de-registering
+collection by Prometheus. This is useful in scenarios where short-lived metrics
+only surface for a short amount of time, and de-registering
the metrics prevents reporting of metrics which would no longer change in the
future, thereby reducing network traffic and storage in Prometheus.
This expiration mechanism is not available in in-proxy telemetry.
+For workarounds, see [How can I manage short-lived metrics?](/faq/metrics-and-logs/#metric-expiry).
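
The expiration behavior described above can be modelled in a few lines. This is a hypothetical Python sketch of the general idea (record when each series was last updated and stop exporting it once it has been idle longer than a TTL), not Mixer's implementation; `ExpiringRegistry`, its methods, and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Hashable


class ExpiringRegistry:
    """Toy counter registry that forgets series idle for longer than `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._values: Dict[Hashable, float] = {}
        self._last_update: Dict[Hashable, float] = {}

    def inc(self, series: Hashable, amount: float = 1.0) -> None:
        """Increment a counter and record when it was last touched."""
        self._values[series] = self._values.get(series, 0.0) + amount
        self._last_update[series] = self._clock()

    def expire(self) -> int:
        """De-register every series not updated within the TTL; return how many."""
        now = self._clock()
        stale = [s for s, t in self._last_update.items() if now - t > self.ttl]
        for series in stale:
            del self._values[series]
            del self._last_update[series]
        return len(stale)

    def collect(self) -> Dict[Hashable, float]:
        """Snapshot a scraper would see, after dropping expired series."""
        self.expire()
        return dict(self._values)
```

With a 60-second TTL, a series last updated by a short-lived job at t=0 is gone from a scrape at t=100, while a series updated at t=50 is still exported. In-proxy telemetry has no equivalent step, so the idle series keep being reported.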