diff --git a/.spelling b/.spelling
index b9e79699157f8..18072befab784 100644
--- a/.spelling
+++ b/.spelling
@@ -556,6 +556,7 @@ prepending
 prepends
 prober
 programmatically
+PromQL
 proto
 protobuf
 protoc
diff --git a/content/en/faq/metrics-and-logs/metric-expiry.md b/content/en/faq/metrics-and-logs/metric-expiry.md
new file mode 100644
index 0000000000000..04d4bf3b3354f
--- /dev/null
+++ b/content/en/faq/metrics-and-logs/metric-expiry.md
@@ -0,0 +1,18 @@
+---
+title: How can I manage short-lived metrics?
+weight: 20
+---
+
+Short-lived metrics can hamper the performance of Prometheus, as they are often a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of short-lived metrics on Prometheus, first identify the high-cardinality metrics and labels. Prometheus provides cardinality information on its `/status` page, and you can retrieve additional information [via PromQL](https://www.robustperception.io/which-are-my-biggest-metrics).
+There are several ways to reduce the cardinality of Istio metrics:
+
+* Disable host header fallback.
+  The `destination_service` label is one potential source of high cardinality.
+  If the Istio proxy cannot determine the destination service from other request metadata, the value of `destination_service` defaults to the host header.
+  If clients use a variety of host headers, this can produce a large number of values for `destination_service`.
+  In this case, follow the [metric customization](/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh-wide.
+  To disable host header fallback for a particular workload or namespace, copy the stats `EnvoyFilter` configuration, update it to disable host header fallback, and apply it with a more specific selector.
+  [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
+* Drop unnecessary labels from collection. If a high-cardinality label is not needed, you can drop it from metric collection with `tags_to_remove`, as described in [metric customization](/docs/tasks/observability/metrics/customize-metrics/).
+* Normalize label values, either through federation or classification.
+  If you need the information the label provides, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize its values.
diff --git a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
index 20d6dd2b97c97..4053df06411c6 100644
--- a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
+++ b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
@@ -41,8 +41,8 @@ v2 which are listed below:
 * **No metric expiration for short-lived metrics**
   Mixer-based telemetry supported metric expiration whereby metrics which were
   not generated for a configurable amount of time were de-registered for
-  collection by Prometheus. This is useful in scenarios where short-lived jobs
-  surface telemetry only for a short amount of time, and de-registering
+  collection by Prometheus. This is useful in scenarios, such as one-off jobs, that generate short-lived metrics. De-registering
   the metrics prevents reporting of metrics which would no longer change in the
   future, thereby reducing network traffic and storage in Prometheus.
   This expiration mechanism is not available in in-proxy telemetry.
+  For a workaround, see [How can I manage short-lived metrics?](/faq/metrics-and-logs/#metric-expiry).