From 0fa9f34755bd051f8b4b59e2706a105537fab8a3 Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 06:12:33 +0000
Subject: [PATCH 1/6] Add doc about how to work around missing metric expiry.

---
 content/en/faq/metrics-and-logs/metric-expiry.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 content/en/faq/metrics-and-logs/metric-expiry.md

diff --git a/content/en/faq/metrics-and-logs/metric-expiry.md b/content/en/faq/metrics-and-logs/metric-expiry.md
new file mode 100644
index 0000000000000..d7ea3b866e681
--- /dev/null
+++ b/content/en/faq/metrics-and-logs/metric-expiry.md
@@ -0,0 +1,15 @@
+---
+title: How to work around missing metric expiry with in-proxy telemetry?
+weight: 20
+---
+
+First step is to identify the metric name and label which has high cardinality.
+Prometheus provides cardinality information at its `/status` page.
+Based on the labels which have high cardinality, there are several ways to reduce the cardinality:
+
+* `destination_service` could cause high cardinality since for intra mesh traffic, destination service could fallback to host header if Istio proxy does not know which service the request heads to.
+    you can follow [metric customization](https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh wide.
+    To disable host header fallback for a particular workload or namespace, you need to copy the stats `EnvoyFilter` configuration, update it to have host header fallback disabled, and apply it with a more specific selector.
+    [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
+* If the label with high cardinality is not needed, you can drop is from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
+* If the information provided by the label is desired, you can use [prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.

From 555ad424718a4581f1ab8a5382372d22f69c8ff3 Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 18:52:13 +0000
Subject: [PATCH 2/6] address comment.

---
 .../en/faq/metrics-and-logs/metric-expiry.md  | 23 +++++++++++--------
 .../metrics-and-logs/telemetry-v1-vs-v2.md    |  5 ++--
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/content/en/faq/metrics-and-logs/metric-expiry.md b/content/en/faq/metrics-and-logs/metric-expiry.md
index d7ea3b866e681..4e0e87b590ee0 100644
--- a/content/en/faq/metrics-and-logs/metric-expiry.md
+++ b/content/en/faq/metrics-and-logs/metric-expiry.md
@@ -1,15 +1,18 @@
 ---
-title: How to work around missing metric expiry with in-proxy telemetry?
+title: How can I manage short-lived metrics?
 weight: 20
 ---
 
-First step is to identify the metric name and label which has high cardinality.
-Prometheus provides cardinality information at its `/status` page.
-Based on the labels which have high cardinality, there are several ways to reduce the cardinality:
+Short-lived metrics can hamper the performance of Prometheus, as they often are a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of your short-lived metrics on Prometheus, you must first identify the high cardinality metrics and labels. Prometheus provides cardinality information at its `/status` page. Additional information can be retrieved [via PromQL](https://www.robustperception.io/which-are-my-biggest-metrics).
+There are several ways to reduce the cardinality of Istio metrics:
 
-* `destination_service` could cause high cardinality since for intra mesh traffic, destination service could fallback to host header if Istio proxy does not know which service the request heads to.
-    you can follow [metric customization](https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh wide.
-    To disable host header fallback for a particular workload or namespace, you need to copy the stats `EnvoyFilter` configuration, update it to have host header fallback disabled, and apply it with a more specific selector.
-    [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
-* If the label with high cardinality is not needed, you can drop is from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
-* If the information provided by the label is desired, you can use [prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
+* Disable host header fallback.
+    The `destination_service` label is one potential source of high-cardinality.
+    The values for `destination_service` default to the host header if the Istio proxy is not able to determine the destination service from other request metadata.
+    If clients are using a variety of host headers, this could result in a large number of values for the `destination_service`.
+    For such case, follow [metric customization](https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh wide.
+    To disable host header fallback for a particular workload or namespace, you need to copy the stats `EnvoyFilter` configuration, update it to have host header fallback disabled, and apply it with a more specific selector.
+    [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
+* Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop is from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
+* Normalize label values, either through federation or classification.
+    If the information provided by the label is desired, you can use [prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
diff --git a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
index 20d6dd2b97c97..d8148651f851f 100644
--- a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
+++ b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
@@ -41,8 +41,9 @@ v2 which are listed below:
 * **No metric expiration for short-lived metrics**
     Mixer-based telemetry supported metric expiration whereby metrics which
     were not generated for a configurable amount of time were de-registered for
-    collection by Prometheus. This is useful in scenarios where short-lived jobs
-    surface telemetry only for a short amount of time, and de-registering
+    collection by Prometheus. This is useful in scenarios where short-lived metrics
+    only surface for a short amount of time, and de-registering
     the metrics prevents reporting of metrics which would no longer change in
     the future, thereby reducing network traffic and storage in Prometheus.
     This expiration mechanism is not available in in-proxy telemetry.
+    The workaround for this can be found [here](/faq/metrics-and-logs/#metric-expiry).

From 498643ddade7d56245a43c59dd9539642c60a721 Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 19:13:21 +0000
Subject: [PATCH 3/6] lint

---
 content/en/faq/metrics-and-logs/metric-expiry.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/faq/metrics-and-logs/metric-expiry.md b/content/en/faq/metrics-and-logs/metric-expiry.md
index 4e0e87b590ee0..a3a68a5b8b889 100644
--- a/content/en/faq/metrics-and-logs/metric-expiry.md
+++ b/content/en/faq/metrics-and-logs/metric-expiry.md
@@ -15,4 +15,4 @@ There are several ways to reduce the cardinality of Istio metrics:
     [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
 * Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop is from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
 * Normalize label values, either through federation or classification.
-    If the information provided by the label is desired, you can use [prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
+    If the information provided by the label is desired, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
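For reference, the two mesh-wide mitigations described in the new FAQ entry (disabling host header fallback, and dropping a high-cardinality tag with `tags_to_remove`) can both be expressed through the telemetry v2 `configOverride` values covered by the metric customization guide. The following `IstioOperator` overlay is only a sketch: the `requests_total` metric and the `destination_principal` tag are illustrative choices, not required values, and a complete setup would typically repeat the override for the `outboundSidecar` and `gateway` contexts as well:

```yaml
# Illustrative sketch, not a prescribed configuration.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              # Stop deriving destination_service from arbitrary client
              # host headers (one source of unbounded cardinality).
              disable_host_header_fallback: true
              metrics:
                # Drop a high-cardinality tag from a standard metric;
                # the tag shown here is only an example.
                - name: requests_total
                  tags_to_remove:
                    - destination_principal
```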
From e0a6a90c25f45c91c340436f1660eed69876c09c Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 19:21:12 +0000
Subject: [PATCH 4/6] add spelling change

---
 .spelling | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.spelling b/.spelling
index b9e79699157f8..18072befab784 100644
--- a/.spelling
+++ b/.spelling
@@ -556,6 +556,7 @@ prepending
 prepends
 prober
 programmatically
+PromQL
 proto
 protobuf
 protoc

From 22c860e58f8e99fcc986c8258a3bd5e497f984fb Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 19:31:54 +0000
Subject: [PATCH 5/6] fix

---
 content/en/faq/metrics-and-logs/metric-expiry.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/en/faq/metrics-and-logs/metric-expiry.md b/content/en/faq/metrics-and-logs/metric-expiry.md
index a3a68a5b8b889..04d4bf3b3354f 100644
--- a/content/en/faq/metrics-and-logs/metric-expiry.md
+++ b/content/en/faq/metrics-and-logs/metric-expiry.md
@@ -10,9 +10,9 @@ There are several ways to reduce the cardinality of Istio metrics:
     The `destination_service` label is one potential source of high-cardinality.
     The values for `destination_service` default to the host header if the Istio proxy is not able to determine the destination service from other request metadata.
     If clients are using a variety of host headers, this could result in a large number of values for the `destination_service`.
-    For such case, follow [metric customization](https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh wide.
+    In this case, follow the [metric customization](/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh wide.
     To disable host header fallback for a particular workload or namespace, you need to copy the stats `EnvoyFilter` configuration, update it to have host header fallback disabled, and apply it with a more specific selector.
     [This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
-* Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop is from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
+* Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop it from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
 * Normalize label values, either through federation or classification.
     If the information provided by the label is desired, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.

From 193bc59dead92affc6decfffa342f3e79db62209 Mon Sep 17 00:00:00 2001
From: Pengyuan Bian
Date: Wed, 10 Feb 2021 12:49:33 -0800
Subject: [PATCH 6/6] Update content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md

Co-authored-by: Douglas Reid
---
 content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
index d8148651f851f..4053df06411c6 100644
--- a/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
+++ b/content/en/faq/metrics-and-logs/telemetry-v1-vs-v2.md
@@ -41,8 +41,7 @@ v2 which are listed below:
 * **No metric expiration for short-lived metrics**
     Mixer-based telemetry supported metric expiration whereby metrics which
     were not generated for a configurable amount of time were de-registered for
-    collection by Prometheus. This is useful in scenarios where short-lived metrics
-    only surface for a short amount of time, and de-registering
+    collection by Prometheus. This is useful in scenarios, such as one-off jobs, that generate short-lived metrics. De-registering
     the metrics prevents reporting of metrics which would no longer change in
     the future, thereby reducing network traffic and storage in Prometheus.
     This expiration mechanism is not available in in-proxy telemetry.
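As a companion to the `/status` page and the PromQL link added in the FAQ entry above, the offending metrics and labels can be located directly with queries along these lines (sketches to adapt; `istio_requests_total` is the standard Istio request metric name as scraped by Prometheus):

```promql
# Ten metrics with the largest number of time series:
topk(10, count by (__name__)({__name__=~".+"}))

# Number of distinct destination_service values on a standard Istio metric:
count(count by (destination_service) (istio_requests_total))
```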