Metrics Merging should ignore non 2xx service metrics #546

Closed
lawliet89 opened this issue Jun 24, 2021 · 1 comment · Fixed by #551

Comments

@lawliet89
Contributor


Overview of the Issue

When Consul Connect has metrics merging enabled, the merged output should ignore service metrics endpoints that return non-2xx responses. For example, the following service does not expose any metrics, and the merged metrics output is invalid:

(screenshot: invalid merged metrics output)
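
For illustration only (a hypothetical reconstruction, not the contents of the screenshot above): when the application exposes no metrics endpoint, appending its error response verbatim to Envoy's Prometheus output yields an exposition that scrapers cannot parse:

```
# Envoy's own Prometheus metrics are emitted first (example metric and value)...
envoy_cluster_upstream_cx_total{envoy_cluster_name="local_app"} 7
# ...followed by the service's response body, appended even though the
# service answered with a 404, which breaks the Prometheus text format:
404 page not found
```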

Reproduction Steps

Steps to reproduce this issue:

  1. Deploy a service that exposes no metrics endpoint and add it to the service mesh
  2. Enable metrics merging (for example via the pod annotations sketched below)
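
For reference, metrics merging in consul-k8s is controlled per pod through annotations (or cluster-wide through Helm values). A minimal sketch, assuming the annotation names below match the consul-k8s documentation and using an illustrative image that serves no /metrics endpoint:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # illustrative name
  annotations:
    consul.hashicorp.com/connect-inject: "true"
    # Assumed annotation for enabling metrics merging on this pod's sidecar.
    consul.hashicorp.com/enable-metrics-merging: "true"
spec:
  containers:
    - name: app
      image: example/app:latest  # illustrative; serves no /metrics endpoint
```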

Consul info for both Client and Server

Version 1.10

Operating system and Environment details


GKE 1.20

Log Fragments


N/A

@david-yu
Contributor

Hi @lawliet89, I'm going to move this over to consul-k8s since it looks like it's more appropriately handled there.

@david-yu david-yu transferred this issue from hashicorp/consul Jun 25, 2021
lkysow added a commit that referenced this issue Jul 5, 2021
* If Envoy returns an error then also respond with a 500 in our merged
metrics response so that Prometheus will know that we had an error, not
that there are no metrics.
* If the service metrics return with a non-2xx status code then don't
include the response body in the merged metrics. This will stop issues
where users accidentally turn on metrics merging but they don't have an
exporter and so their metrics endpoint returns 404. I could have
responded with a 500 in this case in order to indicate that there is an
error, however I think it's more likely that users are accidentally
turning on metrics merging and the error indication is accomplished via
a new metric (see below).
* Append a new metric that indicates the success of the service
scraping. This can be used for alerting by users since the response code
of the service metrics response is discarded:
  * success: consul_metrics_merging_service_metrics_success 1
  * fail: consul_metrics_merging_service_metrics_success 0
* modify logging to use key/value pairs
* Fixes #546
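
To make the described fix concrete, here is a minimal Go sketch of a merged-metrics handler with that behavior. It is not the consul-k8s implementation; the endpoint addresses and listen port are illustrative assumptions.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// Illustrative addresses; in consul-k8s these are configured by the injector.
const (
	envoyMetricsURL   = "http://127.0.0.1:19000/stats/prometheus"
	serviceMetricsURL = "http://127.0.0.1:8080/metrics"
)

func mergedMetricsHandler(w http.ResponseWriter, r *http.Request) {
	// If Envoy can't be scraped, answer 500 so Prometheus records a failed
	// scrape rather than an empty (but "successful") one.
	envoyResp, err := http.Get(envoyMetricsURL)
	if err != nil {
		http.Error(w, "error scraping Envoy metrics", http.StatusInternalServerError)
		return
	}
	defer envoyResp.Body.Close()
	if envoyResp.StatusCode/100 != 2 {
		http.Error(w, "non-2xx response from Envoy metrics", http.StatusInternalServerError)
		return
	}
	if _, err := io.Copy(w, envoyResp.Body); err != nil {
		log.Println("error writing Envoy metrics", "err", err)
		return
	}

	// Only append the service body on a 2xx response; a 404 from a service
	// without an exporter would otherwise corrupt the merged exposition.
	success := 0
	if svcResp, err := http.Get(serviceMetricsURL); err == nil {
		defer svcResp.Body.Close()
		if svcResp.StatusCode/100 == 2 {
			if _, err := io.Copy(w, svcResp.Body); err == nil {
				success = 1
			}
		}
	}

	// Record the outcome of the service scrape so users can alert on it,
	// since the service's status code itself is discarded.
	fmt.Fprintf(w, "consul_metrics_merging_service_metrics_success %d\n", success)
}

func main() {
	http.HandleFunc("/stats/prometheus", mergedMetricsHandler)
	log.Fatal(http.ListenAndServe(":20200", nil))
}
```

With this shape, an alert on consul_metrics_merging_service_metrics_success == 0 catches pods where merging is enabled but the application scrape is failing.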
lkysow added a commit that referenced this issue Jul 10, 2021
lkysow added a commit that referenced this issue Nov 23, 2021
lkysow added a commit that referenced this issue Dec 6, 2021
lkysow added a commit that referenced this issue Dec 6, 2021