Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NGINX reload counters #1049

Merged
merged 3 commits into from
Sep 19, 2023
Merged

Conversation

ciarams87
Copy link
Contributor

@ciarams87 ciarams87 commented Sep 12, 2023

Proposed changes

Problem: As an operator of an environment running NKG
I want to track the total number of NGINX reloads and failures for NGINX processes across my environment
So that I can correlate availability issues with excessive NGINX reloads or failures
And so that I can let the NKG know when reloads become a problem.

Solution: Total NGINX reloads and failed reloads are counted and reported via a Prometheus endpoint as a counter.
Also included reload duration histogram and a stale config gauge.

Testing: Manual testing with metrics enabled and disabled. Confirmed that reloads reported = HUP signals observed in the NGINX logs = config version reported in the version endpoint. Example output:

<...>
# HELP nginx_kubernetes_gateway_nginx_reload_errors_total Number of unsuccessful NGINX reloads
# TYPE nginx_kubernetes_gateway_nginx_reload_errors_total counter
nginx_kubernetes_gateway_nginx_reload_errors_total{class="nginx"} 0
# HELP nginx_kubernetes_gateway_nginx_reloads_milliseconds Duration in milliseconds of NGINX reloads
# TYPE nginx_kubernetes_gateway_nginx_reloads_milliseconds histogram
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="500"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="1000"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="5000"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="10000"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="30000"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_bucket{class="nginx",le="+Inf"} 2
nginx_kubernetes_gateway_nginx_reloads_milliseconds_sum{class="nginx"} 231
nginx_kubernetes_gateway_nginx_reloads_milliseconds_count{class="nginx"} 2
# HELP nginx_kubernetes_gateway_nginx_reloads_total Number of successful NGINX reloads
# TYPE nginx_kubernetes_gateway_nginx_reloads_total counter
nginx_kubernetes_gateway_nginx_reloads_total{class="nginx"} 2
# HELP nginx_kubernetes_gateway_nginx_stale_config Indicates if NGINX is not serving the latest configuration.
# TYPE nginx_kubernetes_gateway_nginx_stale_config gauge
nginx_kubernetes_gateway_nginx_stale_config{class="nginx"} 0
<...>

Closes #887

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request labels Sep 12, 2023
Copy link
Contributor

@pleshakov pleshakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to report a reload error on every stage of where a reload could fail (e.g. - no main process found, can't get child process file, HUP signal fails, no new processes, no new config version), or something different?

I think that would be too much granularity.

docs/monitoring.md Outdated Show resolved Hide resolved
docs/monitoring.md Outdated Show resolved Hide resolved
internal/mode/static/metrics/collector.go Outdated Show resolved Hide resolved
@ciarams87 ciarams87 marked this pull request as ready for review September 15, 2023 09:04
@ciarams87 ciarams87 requested a review from a team as a code owner September 15, 2023 09:04
internal/mode/static/manager.go Outdated Show resolved Hide resolved
internal/mode/static/manager.go Outdated Show resolved Hide resolved
internal/mode/static/metrics/collector.go Outdated Show resolved Hide resolved
internal/mode/static/nginx/runtime/manager.go Outdated Show resolved Hide resolved
internal/mode/static/nginx/runtime/manager.go Outdated Show resolved Hide resolved
internal/mode/static/nginx/runtime/manager.go Outdated Show resolved Hide resolved
internal/mode/static/nginx/runtime/manager.go Show resolved Hide resolved
internal/mode/static/metrics/collector.go Outdated Show resolved Hide resolved
Copy link
Contributor

@pleshakov pleshakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ciarams87 ciarams87 merged commit f55c94e into nginx:main Sep 19, 2023
@ciarams87 ciarams87 deleted the feat/reload-metrics branch September 19, 2023 21:16
miledxz added a commit to miledxz/nginx-gateway-fabric that referenced this pull request Jan 14, 2025
* Add NGINX reload counters
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics: Total and failed NGINX reloads
5 participants