fix(prometheus): flaky metrics server #608

Merged
merged 1 commit into from
Mar 8, 2025

Conversation

dnut
Contributor

@dnut dnut commented Mar 7, 2025

Problems

Prometheus does not consistently receive metrics from our metrics endpoint, so the scraped metrics are intermittent.

Also, the server often crashes due to errors. In #590 I added a wrapper to catch these errors and restart the server. However, sometimes an error is caught and a restart is attempted, but the server never responds to any HTTP requests again, so the metrics stay completely inaccessible until sig itself is restarted. This is usually caused by error.BrokenPipe.

Solution

Revert "fix(prometheus): remove httpz again and fix prometheus metrics (#555)"

This reverts commit 3502333.

I understand the desire to eliminate unnecessary dependencies. In the grand scheme of things, eliminating httpz is not that important. Having a working metrics endpoint is critical though. We have a solution that works with httpz. We should test more thoroughly before replacing it.

@dnut dnut requested a review from Rexicon226 March 7, 2025 01:23
@dnut dnut self-assigned this Mar 7, 2025
@0xNineteen
Contributor

if our zig server implementation is flaky rn, we should keep this in mind when we're using it for rpc (and consider whether we should take a different approach there too)

codecov bot commented Mar 7, 2025

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/prometheus/http.zig | 0.00% | 8 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/prometheus/http.zig | 0.00% <0.00%> (ø) |

@dnut dnut added this pull request to the merge queue Mar 8, 2025
Merged via the queue into main with commit f8c02b1 Mar 8, 2025
16 of 17 checks passed
@dnut dnut deleted the dnut/fix/metrics/flaky branch March 8, 2025 16:10