fix(prometheus): flaky metrics server #608

dnut · 2025-03-07T01:23:18Z

Problems

Prometheus does not consistently receive metrics from our metrics endpoint; the metrics are only available intermittently.

The server also crashes often due to errors. In #590 I added a wrapper that catches these errors and restarts the server. However, sometimes an error is handled and a restart is attempted, but the server never responds to HTTP requests again. The metrics then stay completely inaccessible until sig itself is restarted. This is usually triggered by error.BrokenPipe.

Solution

Revert "fix(prometheus): remove httpz again and fix prometheus metrics (#555)"

This reverts commit 3502333.

I understand the desire to eliminate unnecessary dependencies, but in the grand scheme of things, removing httpz is not that important. A working metrics endpoint, on the other hand, is critical, and we have a solution that works with httpz. We should test any replacement more thoroughly before switching to it.


0xNineteen · 2025-03-07T15:03:10Z

if our zig server implementation is flaky rn, we should keep this in mind when we're using it for rpc (and consider whether we should take a different approach there too)

codecov · 2025-03-07T15:22:36Z

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/prometheus/http.zig	0.00%	8 Missing ⚠️

Files with missing lines	Coverage Δ
src/prometheus/http.zig	`0.00% <0.00%> (ø)`


Revert "fix(prometheus): remove httpz again and fix prometheus metrics (#555)" (2f0d0a5)

This reverts commit 3502333.

dnut requested a review from Rexicon226 March 7, 2025 01:23

dnut self-assigned this Mar 7, 2025

0xNineteen approved these changes Mar 7, 2025


dnut added this pull request to the merge queue Mar 8, 2025

Merged via the queue into main with commit f8c02b1 Mar 8, 2025
16 of 17 checks passed

dnut deleted the dnut/fix/metrics/flaky branch March 8, 2025 16:10
