app/health: app_health_metrics_high_cardinality #2967
Tested with compose (simnet) cluster.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2967      +/-   ##
==========================================
- Coverage   54.24%   54.22%   -0.02%
==========================================
  Files         195      195
  Lines       27510    27551      +41
==========================================
+ Hits        14922    14939      +17
- Misses      10833    10850      +17
- Partials     1755     1762       +7
Is this metric impacted by the number of validators in the cluster?
Indeed, I forgot that the threshold should depend on the number of validators. Now fixed.
Quality Gate passed
This PR monitors high metric cardinality: a given metric accumulating too many distinct label sets, which results in high memory consumption in charon.
To this end, this PR introduces one new metric:
app_health_metrics_high_cardinality
(gauge), which reports metric name => max label count for each metric that has exceeded the threshold (100 * num_of_validators).
In addition to the above metric, which can be used in alerting, this PR also adds a new "health check" named
metrics_high_cardinality
which is triggered when app_health_metrics_high_cardinality
has reported any offense.
Note:
app_health_metrics_high_cardinality
will not "reset", because the purpose of this feature is to detect and signal a potential memory leak.
category: feature
ticket: #2446
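
The threshold rule and the "no reset" behaviour described in this PR can be illustrated with a minimal, standard-library-only sketch. This is not the code from the PR: the names `highCardinality`, `tracker`, `observe`, and `healthy`, the input map of per-metric label-set counts, and the example metric names are all hypothetical. Only the factor of 100 per validator and the "never reset" rule come from the description.

```go
package main

import (
	"fmt"
	"sort"
)

// labelsPerValidator mirrors the factor of 100 stated in the PR description.
const labelsPerValidator = 100

// highCardinality returns the metrics whose label-set count exceeds
// labelsPerValidator * numValidators. The input map (metric name => number of
// distinct label sets) is an assumption made for this sketch.
func highCardinality(labelCounts map[string]int, numValidators int) map[string]int {
	threshold := labelsPerValidator * numValidators
	offenders := make(map[string]int)
	for name, count := range labelCounts {
		if count > threshold {
			offenders[name] = count
		}
	}
	return offenders
}

// tracker remembers the highest offending count seen per metric.
// It never clears an entry, modelling the "will not reset" note.
type tracker struct {
	max map[string]int
}

func newTracker() *tracker {
	return &tracker{max: make(map[string]int)}
}

// observe merges a new set of offenders, keeping the maximum per metric.
func (t *tracker) observe(offenders map[string]int) {
	for name, count := range offenders {
		if count > t.max[name] {
			t.max[name] = count
		}
	}
}

// healthy is false once any offense has ever been recorded.
func (t *tracker) healthy() bool {
	return len(t.max) == 0
}

func main() {
	tr := newTracker()

	// Hypothetical scrape with 2 validators, so the threshold is 200.
	scrape := map[string]int{"example_metric_a": 250, "example_metric_b": 12}
	tr.observe(highCardinality(scrape, 2))

	names := make([]string, 0, len(tr.max))
	for name := range tr.max {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s => %d\n", name, tr.max[name])
	}
	fmt.Println("healthy:", tr.healthy())
}
```

Because the tracker only ever raises its stored values, a later scrape in which the offending metric drops back under the threshold leaves the health check failing, which matches the stated goal of flagging a possible memory leak rather than the current state.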