app/health: app_health_metrics_high_cardinality #2967

Merged
merged 2 commits into main from pinebit/metrics-high-cardinality
Mar 16, 2024

pinebit
Contributor

@pinebit pinebit commented Mar 15, 2024

This PR monitors high metric cardinality: when a given metric accumulates too many distinct labels, memory consumption in charon grows.
To this end, this PR introduces one new metric, app_health_metrics_high_cardinality (gauge), which reports metric name => max label count for every metric whose label count has exceeded the threshold (100 * num_of_validators).
In addition to this metric, which can be used in alerting, the PR adds a new health check named metrics_high_cardinality, which is triggered when app_health_metrics_high_cardinality has reported any offense.

Note: app_health_metrics_high_cardinality does not reset, because the purpose of this feature is to detect and signal a possible memory leak.
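
For readers who want the logic spelled out, here is a minimal sketch of the behaviour described above. It is not the code in app/health/checker.go. The type cardinalityTracker and its methods are hypothetical names, the input is assumed to be a snapshot of metric name => distinct label count, and "exceeded" is assumed to mean strictly greater than the threshold.

```go
package main

import "sort"

// labelsPerValidator mirrors the "100 * num_of_validators" threshold
// from the PR description.
const labelsPerValidator = 100

// cardinalityTracker is a hypothetical type for illustration. It remembers,
// per metric name, the highest label count seen that exceeded the threshold.
type cardinalityTracker struct {
	threshold int
	offenders map[string]int
}

func newCardinalityTracker(numValidators int) *cardinalityTracker {
	return &cardinalityTracker{
		threshold: labelsPerValidator * numValidators,
		offenders: make(map[string]int),
	}
}

// observe takes a snapshot of metric name => distinct label count.
// Offenders are only added or raised, never removed, which models
// the "does not reset" note.
func (t *cardinalityTracker) observe(counts map[string]int) {
	for name, n := range counts {
		if n > t.threshold && n > t.offenders[name] {
			t.offenders[name] = n
		}
	}
}

// failing reports whether a metrics_high_cardinality style check would fire,
// that is, whether any offense has ever been recorded.
func (t *cardinalityTracker) failing() bool {
	return len(t.offenders) > 0
}

// offenderNames returns the recorded metric names in sorted order.
func (t *cardinalityTracker) offenderNames() []string {
	names := make([]string, 0, len(t.offenders))
	for name := range t.offenders {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

With 2 validators the threshold is 200, so a metric with 200 labels passes and one with 201 is recorded. A later snapshot in which that metric drops back to a handful of labels leaves the recorded value, and therefore the failing check, unchanged.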

category: feature
ticket: #2446

@pinebit
Contributor Author

pinebit commented Mar 15, 2024

Tested with compose (simnet) cluster.

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 8.57143%, with 32 lines in your changes missing coverage. Please review.

Project coverage is 54.22%. Comparing base (f09498a) to head (03b0897).
Report is 2 commits behind head on main.

Files                    Patch %   Lines
app/health/checker.go    0.00%     27 Missing ⚠️
app/health/checks.go     50.00%    2 Missing and 1 partial ⚠️
app/app.go               0.00%     1 Missing ⚠️
app/monitoringapi.go     0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2967      +/-   ##
==========================================
- Coverage   54.24%   54.22%   -0.02%     
==========================================
  Files         195      195              
  Lines       27510    27551      +41     
==========================================
+ Hits        14922    14939      +17     
- Misses      10833    10850      +17     
- Partials     1755     1762       +7     


@gsora
Collaborator

gsora commented Mar 15, 2024

Is this metric impacted by the amount of validators in the cluster?

@pinebit
Contributor Author

pinebit commented Mar 15, 2024

Is this metric impacted by the amount of validators in the cluster?

Indeed, I forgot that the threshold should depend on the number of validators. Now fixed.

sonarcloud bot commented Mar 15, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@pinebit pinebit added the "merge when ready" label Mar 16, 2024
@obol-bulldozer obol-bulldozer bot merged commit 08e0920 into main Mar 16, 2024
14 checks passed
@obol-bulldozer obol-bulldozer bot deleted the pinebit/metrics-high-cardinality branch March 16, 2024 16:18
Labels
merge when ready Indicates bulldozer bot may merge when all checks pass
3 participants