-
Notifications
You must be signed in to change notification settings - Fork 4.5k
v1.16: Change getHealth to compare optimistically confirmed slots (backport of #33651) #33716
Conversation
The current getHealth mechanism checks a local accounts hash slot vs. those of other nodes as specified by --known-validator. This is a very coarse comparison given that the default for this value is 100 slots. More so, any nodes using a value larger than the default (ie --incremental-snapshot-interval 500) will likely see getHealth return status behind at some point. Change the underlying mechanism of how health is computed. Instead of using the accounts hash slots published in gossip, use the latest optimistically confirmed slot from the cluster. Even when a node is behind, it is able to observe cluster optimistically confirmed by slots by viewing votes published in gossip. Thus, the latest cluster optimistically confirmed slot can be compared against the latest optimistically confirmed bank from replay to determine health. This new comparison is much more granular, and not needing to depend on individual known validators is also a plus. (cherry picked from commit 8bd0e4c) # Conflicts: # rpc/src/rpc.rs
4daf22e
to
fd2c780
Compare
fd2c780
to
9e78635
Compare
Codecov Report
@@ Coverage Diff @@
## v1.16 #33716 +/- ##
=======================================
Coverage 81.8% 81.8%
=======================================
Files 766 766
Lines 209091 209073 -18
=======================================
+ Hits 171144 171161 +17
+ Misses 37947 37912 -35 |
Per discussion elsewhere, we're considering this PR for backport based on the following criteria:
I'm going to run this commit locally for a bit as an extra sanity check; after that, I'll reply back here and request reviewers to take a look |
Here are the catchup logs from a node that was running this change with incremental snapshot interval of 1000 slots + health check distance of 25 slots. Again, I had
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this picked pretty clean?
Yep, only issue stemmed from |
This is an automatic backport of pull request #33651 done by Mergify.
Cherry-pick of 8bd0e4c has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Mergify commands and options
More conditions and actions can be found in the documentation.
You can also trigger Mergify actions by commenting on this pull request:
@Mergifyio refresh
will re-evaluate the rules@Mergifyio rebase
will rebase this PR on its base branch@Mergifyio update
will merge the base branch into this PR@Mergifyio backport <destination>
will backport this PR on<destination>
branchAdditionally, on Mergify dashboard you can:
Finally, you can contact us on https://mergify.com