-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify persisting tower local cluster tests #875
Conversation
f7d52b3
to
41ae0a5
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #875 +/- ##
=========================================
- Coverage 81.8% 81.8% -0.1%
=========================================
Files 851 851
Lines 231471 231471
=========================================
- Hits 189534 189507 -27
- Misses 41937 41964 +27 |
local-cluster/tests/local_cluster.rs
Outdated
} | ||
|
||
// A bit convoluted test case; but this roughly follows this test theoretical scenario: | ||
// Validator A, B, C have 31, 36, 33 % of stake respectively. Leader schedule is split, first half | ||
// of the test B is always leader, second half C is. Additionally we have a non voting validator D with 0 | ||
// stake to propagate gossip info. | ||
// of the test B is always leader, second half C is. | ||
// | ||
// Step 1: Kill C, only A, B and D should be running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a D now. By the way, why did we need D before but not anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while, but I think this comment is a clue:
agave/local-cluster/tests/local_cluster.rs
Line 3166 in 41ae0a5
// also important so `C` doesn't run into NoPropagatedConfirmation errors on making its first forked |
I think it comes from a time before we added this change:
agave/local-cluster/tests/local_cluster.rs
Lines 3196 to 3199 in 41ae0a5
// C should not produce any blocks at this time | |
validator_configs[2].fixed_leader_schedule = Some(FixedSchedule { | |
leader_schedule: Arc::new(c_leader_schedule), | |
}); |
C
from making blocks before the base slot forking point.
and when we relied on waiting for validators A
and C
to vote on and OC some slot > next_slot_on_a, where that slot could have been produced by validator C
In this scenario, in order for C
to reliably restart and make another fork, I think it would have needed to pass a propagated check, which implies seeing the vote from A in gossip, which is no longer necessary
410fe6b
to
12a115a
Compare
12a115a
to
2b771c9
Compare
local-cluster/tests/local_cluster.rs
Outdated
// stake to propagate gossip info. | ||
// of the test B is always leader, second half C is. | ||
// We don't give validator A any slots because it's going to be deleting its ledger, | ||
// so it may create versions of slots it's already created on a different fork |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean: it may create different versions of blocks for some blocks already created on a different fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, clarified!
local-cluster/tests/local_cluster.rs
Outdated
// | ||
// Step 1: Kill C, only A, B and D should be running | ||
// Step 1: Kill C, only A, B should be running | ||
// | ||
// S0 -> S1 -> S2 -> S3 (A & B vote, optimistically confirmed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, we are now checking:
last_vote >= 1
is it possible we only have S2 and S3 (no S0 and S1)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the only important slots are S2 and S3 because S2 is the last common ancestor slot slot and S3 is the fork that gets deleted, but still exists/doesn't exist in tower depending on which test we run.
You're right this S0, S1, S2, S3 stuff is quite confusing, I deleted all of this and replaced it with just the variable names base_slot, next_slot_on_a
for clarity
Problem
Optimistic confirmation tests are needlessly convoluted.
The optimistic confirmation is not necessary for the core of the test which is to guarantee tower prevents bad voting behavior across restarts.
Summary of Changes
D
Ran each test 100 times, no failures
Fixes #