Fix past session slashing Zombienet test #578
Comments
CC @ordian |
The test is still flaky. Can we please fix this or disable it? |
@ordian is this fixed? |
Somewhat. I'll have to ask the Zombienet team for stats on test failures to see how flaky it is nowadays. I haven't looked into the issues with the Malus/Undying collator and simply replaced the collator with a Cumulus-based one. So that part is not fixed, but it may not need to be fixed. Sometimes it does fail on the last assertion about finality stall. In the test we pause a couple of nodes (2/4), which breaks our assumption that no more than 1/3 of validators are offline at the same time. Approvals are missing in that case, leading to a finality stall. This could be fixed either by being more aggressive in approval-distribution, or by somehow disabling only dispute resolution on the two nodes instead of pausing them. It does sometimes fail on other assertions, e.g. no parachain block is produced within 300 seconds after all nodes are up, likely due to slowness in zombienet/nodes. |
@ordian let me know if you need those stats. Thx! |
I still see it failing quite regularly. So it is still too flaky, especially if you consider that we already restart failing zombienet jobs. |
There are 2 issues to be fixed:
Finality stall
After resume, Alice is lacking approval votes from honest validators because they don't distribute them: in their view the blocks are already approved, and there is no need to enable aggression. 2/4 votes are not enough to approve, so finality stalls. We should adjust the number of validators to avoid this issue.
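To make the arithmetic behind the "no more than 1/3 offline" assumption concrete, here is a minimal sketch using the usual BFT bound (n >= 3f + 1). The function names are illustrative, and `can_make_progress` is a hypothetical helper rather than anything taken from the test or the node code; the exact rule approval voting applies is not spelled out in this thread, so treat this as an illustration of the general threshold only.

```rust
/// Largest number of faulty or offline validators that a set of `n`
/// validators can tolerate under the usual BFT bound n >= 3f + 1.
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Smallest number of votes that forms a supermajority of `n` validators.
pub fn supermajority_threshold(n: usize) -> usize {
    n - byzantine_threshold(n)
}

/// Hypothetical helper: true if the validators that remain online after
/// pausing `paused` of them can still reach a supermajority.
pub fn can_make_progress(n: usize, paused: usize) -> bool {
    n.saturating_sub(paused) >= supermajority_threshold(n)
}
```

Under these assumptions, 4 validators tolerate only 1 offline node and need 3 votes, so pausing 2 leaves the remaining 2 short of the threshold. With 7 validators, pausing 2 would still leave 5, which meets the threshold of 5; that is one possible way to read "adjust the number of validators".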
Collators stop block production after chain is reverted
This is only relevant on the async backing branch, but it is to be merged soon - paritytech/polkadot#5022
We've fixed Malus/Undying in paritytech/polkadot#7618, and because of this both the Cumulus and Undying collators stop collating when they fail to build on top of the Malus garbage candidate. This persists even after the chain is reverted.