'Snapshot bank failed to verify', ledger/src/snapshot_utils.rs:462:9 #8130
Comments
|
Newer snapshots from the bootstrap leader produce a similar bankhash failure:
|
I picked up a new snapshot at slot 221774 from the BSV. BSV log output (bank hash is 80aac54190...)
but when I boot my validator, I get bank hash
|
While the BSV is still serving bad snapshots (accounts_db::verify_bank_hash() fails), the node at 62.171.132.17 is serving good snapshots. Unfortunately the BSV is pruning its ledger so we may not have the full history to go back and figure out when the bad hash was introduced. However I've started up another node at Next up is to start digging into the differences between the BSV and the new node |
@mvines Sorry for the disturbance.. I'll look into this. |
Yes please! The best plan I have right now is to compare the snapshots produced by 35.230.25.59:8001 (bad) with those produced by 51.143.93.203:8001 (good). Both nodes are at the tip of the cluster and generally voting in the same way, so I suspect there's something corrupt in the accounts files of 35.230.25.59. If you need more access/logging from these machines, let me know |
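A coarse first pass at that comparison could be a file-by-file diff of the two snapshots once they are unpacked. The sketch below is only an illustration, not anything from the Solana tooling: it assumes both snapshot archives have already been extracted into two local directories, and the function names `digest_tree` and `compare_trees` are made up for this example. Storage file names and layout may legitimately differ between nodes, so a mismatch here only narrows down where to look.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file path (relative to root) to (size in bytes, SHA-256 hex digest)."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large account files are not loaded whole.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        result[path.relative_to(root).as_posix()] = (path.stat().st_size, h.hexdigest())
    return result


def compare_trees(good_root, bad_root):
    """Report files present on only one side and files whose size or hash differ."""
    good = digest_tree(good_root)
    bad = digest_tree(bad_root)
    return {
        "only_in_good": sorted(good.keys() - bad.keys()),
        "only_in_bad": sorted(bad.keys() - good.keys()),
        "differing": sorted(k for k in good.keys() & bad.keys() if good[k] != bad[k]),
    }
```

For example, `compare_trees("good-snapshot", "bad-snapshot")` (both directory names are placeholders) returns three sorted lists of relative paths, which can then be inspected by hand.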
@ryoqun one idea is maybe backing out the latest snapshot changes. How confident are you that that might have affected something? |
Ok, at https://drive.google.com/drive/u/1/folders/1uD-ll87-5pmKvdSfQDguQUreR18K94_1 I've put two snapshots, both of slot 281297. One from the "bad" node, one from the "good" node. The snapshots themselves are very different in size but I've done no further analysis on it beyond observing that |
Status report: I have found an oddity. @mvines Thanks for preparing good and bad snapshots for the same slot. That greatly accelerated the investigation. It seems that the accounts db is too eagerly purging AppendVec in some corner case. I'll try to fix it quickly first. But if that turns out to be too hard for me, I'll share all the findings.
@sakridge Yeah, I suspected my recent #7892. But it seems that it's not the culprit. Sadly, we've found yet another long-standing unnoticed bug. |
It looks like the bootstrap leader on TdS is serving up a bad snapshot:
Steps to reproduce on the 0.23.2 release:
solana-validator --ledger bad-snapshot/ -o -