[BUG] - Invalid snapshot DiskSnapshot, Replaying ledger from genesis #5908
I suspect this issue is expected. Oftentimes when upgrading the node, the disk format of snapshots changes, thus requiring a replay from genesis. I will let @IntersectMBO/cardano-ledger confirm whether this is expected, though.
I got this too. If it is intended, it would be nice if it could be handled better, with a more suitable message so it's clearer whether this is a bug - for example by storing a version number and reporting that the version number has changed and the ledger must be replayed. Relying on it failing because of a size difference also feels odd: what if the format changes but the size does not? Would it blindly read potentially incorrect data? 🤔
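To illustrate the version-number idea above: this is not how cardano-node actually stores snapshots, just a minimal Python sketch of a hypothetical on-disk format with a magic tag and an explicit version, so a mismatch produces a descriptive error instead of a generic decode failure. All names (`MAGIC`, `SNAPSHOT_VERSION`) are invented for the example.

```python
import struct

MAGIC = b"SNAP"          # hypothetical magic bytes identifying a snapshot file
SNAPSHOT_VERSION = 2     # hypothetical current on-disk format version

def write_snapshot(payload: bytes) -> bytes:
    """Prefix the serialized ledger state with magic bytes and a version."""
    return MAGIC + struct.pack(">I", SNAPSHOT_VERSION) + payload

def read_snapshot(blob: bytes) -> bytes:
    """Reject snapshots whose header does not match, with a clear message."""
    if blob[:4] != MAGIC:
        raise ValueError("not a snapshot file (bad magic bytes)")
    (version,) = struct.unpack(">I", blob[4:8])
    if version != SNAPSHOT_VERSION:
        raise ValueError(
            f"snapshot version {version} != expected {SNAPSHOT_VERSION}; "
            "the on-disk format changed, a replay from genesis is required"
        )
    return blob[8:]
```

With a header like this, an upgraded node could tell the operator "format changed, replay expected" apart from "snapshot corrupted", which is exactly the distinction asked for above.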
In some sense the ship has already sailed, so perhaps the point is moot, but it would be nice to see this type of functionality extended in a more backward-compatible way. For example, the node could fall back to the old format if it encounters an error during reading, but write the new format going forward. The backward compatibility would only need to be kept around for the next two minor versions. I might be speaking nonsense since I have little knowledge of the underlying implementation logic, in which case I apologize.
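The fall-back strategy suggested above can be sketched generically. Again, this is a hypothetical Python illustration, not the node's real API: the caller supplies a current-format decoder, a legacy decoder, and a migration function that converts legacy state into the new in-memory representation, so the node can keep operating on new structures either way.

```python
def load_with_fallback(blob, decode_current, decode_legacy, migrate):
    """Try the current decoder first; on failure, fall back to the
    legacy decoder and migrate the result to the new representation.
    All three callables are hypothetical parameters for illustration."""
    try:
        return decode_current(blob)
    except ValueError:
        # Old-format snapshot: decode with the previous version's logic,
        # then convert. The next snapshot written uses the new format only.
        return migrate(decode_legacy(blob))
```

The trade-off, as noted later in the thread, is that this requires shipping two sets of decoding code for a couple of release cycles.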
It has for the current version, but surely the next time the data format changes, a version number could be added, and then subsequent versions could check it?
I know nothing about how this works, but my guess is that reading the old format would involve keeping two sets of code/data structures that they currently do not keep, and they'd need to convert them as they're read at startup into the new format anyway (so the node can operate with the new structures). I also know nothing about this though 😄 I'm not against rebuilding the data now and then; it would just be nice if it were clear from the logs whether this is expected or whether my store got corrupted (in which case, I might have a problem I want to investigate!).
Two things:
The new version of the node doesn't know the old format. That serialization is in a previous version of the code. It can only tell that the serialization is broken, not why it's broken. I do agree that a better error message based on a version number would make this nicer, though.
Thanks, I was worried that would be the answer. I'm running with microk8s and I shut the nodes down by scaling to 0 instances. My understanding was that this would gracefully shut down the container (it's using the official Docker container). What's odd, though, is that of all the times I recall seeing it do this, it's always been when I upgrade to a new version of the node; I don't recall ever seeing it when I do the same to make k8s config changes or reboot the host for OS updates 🤔 Is it possible the error could include more information (for example, what file it was reading and what data it read) to help understand what might be wrong when it occurs?
I strongly disagree with this suggestion and statement. I use the same rollout process for all node changes, and that process has worked correctly for other updates in the weeks prior to and since this bug report (the upgrade from 9.0 to 9.1 was seamless in preview and preprod). Also, the error message for a dirty shutdown is completely different from this snapshot error message. The only variable that reliably triggers this behaviour is going from 8.x to 9.x.
Yes, I was suggesting keeping both implementations in the same version for a couple of transition releases as only one possible strategy (out of many) for backward compatibility. Inspired slightly by the combinator approach used by cardano-node itself for its own protocol migration.
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 120 days.
Closing, as the point is more or less moot for this specific version change and of negligible value. I do hope that the next time the schema changes, someone makes it backward compatible for at least one version cycle, please.
Internal/External
External
Area
Governance Related to CIP1694 implementation
Other
Summary
I am trying to test 9.0.0 on the preview net. A clean deployment of 9.0.0 from version 8.12.2 produces the below errors and causes the ledger to be replayed from genesis.
Steps to reproduce
Steps to reproduce the behavior:
I also tested this on the preprod net with the same results. Before trying on the preprod net, I verified that 8.12.2 was able to restart without any issues at all. The issue seems specific to the upgrade.
Expected behavior
I expect the node to detect the clean shutdown and resume from where the ledger left off, as it does when I restart the 8.12.2 version.
System info (please complete the following information):
Additional context
I did change a config value to disable PeerSharing, but I would not expect that to cause this.
This is my first bug report; apologies if I missed something.