--snapshot=false is broken in 1.10.4-stable? #23075

Closed
ryny24 opened this issue Jun 20, 2021 · 19 comments

Comments

@ryny24

ryny24 commented Jun 20, 2021

I run several geth nodes on Raspberry Pi (8GB). It is the only thing running on the Pi, but I find that system resources are MUCH better with snapshot disabled. So, I run with the option --snapshot=false. I just upgraded today to 1.10.4, and it appears this option is being ignored. I now have the "Aborting/Resuming state snapshot generation" messages in the log.

I don't see in the release notes that this was changed. Am I missing something?

Thank you.

@PcDeaDZ

PcDeaDZ commented Jun 20, 2021

Have the same issue.

@ryny24
Author

ryny24 commented Jun 20, 2021

I'm also getting these error messages flooding my log file. I shortened the excerpt because it's very long (repeating), but the errors keep recurring.

WARN [06-20|19:31:51.273] Synchronisation failed, dropping peer    peer=66d65ad07a81f86a3d66aad42cae4668a6c03f30f69f5c9ccab454506eccaa51 err="retrieved hash chain is invalid: invalid merkle root (remote: 551fd3fedb0dad82a910e1a0a34ca28a17c57b408267b90f8fe7c29c97006b58 local: 4a3d51243edc3437a9006d433d18f5788b71781154be59736ffccadee64afbea)"
WARN [06-20|19:31:51.274] Synchronisation failed, retrying         err="peer is unknown or unhealthy"
INFO [06-20|19:31:52.284] Generating state snapshot                root=b903d7..82c50b in=008f65..483efb at=23f5e7..b20e56 accounts=302,939 slots=92,044,386 storage=7.92GiB elapsed=1h36m14.924s
INFO [06-20|19:31:55.137] Skip duplicated bad block                number=12,674,202 hash=de35c0..7857bd
ERROR[06-20|19:31:55.380]
########## BAD BLOCK #########
Chain config: {ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Byzantium: 4370000 Constantinople: 7280000 Petersburg: 7280000 Istanbul: 9069000, Muir Glacier: 9200000, Berlin: 12244000, London: <nil>, Engine: ethash}

Number: 12674202
Hash: 0xde35c08cab97e831ba6ba3040d46cb35ed13336c0414ce9994aee8a7457857bd
         0: cumulative: 34989 gas: 34989 contract: 0x0000000000000000000000000000000000000000 status: 1 tx: 0xd6489eead932de881ca1b817b236e752218438097efe036c0ab500ba14e0c8cd logs: [0x40a11c6f20] bloom: 00000000000000000000000000001000000000000000000400000000000000000000000000000080000000000000000000000000000000000000000000000000000000000000000040000008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000094000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000000000000000000004000000000000000002000000000000000000000000000000000000000000000000000000000000000000000000000000000000 state:
..........
Error: invalid merkle root (remote: 551fd3fedb0dad82a910e1a0a34ca28a17c57b408267b90f8fe7c29c97006b58 local: 4a3d51243edc3437a9006d433d18f5788b71781154be59736ffccadee64afbea)
##############################

I went back to 1.10.3, resources are back to normal and the errors are gone.
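For anyone who wants to check whether their own node shows the same symptoms, here is a minimal sketch that counts the relevant messages in a log. The substrings are taken from the excerpt above; other geth versions may word them differently, and the names `PATTERNS` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Substrings copied from the log excerpt in this thread (assumption: your
# geth version uses the same wording).
PATTERNS = {
    "snapshot_generation": re.compile(r"state snapshot"),
    "bad_block": re.compile(r"BAD BLOCK"),
    "invalid_merkle_root": re.compile(r"invalid merkle root"),
}


def summarize(lines):
    """Count how many lines match each pattern of interest."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Four lines from the excerpt above, used as sample input.
sample = [
    'WARN [06-20|19:31:51.274] Synchronisation failed, retrying         err="peer is unknown or unhealthy"',
    'INFO [06-20|19:31:52.284] Generating state snapshot                root=b903d7..82c50b in=008f65..483efb at=23f5e7..b20e56 accounts=302,939 slots=92,044,386 storage=7.92GiB elapsed=1h36m14.924s',
    '########## BAD BLOCK #########',
    'Error: invalid merkle root (remote: 551fd3fedb0dad82a910e1a0a34ca28a17c57b408267b90f8fe7c29c97006b58 local: 4a3d51243edc3437a9006d433d18f5788b71781154be59736ffccadee64afbea)',
]

if __name__ == "__main__":
    print(dict(summarize(sample)))
```

To run it against a real log, pass an open file instead of `sample`, since `summarize` accepts any iterable of lines. A non-zero `snapshot_generation` count on a node started with --snapshot=false would suggest the flag is being ignored.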

@karalabe
Member

Hmm, my guess is that the switch to --syncmode=snap by default is also forcing snapshots. Still, snapshots do make a huge difference in avoiding a DoS attack, so if you can afford a bit of downtime, I'd recommend resyncing with snap sync. That will net you an almost perfect snapshot that can be fixed very quickly afterwards. If you keep your ancient folder inside chaindata, then you won't even download the blocks, only the state (around 30GB).

@karalabe
Member

The invalid merkle root is more worrying. Did your node crash during snapshot generation? Did you turn it off non-gracefully?

@karalabe
Member

Just FYI, --snapshot=false will be removed in the future completely, so it's better to try and not rely on it sooner rather than later.

@ryny24
Author

ryny24 commented Jun 21, 2021

The invalid merkle root is more worrying. Did your node crash during snapshot generation? Did you turn it off non-gracefully?

I always turn it off gracefully. systemd has a 300 second wait. I always confirm "Blockchain stopped". It runs for about an hour before those messages start flooding the logs. I'm back to 1.10.3 and it's running fine again.

@easeev

easeev commented Jun 21, 2021

Can we please not break it? We use --snapshot=false for archive nodes, which sync better this way, and for other Geth forks as well.

@yfl92

yfl92 commented Jun 22, 2021

Moving forward, how should I turn off snapshotting, given that --snapshot=false has been broken with the 1.10.4 release and will be deprecated in the future? For context, our archival nodes cannot sync unless we explicitly turn off snapshotting.

@petejkim

+1. Please preserve --snapshot=false. Snapshotting does not make much sense for archival nodes and doesn't seem to work very well on them.

@begetan

begetan commented Jun 23, 2021

A node running with --snapshot=false keeps the whole network from moving to snapshot syncing completely. If you run a new node on version 1.10.4 right now, which relies on snapshot sync, you may see a lack of peers supporting it.

@AusIV
Contributor

AusIV commented Jun 23, 2021

Running with --syncmode=fast resolves this problem for me.

I believe what happened here is that at one point, --snapshot defaulted to false and you had to explicitly pass it to set it to true. --syncmode defaulted to fast, and you had to explicitly set it to snap. At that time, the assumption was that if you set --syncmode=snap without setting --snapshot=true that you had actually intended to set --snapshot=true.

Then --snapshot started defaulting to true, but --syncmode still defaulted to fast, so setting --snapshot=false worked as expected. In 1.10.4, they changed the default --syncmode to snap, and now the assumption that --syncmode=snap means you intended --snapshot=true violates people's expectations, because the only way --snapshot can be false is if somebody explicitly set it that way.
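To make that history concrete, here is a toy model of the rule as described in this comment. It is a sketch of the explanation, not geth's actual code: the names `Defaults`, `EARLY`, `MIDDLE`, `V1_10_4`, and `effective_snapshot` are invented for illustration, and the "snap sync forces snapshots" rule is the assumption stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Defaults:
    syncmode: str
    snapshot: bool


# The three sets of defaults described above, oldest first.
EARLY = Defaults(syncmode="fast", snapshot=False)
MIDDLE = Defaults(syncmode="fast", snapshot=True)
V1_10_4 = Defaults(syncmode="snap", snapshot=True)


def effective_snapshot(defaults: Defaults,
                       syncmode: Optional[str] = None,
                       snapshot: Optional[bool] = None) -> bool:
    """Return whether snapshots end up enabled.

    None means the operator did not pass the flag, so the default applies.
    """
    mode = defaults.syncmode if syncmode is None else syncmode
    wanted = defaults.snapshot if snapshot is None else snapshot
    # Assumed rule from this thread: snap sync implies snapshots,
    # regardless of what --snapshot was set to.
    return True if mode == "snap" else wanted


if __name__ == "__main__":
    # --snapshot=false alone, before and after the default sync mode changed.
    print(effective_snapshot(MIDDLE, snapshot=False))
    print(effective_snapshot(V1_10_4, snapshot=False))
    # Explicitly choosing a non-snap sync mode makes the flag count again.
    print(effective_snapshot(V1_10_4, syncmode="fast", snapshot=False))
```

Under this model, the same command line (just --snapshot=false) flips from snapshots off to snapshots on purely because the default sync mode changed underneath it.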

@AusIV
Contributor

AusIV commented Jun 23, 2021

A node running with --snapshot=false keeps the whole network from moving to snapshot syncing completely. If you run a new node on version 1.10.4 right now, which relies on snapshot sync, you may see a lack of peers supporting it.

If having a few Geth operators running with --snapshot=false is going to prevent the network from going to snapshot syncing, other clients like Nethermind, OpenEthereum, Besu, etc. are probably going to cause even bigger problems. The network needs to have a critical mass of nodes running with --snapshot, but client diversity is more important for the network than snapshot sync, and to my knowledge none of the other clients have any serious plans to implement it. One Geth operator running with --snapshot=false is no more of a risk to the network than if they had elected to run Nethermind instead of Geth.

@ryny24
Author

ryny24 commented Jun 24, 2021

Excellent suggestion @AusIV. --syncmode=fast worked for me. I'm gonna try it on my backup node for a few days, though. Thank you!

@AusIV
Contributor

AusIV commented Jan 28, 2022

Note that --syncmode=fast will no longer work, but --syncmode=full will.

@karalabe
Member

fast was replaced with snap (i.e. --syncmode=snap).

@AusIV
Contributor

AusIV commented Jan 31, 2022

Yes, but if you want the --snapshot=false flag to work, as is the point of this thread, you must also explicitly set the --syncmode flag to something other than snap.
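As a rough illustration, a small pre-flight check along these lines could flag the problematic combinations before starting the node. This is only a sketch based on what has been reported in this thread: `parse_flags` and `snapshot_flag_warnings` are hypothetical helpers, not part of geth, and the flag parsing is deliberately simplistic.

```python
def parse_flags(argv):
    """Collect --name=value and --name value pairs into a dict."""
    flags = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--"):
            name, sep, value = arg[2:].partition("=")
            # Support the space-separated form, e.g. "--syncmode snap".
            if not sep and i + 1 < len(argv) and not argv[i + 1].startswith("--"):
                value = argv[i + 1]
                i += 1
            flags[name] = value
        i += 1
    return flags


def snapshot_flag_warnings(argv):
    """Return warnings for combinations reported as problematic in this thread."""
    flags = parse_flags(argv)
    warnings = []
    mode = flags.get("syncmode")
    if flags.get("snapshot") == "false":
        if mode is None:
            warnings.append("--syncmode is not set; since 1.10.4 it defaults "
                            "to snap, so --snapshot=false may be ignored")
        elif mode == "snap":
            warnings.append("--syncmode=snap was reported to override "
                            "--snapshot=false")
    if mode == "fast":
        warnings.append("--syncmode=fast was reported to no longer work in "
                        "later releases; --syncmode=full does")
    return warnings


if __name__ == "__main__":
    for args in (["--snapshot=false"],
                 ["--syncmode", "snap", "--snapshot=false"],
                 ["--syncmode=full", "--snapshot=false"]):
        print(args, snapshot_flag_warnings(args))
```

Only the last combination, an explicit non-snap sync mode together with --snapshot=false, comes back without warnings.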

@raulmonge

Yeah, I have the same problem. I updated my node to version 1.10.15, and with --syncmode snap --snapshot=false it still starts snapshot generation.

@MariusVanDerWijden
Member

Closing because stale.
Geth ignores --snapshot=false if the node was synced with --syncmode=snap. This behavior is grandfathered in, and --snapshot=false was thought of as an emergency flag that would/will be removed in the future.

@Austindgk232

#26965
