Fresh snap sync did not start worldstate heal #5549
Comments
This could be related to #5191
Had a similar issue doing a fresh Sepolia sync. I turned on DEBUG after it halted and this error appears a lot. @pinges or @matkt, do you think this could be related to the sync halting at 99% and/or the heal stage not starting, or is it just noise? Full log: besu-sync-halt.log
@matkt, a correction to the above comment: I found an ERROR in the logs, so I think the DEBUG error I posted above is a red herring.
@pinges had the same error on a mainnet node too, which resulted in a 99% sync halt.
Also, my issue may be related to the recent flat db commit (since it's calling getFlatDbReaderStrategy even without the feature enabled), which could be different from Yorick's original issue, which was using 23.4.1
@yorickdowne on the off chance you still have the complete logs, can you share them please? I'm looking for an error that may have occurred sometime between the worldstate download completing and the block import completing
Happened again on another mainnet fresh sync...
In both my cases, restarting fixed it
I do not, sorry
The nullpointer issue should now be fixed on the main branch
Awesome. I'll leave this issue open since there must be some other error causing the halt, likely still the snap sync thread dying due to an unhandled exception. Other than waiting for it to happen again and being able to collect DEBUG logs, we could make progress by hunting for unhandled exceptions in the code.
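As a general illustration of why such exceptions can vanish silently, here is a minimal, hypothetical Java sketch (not Besu's actual code) showing how an uncaught exception kills a worker thread without surfacing anywhere unless a default handler is installed:

```java
// Hypothetical sketch: install a JVM-wide default handler so an exception
// that kills a worker thread (e.g. a sync task) is logged instead of
// disappearing silently. Names like "sync-worker" are illustrative only.
public class UncaughtLogger {
    public static void main(String[] args) throws Exception {
        Thread.setDefaultUncaughtExceptionHandler((t, e) ->
            System.err.println("Unhandled exception in " + t.getName() + ": " + e));

        // Simulate a worker dying from an unhandled exception.
        Thread worker = new Thread(() -> {
            throw new RuntimeException("boom");
        }, "sync-worker");
        worker.start();
        worker.join();

        // The main thread is unaffected; without the handler above,
        // the worker's death would leave no trace in the output.
        System.out.println("main thread still alive");
    }
}
```

Without such a handler (or explicit try/catch logging inside each pipeline stage), a dead sync thread simply stops making progress, which matches the "stuck at 99% with no error" symptom described here.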
OS: Ubuntu 22.04.3 LTS. I'm dealing with the same logs as the original post, so I turned on DEBUG level logs:

teku-besu_node-1 | 2023-08-13T11:13:34.893753385Z org.hyperledger.besu.ethereum.p2p.rlpx.handshake.HandshakeException: Unable to create ECDH Key agreement due to Crypto engine failure
teku-besu_node-1 | 2023-08-14T15:04:23.002041466Z com.google.common.cache.CacheLoader$InvalidCacheLoadException: CacheLoader returned null for key [Connection with hashCode -1647236115 with peer 0x210643148ea56b6465bd32995d389aaeb6ca1362456c6abfa4bbc307eaa2d9107f34c1309911d6877d5ad77f618487d94c131198eaf4121eee009f78c2b21751 inboundInitiated true initAt 1692025462893].
Any updates on this? @yorickdowne @siladu
Hi, thanks for sharing |
Now getting this error instead:

2023-09-29T16:59:12.295244858Z 2023-09-29 16:59:12.295+00:00 | EthScheduler-Services-51 (importBlock) | INFO | FastImportBlocksStep | Block import progress: 18233689 of 18235363 (99%)
Could you check if you have another, older exception during sync, like a Busy Exception or something else? A log line starting with "Unexpected exception in pipeline." This exception can occur during sync, so it is important to look at all the logs from the start of the sync. The last bugs we found that cause a 99% stall are all linked to a RocksDB Busy Exception which arrives during the sync. We have an idea of how to solve it, but I would like to be sure that this is also what you have
I've enabled DEBUG mode and am now receiving the following output (hope this helps):

teku-besu_node-1 | 2023-09-30T21:50:18.261270853Z 2023-09-30 21:50:18.261+00:00 | vert.x-worker-thread-5 | DEBUG | JsonRpcExecutor | JSON-RPC request -> eth_getBlockByNumber ["0x2bed6",false]
Do you still have the old logs? The exception would have appeared a long time before the stall. If you have the old logs, doing a grep should turn up a few things
2023-10-02T11:46:48.076024438Z 2023-10-02 11:46:48.067+00:00 | EthScheduler-Services-8 (batchPersistStorageData) | INFO | Pipeline | Unexpected exception in pipeline. Aborting. |
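Following @matkt's suggestion above, a quick way to check the full log history is to grep for the known exception signatures. A minimal sketch, assuming the node's combined logs have been saved to a file named `besu.log` (a placeholder; substitute your own log path):

```shell
# Placeholder path; point this at the full log captured from the start of sync.
LOG=besu.log
# Search for the signatures known to precede a 99% stall:
# the pipeline abort message and the RocksDB Busy exception.
grep -E "Unexpected exception in pipeline|Busy" "$LOG" \
  || echo "no matching exceptions found"
```

If the command prints the "Unexpected exception in pipeline. Aborting." line (as in the comment above), the stall is likely the known RocksDB Busy issue rather than a new bug.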
Description
A fresh sync of mainnet Ethereum on Besu using snap sync got through FastImportBlocksStep, but did not start worldstate heal. It was "stuck" here.
Steps to Reproduce (Bug)
Logs (if a bug)
Node restarted at this point
Versions (Add all that apply)
besu/v23.4.1/linux-x86_64/openjdk-java-17
5.10.0-23-amd64
24.0.2
teku/v23.5.0/linux-x86_64/-eclipseadoptium-openjdk64bitservervm-java-17