Sync hang for 22 minutes on mainnet after checkpoint error #1181
Comments
@hdevalence I think you were the last person to modify the sync code, do you have any insights here?
@yaahc how can we show the detailed checkpoint error message in the logs, rather than just "block could not be checkpointed"?
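One general way to surface the underlying cause in a log line is to walk the standard `Error::source()` chain and join the messages. The sketch below is only an illustration: `HashMismatch`, `CheckpointFailed`, and `error_chain` are hypothetical names, not types or functions from the zebra crates, and the inner error message is made up.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical low-level cause; stands in for whatever the checkpointer actually hit.
#[derive(Debug)]
struct HashMismatch {
    height: u32,
}

impl fmt::Display for HashMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "block hash at height {} does not match the checkpoint", self.height)
    }
}

impl Error for HashMismatch {}

// Hypothetical outer error carrying the generic message from the logs.
#[derive(Debug)]
struct CheckpointFailed {
    source: HashMismatch,
}

impl fmt::Display for CheckpointFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "block could not be checkpointed")
    }
}

impl Error for CheckpointFailed {
    // Exposing the cause here is what makes the detail reachable by callers.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

// Join an error and all of its causes into a single log-friendly string.
fn error_chain(err: &dyn Error) -> String {
    let mut message = err.to_string();
    let mut cause = err.source();
    while let Some(inner) = cause {
        message.push_str(": ");
        message.push_str(&inner.to_string());
        cause = inner.source();
    }
    message
}

fn main() {
    let err = CheckpointFailed {
        source: HashMismatch { height: 1000 },
    };
    println!("{}", error_chain(&err));
}
```

This only helps if every layer implements `source()`; an error type that drops its cause cuts the chain at that point.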
I tried adding If that's a good way to fix this issue, let's open a follow-up ticket to fix all the error types in every zebra crate?
The Windows CI sync tests also hang intermittently, when initialising the syncer:
The hangs might be due to this bug, or they might be caused by the network on the CI host.
I saw similar issues while getting #1041 working but plumbing error context through the checkpointer was more work than I wanted to commit to doing in that chunk of work.
I think we may want to make a style choice here and avoid tuple-style enum variants in our error types. If we write the errors as struct-style enum variants where we always remember to name the source error field
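A minimal sketch of what a struct-style error enum with a named `source` field could look like, using only the standard library. `SyncError`, its variants, and the messages are hypothetical and are not taken from the zebra crates. `io::Error` is just a stand-in for a real underlying cause.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical error type: every variant that wraps another error
// stores it in a field that is explicitly named `source`.
#[derive(Debug)]
enum SyncError {
    Checkpoint { height: u32, source: io::Error },
    Timeout { seconds: u64 },
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::Checkpoint { height, .. } => {
                write!(f, "block at height {} could not be checkpointed", height)
            }
            SyncError::Timeout { seconds } => {
                write!(f, "no response after {} seconds", seconds)
            }
        }
    }
}

impl Error for SyncError {
    // The named field makes it obvious which value is the underlying cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            SyncError::Checkpoint { source, .. } => Some(source),
            SyncError::Timeout { .. } => None,
        }
    }
}

fn main() {
    let cause = io::Error::new(io::ErrorKind::Other, "hypothetical database write failure");
    let err = SyncError::Checkpoint { height: 1000, source: cause };
    println!("{}", err);
    if let Some(cause) = err.source() {
        println!("caused by: {}", cause);
    }
    println!("{}", SyncError::Timeout { seconds: 5 });
}
```

Derive macros such as `thiserror` treat a field named `source` as the error's cause, so the same naming convention carries over if the hand-written impls are replaced by a derive.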
We think this issue is fixed by the RocksDB change in #1325.
Version
zebrad main branch, as at 19 October 2019.
Platform
Linux ... 5.4.68 #1-NixOS SMP Sat Sep 26 16:03:16 UTC 2020 x86_64 GNU/Linux
Description
zebrad start on mainnet did not output any logs at info level for 22 minutes. It restarted sync after a checkpoint error, started obtain_tips, then hung for 22 minutes. Then it continued to log similar checkpoint errors, without making progress.
After restarting Zebra, the sync continued to make progress.
I tried this:
Running zebrad start on mainnet, with some global and some local peers.
My network is slower and has higher latency than the average Zebra developer's.
I expected to see this happen:
zebrad start syncs up to the tip of the best chain.
Checkpoint errors contain a detailed explanation of the error, rather than a generic error message.
Instead, this happened:
Error Before Hang
Metadata
SpanTrace and Logs
There was no info-level output for 22 minutes after the hang started. Zebra then logged a similar error, but that time it was only delayed for about 2 minutes.