TimeoutInfo replay will produce deadlock #7833
Comments
Thanks for filing the issue @TheWinds. Would you mind attaching some logs from the process from when this happens?
I was also running into this when restarting a node/validator during consensus with multiple rounds. I can confirm that moving the [...]
Because the replay process is blocked, no helpful logs are output. You can reproduce this problem with continuous writes at scale.
Yes, I did that too.
Good job.
Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source):
v0.34.x
v0.35.x
ABCI app (name for built-in, URL for self-written if it's publicly available):
any
Environment:
centos7
What happened:
The consensus module gets stuck (deadlocks) when replaying a TimeoutInfo message.
What you expected to happen:
Have you tried the latest version: yes/no
yes
How to reproduce it (as minimally and precisely as possible):
Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):
Config (you can paste only the changes you've made):
node command runtime flags:
Please provide the output from the http://<ip>:<port>/dump_consensus_state RPC endpoint for consensus bugs:
No, this API will get stuck as well.
Anything else we need to know:
Maybe this code should be moved up so that it runs before the catch-up replay?
https://github.com/tendermint/tendermint/blob/master/internal/consensus/state.go#L433-L440
The catchupReplay method calls cs.scheduleTimeout, but timeoutRoutine has not been created yet, so the following code will block: