-
Notifications
You must be signed in to change notification settings - Fork 766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
beefy: Tolerate pruned state on runtime API call #5197
Changes from 4 commits
de0e209
6998037
f06cce4
02aa612
568ba1b
98f8a98
46d87af
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Schema: Polkadot SDK PRDoc Schema (prdoc) v1.0.0 | ||
# See doc at https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/prdoc/schema_user.json | ||
|
||
title: Prevent `ConsensusReset` by tolerating runtime API errors in BEEFY | ||
|
||
doc: | ||
- audience: Node Operator | ||
description: | | ||
After warp sync, the BEEFY worker was trying to execute runtime calls on | ||
blocks which had their state already pruned. This led to an error and restarting | ||
of the beefy subsystem in a loop. After this PR, the worker tolerates call errors and therefore prevents this | ||
worker restart loop. | ||
|
||
crates: | ||
- name: sc-consensus-beefy | ||
bump: minor |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,7 @@ use crate::{ | |
metrics::register_metrics, | ||
}; | ||
use futures::{stream::Fuse, FutureExt, StreamExt}; | ||
use log::{debug, error, info, warn}; | ||
use log::{debug, error, info, log_enabled, trace, warn, Level}; | ||
use parking_lot::Mutex; | ||
use prometheus_endpoint::Registry; | ||
use sc_client_api::{Backend, BlockBackend, BlockchainEvents, FinalityNotification, Finalizer}; | ||
|
@@ -451,7 +451,8 @@ where | |
state.set_best_grandpa(best_grandpa.clone()); | ||
// Overwrite persisted data with newly provided `min_block_delta`. | ||
state.set_min_block_delta(min_block_delta); | ||
debug!(target: LOG_TARGET, "🥩 Loading BEEFY voter state from db: {:?}.", state); | ||
debug!(target: LOG_TARGET, "🥩 Loading BEEFY voter state from db."); | ||
trace!(target: LOG_TARGET, "🥩 Loaded state: {:?}.", state); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is unrelated, but the state is quite big and causes log files to be multiple gigabytes. |
||
|
||
// Make sure that all the headers that we need have been synced. | ||
let mut new_sessions = vec![]; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,7 +38,7 @@ use futures::{stream::Fuse, FutureExt, StreamExt}; | |
use log::{debug, error, info, trace, warn}; | ||
use sc_client_api::{Backend, HeaderBackend}; | ||
use sc_utils::notification::NotificationReceiver; | ||
use sp_api::ProvideRuntimeApi; | ||
use sp_api::{ApiError, ProvideRuntimeApi}; | ||
use sp_arithmetic::traits::{AtLeast32Bit, Saturating}; | ||
use sp_consensus::SyncOracle; | ||
use sp_consensus_beefy::{ | ||
|
@@ -458,13 +458,16 @@ where | |
notification.tree_route, | ||
); | ||
|
||
self.runtime | ||
.runtime_api() | ||
.beefy_genesis(notification.hash) | ||
.ok() | ||
.flatten() | ||
.filter(|genesis| *genesis == self.persisted_state.pallet_genesis) | ||
.ok_or(Error::ConsensusReset)?; | ||
match self.runtime.runtime_api().beefy_genesis(notification.hash) { | ||
Ok(Some(genesis)) if genesis != self.persisted_state.pallet_genesis => | ||
return Err(Error::ConsensusReset), | ||
Ok(_) => {}, | ||
Err(api_error) => { | ||
// This can happen in case the block was already pruned. | ||
// Mostly after warp sync when finality notifications are piled up. | ||
debug!(target: LOG_TARGET, "🥩 Unable to check beefy genesis: {}", api_error); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if this is safe since this way we might miss consensus resets. And in this case, further worker processing might be incorrect. Thinking. Longer term we plan to add a header log to handle this kind of situations. Something like this: paritytech/substrate#14765 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another option would be to store the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This could work 👍 . Regarding missing the ConsensusReset. So eventually we will catch up to the latest blocks and will find that the beefy_genesis is indeed different to what we have stored. In that case we would trigger the consensus reset, just later as we reach the tip. Is that acceptable? What are the risks here? If not acceptable we could add beefy genesis in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm looked a bit on the code. I was thinking there might be problems with the authority set, or the payload, but it looks like this can't happen. I think it's ok. But I would also wait for @acatangiu 's input on this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The change itself is correct in what it does, but we might still have undiscovered corner cases that this will surface. I suggest to test it on Rococo where we have actually reset consensus and see how it behaves. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah okay I was not aware that this was done on rococo, will perform some tests there before merging |
||
}, | ||
} | ||
|
||
let mut new_session_added = false; | ||
if *header.number() > self.best_grandpa_block() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
are
log_enabled
andLevel
used?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will remove