Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Dangerously long stack trace when syncing a new node #8950

Closed
koute opened this issue May 31, 2021 · 2 comments · Fixed by #8956
Closed

Dangerously long stack trace when syncing a new node #8950

koute opened this issue May 31, 2021 · 2 comments · Fixed by #8956
Labels
I3-bug The node fails to follow expected behavior.

Comments

@koute
Copy link
Contributor

koute commented May 31, 2021

I was doing some memory profiling over the weekend, and over the course of 60h of profiling I've got this stack trace that was 5763 frames long:

clone
start_thread
std::sys::unix::thread::Thread::new::thread_start
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
<alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
core::ops::function::FnOnce::call_once{{vtable.shim}}
std::sys_common::backtrace::__rust_begin_short_backtrace
tokio::runtime::context::enter
tokio::runtime::blocking::pool::Inner::run
tokio::runtime::task::harness::Harness<T,S>::poll
<std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
tokio::runtime::task::core::Core<T,S>::poll
futures_executor::local_pool::block_on
std::thread::local::LocalKey<T>::with
<tracing_futures::Instrumented<T> as core::future::future::Future>::poll
<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
<sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
<futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
<core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
<sc_consensus_babe::BabeVerifier<Block,Client,SelectChain,CAW,CIDP> as sp_consensus::import_queue::Verifier<Block>>::verify::{{closure}}
sp_block_builder::BlockBuilder::check_inherents
sp_block_builder::runtime_decl_for_BlockBuilder::check_inherents_call_api_at
<sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
<sc_service::client::call_executor::LocalCallExecutor<Block,B,E> as sc_client_api::call_executor::CallExecutor<Block>>::contextual_call
<kusama_runtime::RuntimeApiImpl<__SR_API_BLOCK__,RuntimeApiImplCall> as sp_api::Core<__SR_API_BLOCK__>>::Core_initialize_block_runtime_api_impl
sp_api::runtime_decl_for_Core::initialize_block_call_api_at
<sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
<sc_service::client::call_executor::LocalCallExecutor<Block,B,E> as sc_client_api::call_executor::CallExecutor<Block>>::contextual_call
sp_state_machine::execution::StateMachine<B,H,N,Exec>::execute_using_consensus_failure_handler
sp_state_machine::execution::StateMachine<B,H,N,Exec>::execute_aux
<sc_executor::native_executor::NativeExecutor<D> as sp_core::traits::CodeExecutor>::call
sc_executor::wasm_runtime::RuntimeCache::with_instance
sc_executor::native_executor::WasmExecutor::with_instance::{{closure}}
std::thread::local::LocalKey<T>::with
std::panicking::try
sc_executor_common::wasm_runtime::WasmInstance::call_export
<sc_executor_wasmi::WasmiInstance as sc_executor_common::wasm_runtime::WasmInstance>::call
wasmi::func::FuncInstance::invoke
wasmi::runner::Interpreter::start_execution
wasmi::func::FuncInstance::invoke
<sc_executor_wasmi::FunctionExecutor as wasmi::host::Externals>::invoke_index
<<sp_io::storage::HostFunctions as sp_wasm_interface::HostFunctions>::host_functions::ExtStorageNextKeyVersion1 as sp_wasm_interface::Function>::execute
sp_io::storage::next_key_version_1
tracing::span::Span::in_scope
std::thread::local::LocalKey<T>::with
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
...repeats 5685 times...
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
<sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key
sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H>::next_storage_key_from_root
<trie_db::triedb::TrieDBIterator<L> as trie_db::TrieIterator<L>>::seek
trie_db::iterator::TrieDBNodeIterator<L>::seek_prefix
trie_db::triedb::TrieDB<L>::get_raw_or_lookup
<sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H> as hash_db::HashDBRef<H,alloc::vec::Vec<u8>>>::get
<sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H> as hash_db::HashDB<H,alloc::vec::Vec<u8>>>::get
<sc_client_db::StorageDb<Block> as sp_state_machine::trie_backend_essence::Storage<<<Block as sp_runtime::traits::Block>::Header as sp_runtime::traits::Header>::Hashing>>::get
sc_state_db::StateDb<BlockHash,Key>::get
<sp_database::kvdb::DbAdapter<D> as sp_database::Database<H>>::get
<kvdb_rocksdb::Database as kvdb::KeyValueDB>::get
kvdb_rocksdb::Database::get
rocksdb::db::DB::get_pinned_cf_opt
rocksdb_get_pinned_cf
rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*) [clone .localalias]
rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&)
rocksdb::MemTableListVersion::Get(rocksdb::LookupKey const&, std::string*, std::string*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, unsigned long*, rocksdb::ReadOptions const&, rocksdb::ReadCallback*, bool*)
rocksdb::MemTableListVersion::GetFromList(std::__cxx11::list<rocksdb::MemTable*, std::allocator<rocksdb::MemTable*>>*, rocksdb::LookupKey const&, std::string*, std::string*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, unsigned long*, rocksdb::ReadOptions const&, rocksdb::ReadCallback*, bool*)
rocksdb::MemTable::Get(rocksdb::LookupKey const&, std::string*, std::string*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, unsigned long*, rocksdb::ReadOptions const&, rocksdb::ReadCallback*, bool*, bool)
rocksdb::MemTable::GetFromTable(rocksdb::LookupKey const&, unsigned long, bool, rocksdb::ReadCallback*, bool*, std::string*, std::string*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, bool*, bool*)
rocksdb::(anonymous namespace)::SkipListRep::Get(rocksdb::LookupKey const&, void*, bool (*)(void*, char const*))
rocksdb::SaveValue(void*, char const*)
std::string::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
std::string::_M_mutate(unsigned long, unsigned long, char const*, unsigned long)

The <sp_state_machine::ext::Ext<H,N,B> as sp_externalities::Externalities>::next_storage_key appears 5693 times in total in this stack trace; looks like unbound recursion.

This happened during a fresh sync of a new node on Kusama, with interpreted WASM execution, with freshly compiled polkadot from f0ce0ca1f8b03d5ad3b80ce66793f18bde21cd14.

Probably the same issue as #4527 if I'd had to guess.

@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label May 31, 2021
@xlc
Copy link
Contributor

xlc commented May 31, 2021

Maybe relates to paritytech/polkadot#3053 (comment) as well. Something isn't right.

@arkpar arkpar added I3-bug The node fails to follow expected behavior. and removed J2-unconfirmed Issue might be valid, but it’s not yet known. labels May 31, 2021
@arkpar
Copy link
Member

arkpar commented May 31, 2021

Two things to fix here:

  1. Externalities::next_storage_key should not be recursive.
  2. It should not be linear time w.r.t. the overlay size. Looks like currently if there's a lot of None items in the overlay, it will iterate them all until it finds an existing key. Instead the overlay should probably maintain two separate BtreeMaps for existing and non-existing key.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I3-bug The node fails to follow expected behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants