sc-consensus-slots tries to claim slot before fully synced #11510
Comments
Hmm, so we should actually skip it:
Hm, but it does seem to skip already, doesn't it? Ah, I see what you mean. Yeah, doesn't make sense to me either 🤔
According to @liuchengxu we found
I think the tricky part is this (substrate/client/network/src/service.rs, lines 2104 to 2108 in 58fd5b7):
While the sync state is determined like this (substrate/client/network/sync/src/lib.rs, lines 583 to 590 in c2fc4b3):
And it turns out the maximum number of blocks queued for importing is around the number I saw (substrate/client/network/sync/src/lib.rs, lines 85 to 86 in c2fc4b3):
So
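A simplified, self-contained model of the interaction those snippets point at may help; the names, constants and thresholds below are illustrative assumptions, not the actual Substrate code. The point is that the sync state is judged against the best *queued* (downloaded but not yet imported) block, which can run roughly an import queue's worth of blocks ahead of what the client has actually imported:

```rust
// Hypothetical sketch, not the real sc-network-sync implementation.
#[derive(Debug, PartialEq)]
enum SyncState {
    Idle,
    Downloading,
}

/// Roughly how many blocks may sit in the import queue at once
/// (matches the ~2k figure observed above; assumed value).
const MAX_QUEUED_BLOCKS: u64 = 2048;

/// Distance to the target below which the node no longer counts as "major syncing".
const MAJOR_SYNC_THRESHOLD: u64 = 5;

/// The state is derived from the best *queued* block, not the best *imported* one.
fn sync_state(best_seen: u64, best_queued: u64) -> SyncState {
    if best_seen > best_queued && best_seen - best_queued > MAJOR_SYNC_THRESHOLD {
        SyncState::Downloading
    } else {
        SyncState::Idle
    }
}

fn main() {
    let best_seen = 32_000u64; // target advertised by peers
    let best_imported = 30_000u64; // what the local client has actually imported
    let best_queued = best_imported + MAX_QUEUED_BLOCKS; // downloaded, waiting in the import queue

    // The state flips to Idle (so is_major_syncing() would report false) even though
    // the node is still ~2000 blocks short of the target in terms of imported blocks.
    assert_eq!(sync_state(best_seen, best_queued), SyncState::Idle);
    println!(
        "state = {:?}, blocks still to import = {}",
        sync_state(best_seen, best_queued),
        best_seen - best_imported
    );
}
```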
I'm reading it as follows:
Hence
Yeah, I can provide more evidence. Use this subspace branch with the following debugging patch, let the first node produce a number of blocks and then spin up another node. As you can see, the second node is connected to the first node, the best seen block is 183, the local best number is 1, but the sync state is Idle and it's not major syncing, so it's all confusing to me.

```diff
diff --git a/cumulus/client/cirrus-executor/src/bundle_processor.rs b/cumulus/client/cirrus-executor/src/bundle_processor.rs
index e30d52fc5..3d0da16d0 100644
--- a/cumulus/client/cirrus-executor/src/bundle_processor.rs
+++ b/cumulus/client/cirrus-executor/src/bundle_processor.rs
@@ -297,6 +297,19 @@ where
 			return Ok(())
 		}
 
+		let network_status = self.primary_network.status().await.expect("Failed to get network status");
+
+		println!(
+			"==== sync_state: {:?}, best_seen_block: {:?}, num_sync_peers: {:?}, num_connected_peers: {:?}, num_active_peers: {:?}, best_number: {:?}, is_major_syncing: {:?}",
+			network_status.sync_state,
+			network_status.best_seen_block,
+			network_status.num_sync_peers,
+			network_status.num_connected_peers,
+			network_status.num_active_peers,
+			self.primary_chain_client.info().best_number,
+			self.primary_network.is_major_syncing()
+		);
+
 		// Ideally, the receipt of current block will be included in the next block, i.e., no
 		// missing receipts.
 		if header_number == best_execution_chain_number + One::one() {
```
We actually hit a fairly pathological case today with the launch of the new network. We had a few bootstrap nodes with full history available (some 32k blocks) and hundreds of other nodes trying to sync from them. Not only are nodes trying to solve prematurely as described above when they approach the top, I think there is also another issue, which may or may not need solving, that depends on sync speed and connectivity:
So I'm wondering if a node should somehow advertise its target as well, to explain to other nodes that it itself isn't done syncing yet and that this should also be accounted for somehow during the calculation of
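As a hypothetical illustration of that last point (the median rule, the struct fields and the numbers below are assumptions, not the actual Substrate sync code): if the sync target is derived only from what peers currently report as their best block, then peers that are themselves mid-sync drag the perceived target down, and a freshly started node surrounded mostly by other freshly started nodes can conclude it is nearly done:

```rust
// Hypothetical sketch of deriving a sync target from peer information.

/// What a peer could report: its current best block, and (hypothetically) the
/// target it is itself still syncing towards.
struct PeerInfo {
    best_number: u64,
    sync_target: Option<u64>,
}

/// Target derived only from peers' current best numbers (median, assumed rule).
fn target_from_best_numbers(peers: &[PeerInfo]) -> Option<u64> {
    let mut bests: Vec<u64> = peers.iter().map(|p| p.best_number).collect();
    if bests.is_empty() {
        return None;
    }
    bests.sort_unstable();
    Some(bests[bests.len() / 2])
}

/// Target that also takes peers' advertised sync targets into account.
fn target_with_advertised_targets(peers: &[PeerInfo]) -> Option<u64> {
    peers
        .iter()
        .map(|p| p.sync_target.unwrap_or(p.best_number).max(p.best_number))
        .max()
}

fn main() {
    // One bootstrap node with full history, plus many freshly started nodes that
    // are only a few hundred blocks in but know the real target.
    let mut peers = vec![PeerInfo { best_number: 32_000, sync_target: None }];
    peers.extend((0..20u64).map(|i| PeerInfo { best_number: 200 + i, sync_target: Some(32_000) }));

    // Judged by peers' best numbers alone the target looks tiny, so a node at
    // block ~300 can conclude it is done with major sync.
    println!("naive target: {:?}", target_from_best_numbers(&peers));

    // If peers advertised what they are syncing towards, the real height is visible.
    println!("target incl. advertised: {:?}", target_with_advertised_targets(&peers));
}
```

Letting peers also advertise the target they are still syncing towards, as suggested above, would keep the real height visible even when most connected peers are themselves far behind.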
On latest Substrate it still considers major sync to be over when ~2k blocks are queued, among other issues 😞 I believe this is because of Current
Should I submit a PR leaving the behavior of

Also, I believe this should be reopened, since the issue is clearly not fixed: the node will try to claim slots long before it is actually synced.
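For what it's worth, a minimal sketch of the direction such a change could take, assuming the diagnosis above is right (function names, the threshold and the numbers are illustrative, not an actual patch): treat the node as still major-syncing until the best *imported* block, rather than the best *queued* block, is close to the target, so slots are not claimed while ~2k blocks still sit in the import queue:

```rust
// Hypothetical comparison of the two conditions, not a real Substrate patch.

const MAJOR_SYNC_THRESHOLD: u64 = 5;

/// Current behaviour (simplified): judged against the best queued block.
fn is_major_syncing_queued(best_seen: u64, best_queued: u64) -> bool {
    best_seen > best_queued && best_seen - best_queued > MAJOR_SYNC_THRESHOLD
}

/// Suggested behaviour (simplified): judged against the best imported block.
fn is_major_syncing_imported(best_seen: u64, best_imported: u64) -> bool {
    best_seen > best_imported && best_seen - best_imported > MAJOR_SYNC_THRESHOLD
}

fn main() {
    let (best_seen, best_imported, best_queued) = (32_000, 30_000, 32_000);

    // With ~2000 blocks downloaded but not yet imported, the two checks disagree:
    // the queued-based check already reports "not major syncing", while the
    // imported-based one still holds slot claiming back.
    assert!(!is_major_syncing_queued(best_seen, best_queued));
    assert!(is_major_syncing_imported(best_seen, best_imported));
    println!("queued-based: false, imported-based: true");
}
```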
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
After Substrate upgrade from c4f3d02 to 5d3e7c4 I have noticed that `sc-consensus-slots` tries to claim a slot before it is fully synced (might be unrelated to the upgrade, just haven't seen this before). In particular, this happens when the node is ~2000 blocks from the sync target:
Whatever the reason for this is, it seems wrong, and I'm not convinced right now that the root cause is in our consensus. I have not yet tried this with just Babe and Substrate's node template; this is our custom consensus protocol, but it is using `sc-consensus-slots` in a similar way to Babe consensus. The node was started with `--validator --pruning archive`; forced authoring wasn't enabled.
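For context, a minimal sketch of the kind of guard a slot worker relies on, assuming a simplified SyncOracle-like interface (the trait and names here are stand-ins, not the actual sc-consensus-slots API): slot claiming is normally suppressed while the oracle reports a major sync, so if the oracle flips to "not syncing" ~2000 blocks early, the worker starts trying to claim slots prematurely even without forced authoring:

```rust
// Hypothetical sketch of the slot-claiming guard described above.

trait SyncOracle {
    fn is_major_syncing(&self) -> bool;
}

struct NetworkSyncOracle {
    /// In the buggy scenario this already reports `false` while the node is
    /// still ~2000 blocks behind the sync target.
    major_syncing: bool,
}

impl SyncOracle for NetworkSyncOracle {
    fn is_major_syncing(&self) -> bool {
        self.major_syncing
    }
}

fn should_claim_slot(oracle: &impl SyncOracle, force_authoring: bool) -> bool {
    // Slot claiming is normally skipped during major sync; with forced authoring
    // disabled, this check is the only thing preventing premature block production.
    force_authoring || !oracle.is_major_syncing()
}

fn main() {
    let oracle = NetworkSyncOracle { major_syncing: false };

    // Because the oracle reports "not syncing" ~2k blocks early, the worker is
    // allowed to claim slots even though the node is not actually at the tip.
    assert!(should_claim_slot(&oracle, false));
    println!("slot claiming allowed: {}", should_claim_slot(&oracle, false));
}
```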
, forced authoring wasn't enabled.Steps to reproduce