polkadot-collator nodes stop blocks importing by accident with "Collation wasn't advertised to any validator" #3139
Comments
We have the same issue. We updated all our dependencies to track 0.9.3 and are using the docker image |
I reproduced the issue locally with alice, bob at 6077ed0 and one polkadot-collator (commit). Looks like the issue is After the collator receives an invalid justification from Alice, it's only connected to Bob on the collation protocol, but when it receives an invalid justification from Bob (at some random point in the future), it has 0 collation peers and hence
|
@ordian Thank you for the answer!
Do you mean that this is a bug that needs to be fixed, or do collators just need to wait until the collation peers recover? |
Could you try running the validators with |
Indeed, this is a BEEFY justification and running the validators with There is another issue, namely, if the validator's reputation goes below the banning threshold, the collator should try to reconnect at least half a minute later, but that doesn't happen. This should probably be fixed by paritytech/substrate#9025. |
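The failure mode discussed so far can be illustrated with a toy model: each invalid justification costs a validator its place in the collator's collation peer set, and if the reconnect never happens the set eventually becomes empty, at which point nothing can be advertised. This is only a sketch under stated assumptions, not the actual Substrate or Polkadot code. The type names, the `BAN_THRESHOLD` value, the penalty of 200, and the timestamps are all hypothetical; only the half-minute reconnect delay comes from the comment above.

```rust
use std::collections::HashMap;

// Illustrative numbers only; these are NOT Substrate's real constants.
const BAN_THRESHOLD: i32 = -100;
const RECONNECT_DELAY_SECS: u64 = 30;

#[derive(Debug, PartialEq)]
enum Advertise {
    /// Collation advertised to this many connected validators.
    Sent(usize),
    /// Corresponds to "Collation wasn't advertised to any validator".
    NotAdvertised,
}

struct Peer {
    reputation: i32,
    /// Time in seconds at which the peer was dropped, if it is currently dropped.
    dropped_at: Option<u64>,
}

struct Collator {
    peers: HashMap<String, Peer>,
    /// Whether the (hypothetical) reconnect logic is working.
    reconnect_works: bool,
}

impl Collator {
    fn new(validators: &[&str], reconnect_works: bool) -> Self {
        let peers = validators
            .iter()
            .map(|v| ((*v).to_string(), Peer { reputation: 0, dropped_at: None }))
            .collect();
        Collator { peers, reconnect_works }
    }

    /// Penalise a validator; drop it once its reputation falls below the threshold.
    fn on_invalid_justification(&mut self, who: &str, now: u64, penalty: i32) {
        if let Some(peer) = self.peers.get_mut(who) {
            peer.reputation -= penalty;
            if peer.reputation < BAN_THRESHOLD && peer.dropped_at.is_none() {
                peer.dropped_at = Some(now);
            }
        }
    }

    /// Try to advertise a collation to every currently connected validator.
    fn advertise(&mut self, now: u64) -> Advertise {
        if self.reconnect_works {
            // Reconnect to peers that were dropped at least RECONNECT_DELAY_SECS ago.
            for peer in self.peers.values_mut() {
                if let Some(t) = peer.dropped_at {
                    if now >= t + RECONNECT_DELAY_SECS {
                        peer.dropped_at = None;
                        peer.reputation = 0;
                    }
                }
            }
        }
        let connected = self.peers.values().filter(|p| p.dropped_at.is_none()).count();
        if connected == 0 {
            Advertise::NotAdvertised
        } else {
            Advertise::Sent(connected)
        }
    }
}

/// Alice misbehaves at t=10, Bob at t=500; advertise at t=0, 11, 501 and 531.
fn run_scenario(reconnect_works: bool) -> Vec<Advertise> {
    let mut c = Collator::new(&["alice", "bob"], reconnect_works);
    let mut out = Vec::new();
    out.push(c.advertise(0));
    c.on_invalid_justification("alice", 10, 200);
    out.push(c.advertise(11));
    c.on_invalid_justification("bob", 500, 200);
    out.push(c.advertise(501));
    out.push(c.advertise(531));
    out
}

fn main() {
    println!("reconnect broken:  {:?}", run_scenario(false));
    println!("reconnect working: {:?}", run_scenario(true));
}
```

With the reconnect disabled, the model ends up with no collation peers after the second invalid justification and stays that way. With the reconnect enabled, Alice is back in the peer set before Bob is dropped, so the collator always has someone to advertise to.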
Closing as paritytech/substrate#9075 and paritytech/substrate#9025 are merged. |
@ordian @bkchr I'm running westend-local with alice and bob, and westmint-local with alice and bob, both on v0.9.5, all on a single machine (as a local testnet; I changed the epoch time to 3 minutes). The parachain produces some blocks and then gets stuck at block #16. After digging through the logs, I found that the block is stuck because the collator sees "PeerDisconnected" instead of "PeerMessages", which causes "Collation wasn't advertised to any validator". Any idea why PeerDisconnected happens? As I understand it, a single machine runs the two relay chain nodes and the two collators, and the relay chain keeps producing blocks normally, so the network should be working fine. PS: I've already enabled trace logging with -l parachain=trace, but I can't find the "Invalid justification..." message mentioned in the comment above, so maybe some other issue causes the collator to see its validator peers disconnect? PPS: I also tried --no-beefy and the latest v0.9.6, but with no luck; it still gets stuck at some random block. My command:
|
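When comparing runs like the one above, it can help to count how often each of the messages mentioned in this thread appears in a saved collator log. The following is a minimal sketch, not part of any Polkadot tooling. The three marker strings are taken from the comments in this issue, while the function name `count_markers` and the sample log lines in `main` are made up for illustration and do not reproduce the real log format.

```rust
use std::collections::BTreeMap;

// Messages quoted in this thread; matching is done on plain substrings.
const MARKERS: [&str; 3] = [
    "Collation wasn't advertised to any validator",
    "PeerDisconnected",
    "PeerMessages",
];

/// Count, for each marker, the number of log lines that contain it.
fn count_markers(log: &str) -> BTreeMap<&'static str, usize> {
    let mut counts: BTreeMap<&'static str, usize> =
        MARKERS.iter().map(|m| (*m, 0)).collect();
    for line in log.lines() {
        for m in MARKERS.iter() {
            if line.contains(*m) {
                *counts.entry(*m).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Hypothetical log excerpt; real node logs carry more fields than this.
    let sample = "\
event=PeerMessages peer=alice
event=PeerDisconnected peer=alice
event=PeerDisconnected peer=bob
Collation wasn't advertised to any validator
";
    for (marker, n) in count_markers(sample) {
        println!("{:>4}  {}", n, marker);
    }
}
```

A rising "PeerDisconnected" count with no matching reconnects, followed by the "wasn't advertised" warning, would be consistent with the empty peer set described earlier in the thread.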
We found the same problem in 0.9.13. |
@transxask we've recently merged a few fixes: #4640, #4642. Could you try the master branch? The fixes are going into the next release (0.9.15). |
Initial conditions:
Rococo-local relay chain with two polkadot nodes
Build:
cargo build --release
Build spec command:
./polkadot build-spec --chain rococo-local --disable-default-bootnode --raw > rococo-local-cfde.json
Run:
./polkadot --chain rococo-local-cfde.json --alice -d ~/cumulus-playground/relay-alice
./polkadot --chain rococo-local-cfde.json --bob -d ~/cumulus-playground/relay-bob --port 30334
Parachain with two polkadot-collator nodes as collators. The link points to a fork with "rococo-local" in
/polkadot-parachains/src/chain_spec.rs
Build:
cargo build --release
Export genesis and state:
./polkadot-collator export-genesis-wasm > genesis-wasm
./polkadot-collator export-genesis-state --parachain-id 200 > genesis-state
Run:
./polkadot-collator --collator --alice --force-authoring -d ~/cumulus-playground/polkadot-collator-alice --parachain-id 200 --port 40335 --ws-port 9946 -- --execution wasm --chain ./rococo-local-cfde.json --port 30335
./polkadot-collator --collator --bob --force-authoring -d ~/cumulus-playground/polkadot-collator-bob --parachain-id 200 --port 40336 --ws-port 9947 -- --execution wasm --chain ./rococo-local-cfde.json --port 30335
Register the parachain from polkadot.js.org/apps, connected to the relay chain's alice node: Developer -> Sudo -> parasSudoWrapper -> parasInitialize
The bug:
The parachain's blocks unexpectedly stop importing; this can happen at block 80, at block 200, or at any other block number. Logs from the parachain's Alice node, where the last successfully imported block was 548: