Investigate intermittent Bad assignment v1 error on a single node #2995

Closed
alexggh opened this issue Jan 19, 2024 · 4 comments
@alexggh
Contributor

alexggh commented Jan 19, 2024

Out of the blue, one of our Kusama validators started occasionally printing this error:

2024-01-19 10:37:49.817 ERROR tokio-runtime-worker parachain::approval-distribution: Bad assignment v1 block_hash=0x2116aac1d9005173f8d3d2c578c47f6d7692ed599e92b0bbedbb470c1ab8ea8f candidate_index=2147483744 validator_index=ValidatorIndex(120) kind=RelayVRFDelay { core_index: CoreIndex(30) }

https://grafana.teleport.parity.io/goto/x747ta5IR?orgId=1

Initial analysis

All messages seem to come from the same peer, which was updated to the new 1.6.0 release around that time; our node is already running 1.6.0.

The current theory is that the invalid candidate indexes candidate_index=16777284 & candidate_index=67108940 are actually valid candidates if, instead of using CandidateIndex as the type, you use CandidateBitfield to deserialize those bytes.
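The theory can be checked by hand on the reported values. The sketch below is only an illustration and makes assumptions that are not stated in this issue: that CandidateIndex is a little-endian u32 on the wire, and that CandidateBitfield is SCALE-encoded as a compact bit length followed by u8 storage with the least-significant bit first. The function name `reinterpret_as_bitfield` is hypothetical.

```rust
/// Reinterpret the 4 little-endian bytes of a u32 `candidate_index` as a
/// SCALE-style bit vector: a compact bit length, then u8 storage with the
/// least-significant bit first. Returns (bit length, indices of set bits).
/// Assumption: this mirrors the wire layout of a CandidateBitfield.
fn reinterpret_as_bitfield(candidate_index: u32) -> Option<(usize, Vec<usize>)> {
    let bytes = candidate_index.to_le_bytes();

    // SCALE compact integers use the two low bits of the first byte as a mode
    // flag; mode 0b00 stores a value in 0..=63 in the upper six bits.
    if bytes[0] & 0b11 != 0 {
        return None; // multi-byte compact lengths are not handled in this sketch
    }
    let bit_len = usize::from(bytes[0] >> 2);

    // The remaining bytes must be exactly the storage for `bit_len` bits.
    let payload = &bytes[1..];
    if (bit_len + 7) / 8 != payload.len() {
        return None;
    }

    // Collect the positions of the set bits, least-significant bit first.
    let set_bits = (0..bit_len)
        .filter(|&i| ((payload[i / 8] >> (i % 8)) & 1) == 1)
        .collect();
    Some((bit_len, set_bits))
}
```

Under those assumptions, 16777284 reads as a 17-bit field with only bit 16 set, 67108940 as a 19-bit field with only bit 18 set, and the 2147483744 from the log line above as a 24-bit field with only bit 23 set. Each is a plausible single-candidate bitfield, which is consistent with the theory.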

So it seems that when the peer upgrades, we get into a situation where network-bridge-rx on the sender sends us messages in the newer validation protocol while our node still thinks it should be on the older one. We deserialize the message as such, hence it won't pass the sanitization part.

The timeline is like this:

  1. On 2024-01-17 our nodes get upgraded and negotiate the older protocol with RadiumBlock. [OK]
  2. On 2024-01-18, when the peer gets upgraded, it negotiates the newer protocol with our node. [OK]
  3. The network bridge on our node doesn't get the memo; the PeerConnected comes 3 minutes after the error:
     network-bridge-rx: action="PeerConnected" peer_set=Validation version=3
     Bad assignment v1
  4. The peer sends us a newer protocol message and we deserialize it as an older one. [NOK]
@alexggh alexggh self-assigned this Jan 19, 2024
@altonen
Contributor

altonen commented Jan 19, 2024

Not sure if it's related to this but there is a known issue in networking where if you disconnect a remote peer, it doesn't get notified of the disconnection. It will detect it only when it tries to send some data to you. One annoying thing about this is that if you try to reconnect to the node after you've disconnected and the remote peer has not noticed this, it will accept you without notifying the remote peer's NetworkBridge. So essentially the remote peer has a pair of substreams: an inbound substream that works and an outbound substream that doesn't work. It's possible that this new inbound substream had been negotiated with the new protocol but since NetworkBridge wasn't notified, it assumes that it's still using an older version of the protocol.
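As a rough illustration of the failure mode described above, here is a toy model of the per-peer state. It is a sketch only: `ProtocolVersion`, `PeerState`, and all of the method names are hypothetical and do not correspond to the real networking or NetworkBridge types.

```rust
/// Hypothetical stand-in for the negotiated validation protocol version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProtocolVersion {
    Older,
    Newer,
}

/// Hypothetical per-peer view held by a node.
struct PeerState {
    /// Version the inbound substream was actually negotiated with.
    negotiated: ProtocolVersion,
    /// Version the bridge believes is in use; it only changes on notification.
    assumed_by_bridge: ProtocolVersion,
}

impl PeerState {
    /// A normal connection: the bridge is told which version was negotiated.
    fn connected(version: ProtocolVersion) -> Self {
        PeerState { negotiated: version, assumed_by_bridge: version }
    }

    /// The remote re-opens the inbound substream. In the scenario described
    /// above the bridge is not notified, so its assumption goes stale.
    fn inbound_reopened(&mut self, version: ProtocolVersion, bridge_notified: bool) {
        self.negotiated = version;
        if bridge_notified {
            self.assumed_by_bridge = version;
        }
    }

    /// Version that incoming messages would be decoded with.
    fn decodes_with(&self) -> ProtocolVersion {
        self.assumed_by_bridge
    }

    /// True when messages are decoded with the version they were sent in.
    fn is_consistent(&self) -> bool {
        self.negotiated == self.assumed_by_bridge
    }
}
```

In this model, calling `inbound_reopened(ProtocolVersion::Newer, false)` on a peer that connected with `Older` leaves `decodes_with()` at `Older`, which is the mismatch suspected in this issue.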

@alexggh
Contributor Author

alexggh commented Jan 19, 2024

@altonen This is exactly what happened here.

To double-check my understanding: you are saying that currently, if we don't gracefully close the connection (e.g. kill -9 of the node) and then restart it, then when we connect to our peers their NetworkBridge won't receive updates about the newly negotiated protocol version?

@altonen
Contributor

altonen commented Jan 22, 2024

Connection for the validation protocol in this case would mean the pair of substreams (notification stream). If the inbound substream gets closed, NetworkBridge is not notified. If the entire TCP connection gets closed, it should get notified.

@alexggh
Contributor Author

alexggh commented May 9, 2024

Stale issue: the error hasn't been seen since Jan 19, so closing it.

@alexggh alexggh closed this as completed May 9, 2024