Investigate intermittent Bad assignment v1 error on a single node
#2995
Comments
Not sure if it's related to this, but there is a known issue in networking: if you disconnect from a remote peer, the peer doesn't get notified of the disconnection. It only detects it when it tries to send some data to you. One annoying consequence is that if you reconnect to the node after disconnecting, and the remote peer hasn't noticed yet, it will accept you without notifying the remote peer's
@altonen This is exactly what happened here. To double-check that I understood: you are saying that currently, if we don't gracefully close the connection like
A connection for the validation protocol in this case would mean the pair of substreams (the notification stream). If the inbound substream gets closed,
Stale issue; it hasn't been seen since Jan 19. Closing it.
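To make the failure mode described in the comments concrete, here is a minimal Python sketch of a hypothetical per-peer version table. `PeerVersionTracker`, `version_used_after_upgrade` and the version numbers are illustrative assumptions, not polkadot-sdk APIs. It shows that if a reconnect is never reported to the component tracking the peer, that component keeps using the version negotiated on the old connection.

```python
from dataclasses import dataclass, field
from typing import Dict

# Placeholder version numbers for illustration only; they are NOT the real
# validation protocol version numbers.
OLD_VERSION = 1
NEW_VERSION = 2


@dataclass
class PeerVersionTracker:
    """Hypothetical per-peer record of the negotiated validation protocol version."""

    versions: Dict[str, int] = field(default_factory=dict)

    def on_peer_connected(self, peer: str, version: int) -> None:
        # Remember the version negotiated for this peer's notification substreams.
        self.versions[peer] = version

    def on_peer_disconnected(self, peer: str) -> None:
        # Forget the peer so that a later connect starts from a clean state.
        self.versions.pop(peer, None)

    def version_for(self, peer: str) -> int:
        # The version used to deserialize the next message from this peer.
        return self.versions[peer]


def version_used_after_upgrade(reconnect_reported: bool) -> int:
    tracker = PeerVersionTracker()
    # The peer is still on the old release, so the old version is negotiated.
    tracker.on_peer_connected("peer", OLD_VERSION)
    # The peer restarts on the new release and reconnects with the new version.
    if reconnect_reported:
        tracker.on_peer_disconnected("peer")
        tracker.on_peer_connected("peer", NEW_VERSION)
    # If the reconnect is never surfaced, the stale entry silently survives.
    return tracker.version_for("peer")


print(version_used_after_upgrade(reconnect_reported=True))   # 2
print(version_used_after_upgrade(reconnect_reported=False))  # 1
```

In the unreported case, messages the peer sends on the new version would be deserialized with the old one, which matches the timeline in the initial analysis.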
Out of the blue, one of our Kusama validators started occasionally printing this error:
https://grafana.teleport.parity.io/goto/x747ta5IR?orgId=1
Initial analysis
All messages seem to come from the same Peer, which was updated to the new 1.6.0 release around that time; our node was already running 1.6.0.
The current theory is that the invalid candidate indexes, `candidate_index=16777284` and `candidate_index=67108940`, are actually valid candidates if those bytes are deserialized as a `CandidateBitfield` instead of a `CandidateIndex`. So it seems that when Peer upgrades, we get into a situation where `network-bridge-rx` on the sender sends us messages on the newer validation protocol while our node still thinks the peer is on the older one. We deserialize the message as the older version, so it doesn't pass sanitization.
The timeline is as follows:
- 2024-01-17: our nodes get upgraded and negotiate the older protocol with RadiumBlock. [OK]
- 2024-01-18: when Peer gets upgraded, it negotiates the newer protocol with our node. [OK]
- Peer sends us a newer-protocol message and we deserialize it as an older one. [NOK]
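The bitfield theory can be checked by hand. Below is a minimal Python sketch (the function names are hypothetical) that re-encodes each logged `candidate_index` as the 4 little-endian bytes a u32 `CandidateIndex` occupies, then reinterprets those bytes as a `CandidateBitfield`. It assumes the bitfield is an Lsb0 `BitVec<u8>` SCALE-encoded as a compact bit length followed by the packed bytes; only the single-byte compact mode is decoded, which is enough for these values.

```python
import struct
from typing import List


def decode_as_candidate_index(raw: bytes) -> int:
    # Older message layout: CandidateIndex is a u32, SCALE-encoded as
    # 4 little-endian bytes.
    (value,) = struct.unpack("<I", raw[:4])
    return value


def decode_as_candidate_bitfield(raw: bytes) -> List[int]:
    # Assumed newer layout: an Lsb0 bit vector of u8, SCALE-encoded as a
    # compact bit length followed by ceil(bit_len / 8) payload bytes.
    prefix = raw[0]
    if prefix & 0b11 != 0:
        # Only the single-byte compact mode (lengths below 64) is handled here.
        raise ValueError("multi-byte compact length not supported in this sketch")
    bit_len = prefix >> 2
    n_bytes = (bit_len + 7) // 8
    payload = raw[1:1 + n_bytes]
    if len(payload) != n_bytes:
        raise ValueError("truncated bitfield payload")
    # Indices of the set bits, i.e. the candidates the bitfield refers to.
    return [i for i in range(bit_len) if (payload[i // 8] >> (i % 8)) & 1]


for logged in (16777284, 67108940):
    raw = struct.pack("<I", logged)  # the 4 bytes the older decoder consumed
    print(logged, raw.hex(), decode_as_candidate_bitfield(raw))
# 16777284 44000001 [16]
# 67108940 4c000004 [18]
```

Under that assumption, 16777284 decodes to a 17-bit field with only candidate 16 set, and 67108940 to a 19-bit field with only candidate 18 set. Both look like ordinary single-candidate bitfields, which supports the theory. Any bitfield of 17 to 24 bits encodes to exactly 4 bytes (1 prefix byte plus 3 payload bytes), the same size as a u32.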