-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Use block requests to check if block responses are correct #7653
Conversation
Before this pr sync relied on recently announced blocks to check if a given peer response is correct. However this could lead to situations where we requested a block from a peer and it gave us the requested, but we rejected the response because this peer never send us an announcement for the given block. See the added tests for a reproduction of the problem. With this pr, we now take the block request to check if a given response matches the request. A node should not send us a block response without a request anyway. Essentially there is still a bug, because as you see in the test, we are requesting block 2, while we already have this block imported. It even happens that we request a block from the network that we have authored. However a fix for this would require some more refactoring of the sync code.
In some cases this is inevitable. If a peer suddenly sends an announcement for a future block that we don't know anything about, we request a chain starting from the last known common block. If they did not announce that they now have intermediate blocks, we can't be sure if that future block is continuing a known chain. It looks like in your tests some peers skip some announcements. I. e. Peer1 was on block 1 but suddenly sends and announcement for block 4? During normal operation this should not happen. All blocks are announced to all peers, unless full sync mode. |
@@ -1685,20 +1668,60 @@ fn validate_blocks<Block: BlockT>(blocks: &Vec<message::BlockData<Block>>, who: | |||
} | |||
} | |||
} | |||
|
|||
if let Some(request) = request { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checked first, before doing more expensive checks, such as computing trie roots.
Also, some of the checks against request are being performed on the protocol level here:
https://github.com/paritytech/substrate/blob/master/client/network/src/protocol.rs#L709
Might be a good idea to put them in one place.
I can give you a trace. This is exactly what the logs showed 🤷 |
Would like to check it. Might be a further problem with how announcements are sent out or ignored when received. Don't get me wrong, this is still a good test as we should account for all kinds of misbehaviour. And peers might legitimately skip announcements if they fall back to full sync mode for some reason, but this should not happen regularly. And if they do skip announcements and make us download extra blocks, we should not disconnect, but decrease reputation by a bit. This is what the current code does. |
Yeah :) The problem we have seen is that peers are getting disconnected really fast with latest polkadot master. I will send you the log |
Examining the logs, I guess the issue is that we don't receive back announcements for blocks that we have already announced to peers (e.g. blocks that are collated by this node). So the original issue is that recently_announced should really include not only received, but sent out announcements as well. But since you've removed it anyway, is should be OK Looks good, modulo @tomaka's concerns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me
…h#7653) * Use block requests to check if block responses are correct Before this pr sync relied on recently announced blocks to check if a given peer response is correct. However this could lead to situations where we requested a block from a peer and it gave us the requested, but we rejected the response because this peer never send us an announcement for the given block. See the added tests for a reproduction of the problem. With this pr, we now take the block request to check if a given response matches the request. A node should not send us a block response without a request anyway. Essentially there is still a bug, because as you see in the test, we are requesting block 2, while we already have this block imported. It even happens that we request a block from the network that we have authored. However a fix for this would require some more refactoring of the sync code. * Revert change * Give the test a proper name * Add moar logging * Move cheaper checks * Move checks to common place
Before this pr sync relied on recently announced blocks to check if a
given peer response is correct. However this could lead to situations
where we requested a block from a peer and it gave us the requested, but
we rejected the response because this peer never send us an announcement
for the given block. See the added tests for a reproduction of the
problem.
With this pr, we now take the block request to check if a given response
matches the request. A node should not send us a block response
without a request anyway.
Essentially there is still a bug, because as you see in the test, we are
requesting block 2, while we already have this block imported. It even
happens that we request a block from the network that we have authored.
However a fix for this would require some more refactoring of the sync code.