Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Use block requests to check if block responses are correct #7653

Merged
merged 8 commits into from
Dec 3, 2020

Conversation

bkchr
Copy link
Member

@bkchr bkchr commented Dec 2, 2020

Before this pr sync relied on recently announced blocks to check if a
given peer response is correct. However this could lead to situations
where we requested a block from a peer and it gave us the requested, but
we rejected the response because this peer never send us an announcement
for the given block. See the added tests for a reproduction of the
problem.
With this pr, we now take the block request to check if a given response
matches the request. A node should not send us a block response
without a request anyway.

Essentially there is still a bug, because as you see in the test, we are
requesting block 2, while we already have this block imported. It even
happens that we request a block from the network that we have authored.
However a fix for this would require some more refactoring of the sync code.

Before this pr sync relied on recently announced blocks to check if a
given peer response is correct. However this could lead to situations
where we requested a block from a peer and it gave us the requested, but
we rejected the response because this peer never send us an announcement
for the given block. See the added tests for a reproduction of the
problem.
With this pr, we now take the block request to check if a given response
matches the request. A node should not send us a block response
without a request anyway.

Essentially there is still a bug, because as you see in the test, we are
requesting block 2, while we already have this block imported. It even
happens that we request a block from the network that we have authored.
However a fix for this would require some more refactoring of the sync code.
@bkchr bkchr added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Dec 2, 2020
@bkchr bkchr requested review from tomaka and arkpar December 2, 2020 10:11
@arkpar
Copy link
Member

arkpar commented Dec 2, 2020

Essentially there is still a bug, because as you see in the test, we are
requesting block 2, while we already have this block imported.

In some cases this is inevitable. If a peer suddenly sends an announcement for a future block that we don't know anything about, we request a chain starting from the last known common block. If they did not announce that they now have intermediate blocks, we can't be sure if that future block is continuing a known chain.

It looks like in your tests some peers skip some announcements. I. e. Peer1 was on block 1 but suddenly sends and announcement for block 4? During normal operation this should not happen. All blocks are announced to all peers, unless full sync mode.

@@ -1685,20 +1668,60 @@ fn validate_blocks<Block: BlockT>(blocks: &Vec<message::BlockData<Block>>, who:
}
}
}

if let Some(request) = request {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checked first, before doing more expensive checks, such as computing trie roots.
Also, some of the checks against request are being performed on the protocol level here:
https://github.com/paritytech/substrate/blob/master/client/network/src/protocol.rs#L709
Might be a good idea to put them in one place.

@bkchr
Copy link
Member Author

bkchr commented Dec 2, 2020

It looks like in your tests some peers skip some announcements. I. e. Peer1 was on block 1 but suddenly sends and announcement for block 4? During normal operation this should not happen. All blocks are announced to all peers, unless full sync mode.

I can give you a trace. This is exactly what the logs showed 🤷

@arkpar
Copy link
Member

arkpar commented Dec 2, 2020

I can give you a trace. This is exactly what the logs showed 🤷

Would like to check it. Might be a further problem with how announcements are sent out or ignored when received.

Don't get me wrong, this is still a good test as we should account for all kinds of misbehaviour. And peers might legitimately skip announcements if they fall back to full sync mode for some reason, but this should not happen regularly. And if they do skip announcements and make us download extra blocks, we should not disconnect, but decrease reputation by a bit. This is what the current code does.

@bkchr
Copy link
Member Author

bkchr commented Dec 2, 2020

Yeah :) The problem we have seen is that peers are getting disconnected really fast with latest polkadot master. I will send you the log

@arkpar
Copy link
Member

arkpar commented Dec 2, 2020

Examining the logs, I guess the issue is that we don't receive back announcements for blocks that we have already announced to peers (e.g. blocks that are collated by this node). So the original issue is that recently_announced should really include not only received, but sent out announcements as well. But since you've removed it anyway, is should be OK

Looks good, modulo @tomaka's concerns.

Copy link
Contributor

@tomaka tomaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection from me

@bkchr bkchr merged commit 864ce58 into master Dec 3, 2020
@bkchr bkchr deleted the bkchr-sync-fix-39434 branch December 3, 2020 14:49
darkfriend77 pushed a commit to mogwaicoin/substrate that referenced this pull request Jan 11, 2021
…h#7653)

* Use block requests to check if block responses are correct

Before this pr sync relied on recently announced blocks to check if a
given peer response is correct. However this could lead to situations
where we requested a block from a peer and it gave us the requested, but
we rejected the response because this peer never send us an announcement
for the given block. See the added tests for a reproduction of the
problem.
With this pr, we now take the block request to check if a given response
matches the request. A node should not send us a block response
without a request anyway.

Essentially there is still a bug, because as you see in the test, we are
requesting block 2, while we already have this block imported. It even
happens that we request a block from the network that we have authored.
However a fix for this would require some more refactoring of the sync code.

* Revert change

* Give the test a proper name

* Add moar logging

* Move cheaper checks

* Move checks to common place
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants