Conversation
LGTM.
For what my review is worth, this looks good to me.
```diff
@@ -51,5 +51,8 @@ message BlockData {
 	bytes message_queue = 5; // optional
 	// Justification if requested.
 	bytes justification = 6; // optional
+	// True if justification should be treated as present be empty.
```
Comment does not make sense to me. Did you mean:

```diff
-// True if justification should be treated as present be empty.
+// True if justification should be treated as empty.
```
That was meant to be:

```diff
-// True if justification should be treated as present be empty.
+// True if justification should be treated as present but empty.
```
```diff
@@ -51,5 +51,8 @@ message BlockData {
 	bytes message_queue = 5; // optional
 	// Justification if requested.
 	bytes justification = 6; // optional
+	// True if justification should be treated as present be empty.
+	// This hack is unfortunately necessary because of shortcomings in the protobuf format.
```
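The protobuf shortcoming being worked around: in proto3, an absent `bytes` field and an explicitly empty one both decode to an empty byte string, so an extra boolean is needed to tell the two apart. A minimal Rust sketch of that three-way decoding (struct and helper names are hypothetical, not the actual Substrate code):

```rust
/// Simplified stand-in for the decoded protobuf message: in proto3, an
/// absent `bytes justification` and an empty one both decode to an empty
/// `Vec<u8>`, so a separate flag carries the distinction.
struct BlockData {
    justification: Vec<u8>,
    is_empty_justification: bool,
}

/// Recovers the three-way meaning: `None` for absent, `Some(&[])` for
/// present-but-empty, `Some(bytes)` otherwise.
fn justification(block: &BlockData) -> Option<&[u8]> {
    if !block.justification.is_empty() {
        Some(&block.justification)
    } else if block.is_empty_justification {
        Some(&[])
    } else {
        None
    }
}
```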
Would you mind adding your details from the pull request description here?
Looks like the timeout handling logic is broken.
```rust
		response: message::BlockResponse<B>,
	},
}

/// Configuration options for `BlockRequests`.
#[derive(Debug, Clone)]
pub struct Config {
	max_block_data_response: u32,
```
What happens if this is reached? How is it different from `max_response_len`?
That's the maximum number of blocks that we respond with, by default 128. If a node requests 129 blocks, we will only answer with 128. That's exactly the same as the old behaviour.
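The behaviour described above is a clamp rather than a rejection. A hedged one-line sketch (function name hypothetical):

```rust
/// Hypothetical sketch of the limit described above: the number of blocks
/// in a response is clamped to `max_block_data_response` instead of the
/// request being treated as an error.
fn effective_block_count(requested: u32, max_block_data_response: u32) -> u32 {
    requested.min(max_block_data_response)
}
```

With the default limit of 128, a request for 129 blocks is answered with 128 blocks, matching the old behaviour.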
```diff
@@ -127,6 +151,8 @@ pub struct BlockRequests<B: Block> {
 	chain: Arc<dyn Client<B>>,
 	/// Futures sending back the block request response.
 	outgoing: FuturesUnordered<BoxFuture<'static, ()>>,
+	/// Events to return as soon as possible from `poll`.
+	pending_events: VecDeque<NetworkBehaviourAction<OutboundProtocol<B>, Event<B>>>,
```
Is this unbounded?
Yes, but libp2p guarantees that `poll` is called every time after `inject_event`. Consequently, the only way to make this list grow past one element is to call `send_request` multiple times in a row.
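The argument above can be sketched in a few lines. This is a hypothetical simplification (names and types are illustrative, not the actual `NetworkBehaviour` implementation): events queued by `send_request` are drained on every `poll`, so the queue stays effectively bounded as long as `poll` runs after each injection.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for the behaviour under discussion.
struct Sketch {
    pending_events: VecDeque<String>,
}

impl Sketch {
    fn send_request(&mut self, event: String) {
        self.pending_events.push_back(event);
    }

    /// Called by libp2p after every `inject_event`; returning a queued
    /// event first keeps `pending_events` short in practice.
    fn poll(&mut self) -> Option<String> {
        self.pending_events.pop_front()
    }
}
```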
Which we totally do for gossiping, propagating extrinsics and such, don't we?
```rust
		response: message::BlockResponse<B>,
	},
}

/// Configuration options for `BlockRequests`.
#[derive(Debug, Clone)]
pub struct Config {
	max_block_data_response: u32,
	max_request_len: usize,
```
What happens if it is reached?
It's considered a protocol error and we disconnect the node. There has to be some sort of limit, since we have to allocate the memory that contains the request. I think requests are typically less than 100 bytes, so the default of 1MiB is way more than enough.
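The check described above amounts to rejecting oversized requests up front, before allocating for the body. A hedged sketch (function name hypothetical):

```rust
/// Hypothetical sketch of the request-size limit: a request whose declared
/// length exceeds `max_request_len` is treated as a protocol error, and the
/// peer is disconnected before the request body is allocated.
fn check_request_len(declared_len: usize, max_request_len: usize) -> Result<(), &'static str> {
    if declared_len > max_request_len {
        Err("protocol error: request too large, disconnecting peer")
    } else {
        Ok(())
    }
}
```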
Please keep to the existing style when adding log messages. In particular, each message should be capitalized.
I've made sure that these fields were only ever used with the single substream. The logic of the timeout is now handled directly by libp2p.
Will fix that too.
So what's the timeout in seconds for block requests now and where is it set? I've observed that timeouts are never triggered and sync stalls.
What about the obsolete-requests tracking that these fields were handling? Is there an alternative in the new protocol? I could not find it.
Ok, this is the last step towards #5670 that needs to be done ASAP for backcompat reasons. The rest of this issue can be tackled later.
Remarks:
I added a CLI flag to restore the legacy behaviour, just in case.
Unless the CLI flag is enabled, this PR breaks compatibility with Polkadot 0.7.20 and below. I think that's fine. According to the telemetry, there are only 16 nodes which are concerned by this breakage, only 6 of which seem to stay in sync while the others are more or less in limbo. It seems that compatibility with versions 0.7.19 and below is already broken anyway (for a reason I don't know).
The `protocol.rs` file now emits requests that are then handled by the `block_requests` and `finality_requests` modules. When these modules notify of a response, we notify `protocol.rs`.
Any failure (bad response, protocol not supported, timeout, ...) is handled by disconnecting the node. This keeps compatibility with the current protocol, but we should obviously do something about that later.
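The failure handling just described collapses every non-success outcome into a disconnect. A hypothetical sketch of that policy (enum and function names are illustrative, not the actual Substrate types):

```rust
/// Possible outcomes of a block or finality request, per the description
/// above: only a valid response keeps the peer connected.
enum RequestOutcome {
    Response(Vec<u8>),
    BadResponse,
    ProtocolNotSupported,
    Timeout,
}

/// Every failure mode (bad response, protocol not supported, timeout, ...)
/// leads to disconnecting the peer.
fn should_disconnect(outcome: &RequestOutcome) -> bool {
    !matches!(outcome, RequestOutcome::Response(_))
}
```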
I removed the `Toggle` from `finality_proofs`, otherwise I can't use it to send requests. I made it so that if the finality proof provider is `None` then the list of negotiable inbound protocols is empty.
The limit of 16MiB for block response sizes is the one currently in use in the legacy protocol. I initially put it at 1MiB, but that was too little and we were discarding responses.
I had to add an `is_empty_justification` field to the protocol, because tests send out empty justifications and we don't differentiate between an absence of justification and an empty justification. While this is not great, it was done in a backwards-compatible way.
The formats of a "block request" and a "block response" are `message::BlockRequest` and `message::BlockResponse`. I unfortunately didn't have the courage to do the deeper refactorings required to have something a bit more strongly-typed.
I'm testing this on my Google Cloud node at the moment, and it seems to work totally fine.