This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Use new block requests protocol #5760

Merged
merged 3 commits into paritytech:master from use-new-block-requests
Apr 24, 2020

Conversation

tomaka
Contributor

@tomaka tomaka commented Apr 23, 2020

Ok, this is the last step towards #5670 that needs to be done ASAP for backcompat reasons. The rest of this issue can be tackled later.

Remarks:

  • I added a CLI flag to restore the legacy behaviour, just in case.

  • Unless the CLI flag is enabled, this PR breaks compatibility with Polkadot 0.7.20 and below. I think that's fine. According to the telemetry, only 16 nodes are affected by this breakage, and only 6 of them seem to stay in sync while the others are more or less in limbo. It seems that compatibility with versions 0.7.19 and below is already broken anyway (for a reason I don't know).

  • The protocol.rs file now emits requests that are then handled by the block_requests and finality_requests modules. When these modules notify us of a response, we pass it back to protocol.rs.

  • Any failure (bad response, protocol not supported, timeout, ...) is handled by disconnecting the node. This keeps compatibility with the current protocol, but we should obviously do something about that later.

  • I removed the Toggle from finality_proofs, otherwise I can't use it to send requests. I made it so that if the finality proof provider is None then the list of negotiable inbound protocols is empty.

  • The limit of 16MiB for block response sizes is the one currently in use in the legacy protocol. I initially put it at 1MiB but that was too little and we were discarding responses.

  • I had to add an is_empty_justification field to the protocol, because tests send out empty justifications and we don't differentiate between an absence of justification and an empty justification. While this is not great, it was done in a backwards-compatible way (see the decoding sketch after this list).

  • The formats of a "block request" and a "block response" are message::BlockRequest and message::BlockResponse. I unfortunately didn't have the courage to do the deeper refactorings required to have something a bit more strongly-typed.

  • I'm testing this on my Google Cloud node at the moment, and it seems to work totally fine.
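
To make the is_empty_justification hack concrete for reviewers, here is a minimal decoding sketch. This is not the PR's actual code: only the justification and is_empty_justification field names come from the diff; the function and everything else is hypothetical.

// Hypothetical helper showing how a receiver can distinguish the three cases
// once `is_empty_justification` exists. Only the two field names are taken
// from the protobuf schema touched by this PR; the rest is illustrative.
fn decode_justification(
    justification: Vec<u8>,
    is_empty_justification: bool,
) -> Option<Vec<u8>> {
    if !justification.is_empty() {
        // A real, non-empty justification was sent.
        Some(justification)
    } else if is_empty_justification {
        // Protobuf cannot distinguish "field absent" from "field present but
        // empty", so the sender sets this flag to mean "present but empty".
        Some(Vec::new())
    } else {
        // No justification at all.
        None
    }
}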

@tomaka tomaka added the A0-please_review and B1-clientnoteworthy labels Apr 23, 2020
@tomaka tomaka requested review from twittner and mxinden April 23, 2020 14:25
@tomaka tomaka requested a review from cecton as a code owner April 23, 2020 14:25
@tomaka tomaka added this to the 2.0 milestone Apr 23, 2020
Contributor

@twittner twittner left a comment

LGTM.

Contributor

@mxinden mxinden left a comment

For what my review is worth, this looks good to me.

@@ -51,5 +51,8 @@ message BlockData {
bytes message_queue = 5; // optional
// Justification if requested.
bytes justification = 6; // optional
// True if justification should be treated as present be empty.
Contributor

Comment does not make sense to me. Did you mean:

Suggested change
// True if justification should be treated as present be empty.
// True if justification should be treated as empty.

Contributor Author

That was meant to be:

Suggested change
// True if justification should be treated as present be empty.
// True if justification should be treated as present but empty.

@@ -51,5 +51,8 @@ message BlockData {
bytes message_queue = 5; // optional
// Justification if requested.
bytes justification = 6; // optional
// True if justification should be treated as present be empty.
// This hack is unfortunately necessary because of shortcomings in the protobuf format.
Contributor

Would you mind adding your details from the pull request description here?

@tomaka tomaka added the A8-mergeoncegreen label and removed the A0-please_review label Apr 24, 2020
@tomaka tomaka merged commit ee098e9 into paritytech:master Apr 24, 2020
@tomaka tomaka deleted the use-new-block-requests branch April 24, 2020 11:48
@arkpar
Member

arkpar commented Apr 24, 2020

Looks like timeout handling logic is broken. Also, Peer::block_request should be set immediately upon queueing the request. This is broken too as far as I can see.

response: message::BlockResponse<B>,
},
}

/// Configuration options for `BlockRequests`.
#[derive(Debug, Clone)]
pub struct Config {
max_block_data_response: u32,
Member

What happens if this is reached? How's it different from max_response_len?

Contributor Author

@tomaka tomaka Apr 25, 2020

That's the maximum number of blocks that we respond with, by default 128. If a node requests 129 blocks, we will only answer with 128. That's exactly the same as the old behaviour.
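
A rough sketch of that capping rule, using a hypothetical helper (this is the described behaviour, not the PR's actual code):

// Hypothetical: cap the number of blocks in a response at
// `max_block_data_response` (128 by default). A request for 129 blocks is
// answered with 128, matching the old behaviour.
fn blocks_to_send(requested: Option<u32>, max_block_data_response: u32) -> u32 {
    requested
        .unwrap_or(max_block_data_response)
        .min(max_block_data_response)
}
// e.g. blocks_to_send(Some(129), 128) == 128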

@@ -127,6 +151,8 @@ pub struct BlockRequests<B: Block> {
chain: Arc<dyn Client<B>>,
/// Futures sending back the block request response.
outgoing: FuturesUnordered<BoxFuture<'static, ()>>,
/// Events to return as soon as possible from `poll`.
pending_events: VecDeque<NetworkBehaviourAction<OutboundProtocol<B>, Event<B>>>,
Member

Is this unbounded?

Contributor Author

@tomaka tomaka Apr 25, 2020

Yes, but libp2p guarantees that poll is called every time after inject_event.
Consequently, the only way to make this list grow past one element is to call send_request multiple times in a row.
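
The pattern being described is roughly the following sketch; the real NetworkBehaviour plumbing is omitted and every name other than pending_events is an assumption:

use std::collections::VecDeque;

// Sketch of the "queue in inject_event / drain in poll" pattern under
// discussion. Because libp2p polls the behaviour after delivering each event,
// the queue normally holds at most one entry unless several requests are
// queued back to back before the next poll.
struct PendingQueue<E> {
    pending_events: VecDeque<E>,
}

impl<E> PendingQueue<E> {
    /// Called from the equivalent of `send_request` / `inject_event`.
    fn push(&mut self, event: E) {
        self.pending_events.push_back(event);
    }

    /// Called from the equivalent of `poll`; returns queued events one at a
    /// time until the queue is empty.
    fn next_event(&mut self) -> Option<E> {
        self.pending_events.pop_front()
    }
}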

Member

Which we totally do for gossiping, propagating extrinsics and such, don't we?

response: message::BlockResponse<B>,
},
}

/// Configuration options for `BlockRequests`.
#[derive(Debug, Clone)]
pub struct Config {
max_block_data_response: u32,
max_request_len: usize,
Member

What happens if it is reached?

Contributor Author

@tomaka tomaka Apr 25, 2020

It's considered a protocol error and we disconnect the node. There has to be some sort of limit, since we have to allocate the memory that contains the request. I think requests are typically less than 100 bytes, so the default 1MiB is way more than enough.
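
Conceptually the guard looks like this (a sketch with assumed names, not the actual implementation):

// Hypothetical request-size guard: anything larger than `max_request_len`
// (1MiB by default) is treated as a protocol error, and protocol errors
// currently lead to disconnecting the peer.
fn check_request_len(request: &[u8], max_request_len: usize) -> Result<(), &'static str> {
    if request.len() > max_request_len {
        // Refuse to decode an oversized request.
        Err("block request exceeds maximum length")
    } else {
        Ok(())
    }
}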

@arkpar
Member

arkpar commented Apr 24, 2020

Please keep to existing style when adding log messages. In particular, each message should be capitalized. Existing messages with target: "sync" should be preserved, and not replaced with new messages with no target.
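
For illustration, the requested style looks like this; the message text and variables below are made up, only the capitalization and the existing target: "sync" convention come from the current code:

// Illustration only: capitalized message, existing `target: "sync"` preserved.
fn log_request(peer_id: &str, max_blocks: u32) {
    log::trace!(target: "sync", "Handling block request from {}: up to {} blocks", peer_id, max_blocks);
}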

@tomaka
Contributor Author

tomaka commented Apr 25, 2020

Looks like timeout handling logic is broken. Also, Peer::block_request should be set immediately upon queueing the request. This is broken too as far as I can see.

I've made sure that these fields were only ever used with the single substream. In other words, if use_legacy_network is true we use these fields (that's why I didn't remove them) and if it is false we never use them.

The logic of the timeout is handled directly by libp2p now, in the OneShotHandler. A timeout is a protocol error (which I agree isn't great if we have to debug that) and will result in a disconnect.
I'll improve the whole debugging story and add metrics next week.

In particular, each message should be capitalized. Existing messages with target: "sync" should be preserved, and not replaced with new messages with no target.

Will fix that too.

@arkpar
Member

arkpar commented Apr 25, 2020

So what's the timeout in seconds for block requests now and where is it set? I've observed that timeouts are never triggered and sync stalls.

I've made sure that these fields were only ever used with the single substream. In other words, if use_legacy_network is true we use these fields (that's why I didn't remove them) and if it is false we never use them.

What about the tracking of obsolete requests that these fields were handling? Is there an alternative in the new protocol? I could not find it.
