Extra timeout handling in block_requests #5794

tomaka · 2020-04-27T08:55:26Z

It's been reported in #5760 that timeouts aren't working properly, which indeed seems to be the case.

There's indeed a problematic situation caused by the fact that we handle timeouts by disconnecting: when a request times out, we only kill that specific connection, but a different connection with the same peer might have been opened in the meantime (which would also be a logical explanation for the timeout happening). When that happens, we don't actually report a disconnect to the sync/protocol state machine (hacks on top of hacks).

This PR changes that mechanism by explicitly emitting an event when a request times out, and also an event when a request has been cancelled.
I had planned to make this exact change anyway in order to add proper metrics to block requests. That will be done on top of this PR.
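To illustrate the mechanism described above, here is a minimal sketch of a request tracker that reports timeouts and cancellations as explicit events instead of relying on a disconnect. All names (`Event`, `BlockRequests`, `check_timeouts`, the `u64` peer id) are assumptions made for this example and are not taken from the actual `block_requests.rs`.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical peer identifier; the real code uses libp2p's `PeerId`.
type PeerId = u64;

/// Events reported to the sync layer (illustrative names, not the real API).
#[derive(Debug, PartialEq)]
enum Event {
    /// The request to `peer` did not receive a response in time.
    RequestTimeout { peer: PeerId },
    /// The request to `peer` was replaced by a newer one.
    RequestCancelled { peer: PeerId },
}

/// Bookkeeping for one in-flight request.
struct Ongoing {
    deadline: Instant,
}

#[derive(Default)]
struct BlockRequests {
    ongoing: HashMap<PeerId, Ongoing>,
    pending_events: VecDeque<Event>,
}

impl BlockRequests {
    /// Starts tracking a request; a previous request to the same peer is
    /// reported as cancelled.
    fn send_request(&mut self, peer: PeerId, now: Instant, timeout: Duration) {
        let previous = self.ongoing.insert(peer, Ongoing { deadline: now + timeout });
        if previous.is_some() {
            self.pending_events.push_back(Event::RequestCancelled { peer });
        }
    }

    /// Emits a `RequestTimeout` event for every request whose deadline passed,
    /// independently of what happens to the underlying connection.
    fn check_timeouts(&mut self, now: Instant) {
        let expired: Vec<PeerId> = self
            .ongoing
            .iter()
            .filter(|(_, request)| request.deadline <= now)
            .map(|(peer, _)| *peer)
            .collect();
        for peer in expired {
            self.ongoing.remove(&peer);
            self.pending_events.push_back(Event::RequestTimeout { peer });
        }
    }

    /// Returns the next event for the sync layer, if any.
    fn next_event(&mut self) -> Option<Event> {
        self.pending_events.pop_front()
    }
}
```

In this sketch the sync layer learns about a timeout through `next_event` even if another connection to the same peer is still open, which is the situation the disconnect-based approach could not report.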

Additionally, responses to overridden requests are now discarded instead of being reported anyway.

This PR also changes the log target, as requested.

tomaka · 2020-04-27T09:04:23Z

The red line is the block syncing speed on my machine running master when major-syncing. There are two big stalls.
I'm going to run this PR and see if this continues to happen (but it's going to take a couple days to be sure, so we shouldn't wait).

arkpar · 2020-04-27T09:20:24Z

client/network/src/protocol/block_requests.rs

+				if let Some(connections) = self.peers.get_mut(&peer) {
+					if let Some(connection) = connections.iter_mut().find(|c| c.id == connection_id) {
+						if let Some(ongoing_request) = &mut connection.ongoing_request {
+							if ongoing_request.request == original_request {


Does this compare IDs or all of the request data?

tomaka · 2020-04-27

It compares all the data, but that's basically just a bitfield of the requested fields, the source block hash or number, and the number of blocks.
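As a rough illustration of why comparing the whole request is cheap, the sketch below models a request with only the three pieces of data mentioned in the reply and derives structural equality for it. The type and field names (`BlockRequest`, `FromBlock`, `fields`, `max_blocks`) are hypothetical and do not reproduce the real message type.

```rust
/// Hypothetical starting point of a request: a block hash or a block number.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FromBlock {
    Hash([u8; 32]),
    Number(u64),
}

/// Hypothetical stand-in for the block request message.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BlockRequest {
    /// Bitfield of the requested fields (for example header, body).
    fields: u32,
    /// Source block hash or number.
    from: FromBlock,
    /// Number of blocks requested.
    max_blocks: u32,
}

/// Derived `PartialEq` compares every field, not an identifier.
fn same_request(a: &BlockRequest, b: &BlockRequest) -> bool {
    a == b
}
```

Because equality is derived, two requests match only when the bitfield, the starting block, and the block count are all identical.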

twittner · 2020-04-27T09:57:54Z

client/network/src/protocol/block_requests.rs

 			outgoing: FuturesUnordered::new(),
 			pending_events: VecDeque::new(),
 		}
 	}

 	/// Issue a new block request.
 	///
+	/// Cancels any existing request targeting the same `PeerId`.


Why is it safe to invalidate requests to a peer if more requests are sent to the same peer?

tomaka · 2020-04-27

It's not a matter of safety or correctness. It's a design decision by the sync code to only allow one request per node.

From an API user's point of view, previous requests are "cancelled", but internally we just forget about them. If a response arrives and it doesn't match a known request, we discard that response.
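The "forget and discard" behaviour described here can be sketched as follows, assuming a single tracked request per peer. The names (`Requests`, `Request`, `on_response`) and the use of `u64` values for peers and blocks are assumptions made for the example only.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier; the real code uses libp2p's `PeerId`.
type PeerId = u64;

/// Simplified request: starting block number and number of blocks.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    from_number: u64,
    max_blocks: u32,
}

#[derive(Default)]
struct Requests {
    /// At most one request is tracked per peer.
    ongoing: HashMap<PeerId, Request>,
}

impl Requests {
    /// Tracks `request`, silently forgetting any previous request to `peer`.
    /// Returns `true` if an earlier request was overridden.
    fn send_request(&mut self, peer: PeerId, request: Request) -> bool {
        self.ongoing.insert(peer, request).is_some()
    }

    /// Passes the response on only if it answers the request currently
    /// tracked for `peer`; anything else is discarded.
    fn on_response(
        &mut self,
        peer: PeerId,
        original: &Request,
        blocks: Vec<u64>,
    ) -> Option<Vec<u64>> {
        if self.ongoing.get(&peer) == Some(original) {
            self.ongoing.remove(&peer);
            Some(blocks)
        } else {
            None
        }
    }
}
```

A late response to an overridden request therefore returns `None` and never reaches the caller, while the response to the most recent request is delivered once.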

Extra timeout handling in block_requests

0c2a29d

tomaka added the A0-please_review (Pull request needs code review) and B0-silent (Changes should not be mentioned in any release notes) labels Apr 27, 2020

tomaka requested review from twittner and arkpar April 27, 2020 08:55

arkpar reviewed Apr 27, 2020

arkpar approved these changes Apr 27, 2020

tomaka added the I3-bug (The node fails to follow expected behavior) label Apr 27, 2020

twittner reviewed Apr 27, 2020

gavofyork merged commit a81dddc into paritytech:master Apr 27, 2020

tomaka deleted the new-block-requests-fixes branch April 27, 2020 10:19

tomaka mentioned this pull request Apr 28, 2020

Add metrics about block requests #5811

Merged
