Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Extra timeout handling in block_requests #5794

Merged
merged 1 commit into from
Apr 27, 2020

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Apr 27, 2020

It's been reported in #5760 that timeouts aren't working properly, which indeed seems the case.

There's indeed a problematic situation caused by the fact that we handle timeouts by disconnecting: even if a request times out, we will only kill that specific connection, but a different connection with the same peer might have been opened in the meanwhile (which would also be a logic explanation for the timeout happening). When that happens, we don't actually report a disconnect to the sync/protocol state machine (hacks on top of hacks).

This PR changes that mechanism by explicitly emitting an event when a request times out, and also an event when a request has been cancelled.
I've had planned to do this exact change anyway in order to add proper metrics to block requests. This will be done on top of this PR.

Additionally, responses to overridden requests are now discarded instead of being reported anyway.

Also changes the log target, as demanded

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes labels Apr 27, 2020
@tomaka tomaka requested review from twittner and arkpar April 27, 2020 08:55
@tomaka
Copy link
Contributor Author

tomaka commented Apr 27, 2020

Screenshot from 2020-04-27 11-03-24

The red line is the block syncing speed on my machine running master when major-syncing. There are two big stalls.
I'm going to run this PR and see if this continues to happen (but it's going to take a couple days to be sure, so we shouldn't wait).

if let Some(connections) = self.peers.get_mut(&peer) {
if let Some(connection) = connections.iter_mut().find(|c| c.id == connection_id) {
if let Some(ongoing_request) = &mut connection.ongoing_request {
if ongoing_request.request == original_request {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this compare IDs or all of the request data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It compares all the data, but that's basically just a bitfield of the requested fields, the source block hash or number, and the number of blocks.

@tomaka tomaka added the I3-bug The node fails to follow expected behavior. label Apr 27, 2020
outgoing: FuturesUnordered::new(),
pending_events: VecDeque::new(),
}
}

/// Issue a new block request.
///
/// Cancels any existing request targeting the same `PeerId`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to invalidate requests to a peer if more requests are sent to the same peer?

Copy link
Contributor Author

@tomaka tomaka Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a matter of safety or correctness. It's a design decision by the sync code to only allow one request per node.

Copy link
Contributor Author

@tomaka tomaka Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an API user point of view, previous requests are "cancelled", but internally we just forget about them. If a response comes and it doesn't match a known request, we discard that response.

@gavofyork gavofyork merged commit a81dddc into paritytech:master Apr 27, 2020
@tomaka tomaka deleted the new-block-requests-fixes branch April 27, 2020 10:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes I3-bug The node fails to follow expected behavior.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants