client/network: Use request response for light client requests #7895
Conversation
This pull request is ready to be reviewed.
There are still 3 outstanding / in-progress issues, highlighted as comments below.
```rust
// TODO: Still need to figure out how to pass the peerset to the handler. Can't do it in
// `builder.rs` as the peerset is only constructed later on in `protocol.rs`.
// /// Handle to use for reporting misbehaviour of peers.
// peerset: PeersetHandle,
```
Architectural issue for which I have not yet found a neat solution.
Linking to #7958 here for future reference.
client/network/src/behaviour.rs (outdated)
```rust
CustomMessageOutcome::SyncConnected(peer_id) => {
	// TODO: Should this be tied to sync connections, or to connections in general?
	self.light_client_request_sender.inject_connected(peer_id);
	self.events.push_back(BehaviourOut::SyncConnected(peer_id))
}
CustomMessageOutcome::SyncDisconnected(peer_id) => {
	// TODO: Should this be tied to sync connections, or to connections in general?
	self.light_client_request_sender.inject_disconnected(peer_id);
	self.events.push_back(BehaviourOut::SyncDisconnected(peer_id))
}
```
Two arguments for using nodes in the sync peer set:
- Simplicity
- Likely long-lived, thus well suited for requests
Sounds OK to me, but maybe @tomaka has a more educated opinion on the matter, being intimately familiar with the peer sets.
@tomaka any thoughts on this?
Sorry, forgot to answer.
Yes, that is the right thing to do. Being "sync connected" proves that they're on the correct chain.
Sentry nodes are deprecated, so there is no need for `.maintain/sentry-node` to spin up a sentry node test environment. Instead, this commit rewrites the setup to contain two fully connected validators and one light client. With the steps below one can now spin up a local test network with two validators, one light client, Prometheus and Grafana:
- `cargo build --release`
- `sudo docker-compose -f .maintain/local-docker-test-network/docker-compose.yml up`
With 5d1d682 I have rewritten the above. The bug that @svyatonik spotted, namely that polkadot.js/apps only reports "Remote fetch failed", is fixed with 6da6207. The UI is still facing issues fetching data via the light-client request API; the same issues appear on current
```rust
		continue
	}

	pending_request.attempts_left -= 1;
```
I know this is something that was already in master and you have just moved the code, but it is actually causing this error in the logs:

```
Light: [2021-01-21 14:42:48] 2021-01-21 14:42:48.245 DEBUG tokio-runtime-worker sc_network::light_client_requests::sender: No peer available to send request to.
Light: [2021-01-21 14:42:48] 2021-01-21 14:42:48.246 WARN tokio-runtime-worker txpool: Failed to fetch block body: RemoteFetchFailed
```

A couple of lines below (see `// Break in case there is no idle peer.`), the request is returned to the queue if there are no peers, or if all peers are busy with other requests. The initial idea was that a request should only fail once we have received enough rejections from full nodes, e.g. two different full nodes responding with an error to the same request. This is probably what is causing the issue. I'd move this line (`pending_request.attempts_left -= 1;`) to just before the request is converted into a sent request. Wdyt?
Suggestion sounds good and indeed this solves the remaining connection issues. Thanks @svyatonik!
Commit 2477f34 does the corresponding change.
I can't say I have gone through all the code, as I was looking mostly at the light handler and sender. But: (1) what I have seen looks fine and (2) it works in my local tests. So it's an OK from me.
Allow request response protocol handlers to issue reputation changes, by sending them back along with the response payload.
👍 I've only left insignificant comments and references for other reviewers and future readers.
```rust
	StorageProof,
	light,
};
// use sc_peerset::{PeersetHandle, ReputationChange};
```
```rust
// TODO: Still need to figure out how to pass the peerset to the handler. Can't do it in
// `builder.rs` as the peerset is only constructed later on in `protocol.rs`.
// /// Handle to use for reporting misbehaviour of peers.
// peerset: PeersetHandle,
```
Linking to #7958 here for future reference.
```diff
@@ -233,7 +233,8 @@ impl RequestResponsesBehaviour {
 	if let Some((protocol, _)) = self.protocols.get_mut(protocol) {
 		if protocol.is_connected(target) {
 			let request_id = protocol.send_request(target, request);
-			self.pending_requests.insert(request_id, (Instant::now(), pending_response));
+			let prev_req_id = self.pending_requests.insert(request_id, (Instant::now(), pending_response));
+			debug_assert!(prev_req_id.is_none(), "Expect request id to be unique.");
```
Just for reference, this assertion can fail with this code, which is addressed in #7957.
Correct, paraphrasing the above: this pull request should not be merged before #7957. In addition, this pull request should not be merged before finding a solution for the
I have tested this once more via the docker-compose setup. All outstanding comments and TODOs are addressed. I will go ahead and merge this pull request. Thanks @svyatonik @romanb @tomaka for the help!
bot merge
Trying merge.
This is the last step to close #6968, namely to use `libp2p-request-response` introduced in #6634 for light client requests. The patch is similar to the block request changes done in #7478. Given that the list of changes required is already large, this pull request is scoped to refactorings within `client/network` for now. Once merged, this pull request enables many more refactorings, especially around light-client logic initialization outside of `client/network`.