Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statement Distribution Per Peer Rate Limit #3444
Statement Distribution Per Peer Rate Limit #3444
Changes from 5 commits
1b1ba25
3345aa6
1734c09
cb1d210
58d828d
b5e7e06
456a466
975e3bd
7056159
3f234e1
7e817fa
913b234
e812dcc
c01be29
af234ef
fdaaf4a
e988c60
83cf297
37fd618
adf84a2
f0a40d5
6099498
07d44ee
09e7367
d1f50c5
e9680d1
678b141
7ff1ddd
f6e8def
8e79dc4
a38762b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be handled in the context of a candidate/relay parent ? Otherwise we would drop legit requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a fair point. Rob mentioned it https://github.com/paritytech-secops/srlabs_findings/issues/303#issuecomment-1588185410 as well.
It's a double-edged sword. If we make it 1 per context then the request/response protocol is more efficient, but with async backing enabled (which to my understanding broadens the scope of possible contexts) the mpsc channel (size 25-ish) can still be easily overwhelmed. The problem of dropped requests gets somewhat a bit better after the recent commit
[Rate limit sending from the requester side](https://github.com/paritytech/polkadot-sdk/pull/3444/commits/58d828db45b7c023e8a8266d3180be4361723519)
which also introduces the rate limit on the requester side so effort is not wasted.If we are already requesting from a peer and want to request something else as well we simply wait for the first one to finish.
What could be argued is the limit as a small constant 1-3 instead of 1 per peerID. I would like to do some testing and more reviews and changing it to a configurable constant instead of a hard limit to 1 is actually a trivial change. (I don't think it's necessary. Better have a single completely sent candidate than two half sent candidates.)
Truth be told this whole solution is far from ideal. It's a fix but the final solution is to manage DoS at a significantly lower level than in parachain consensus. It will do for now but DoS needs to be looked on in more detail in near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather put the constraint on the responder side rather than the requester. The reasoning is to avoid any bugs in other clients implementation due to the subtle requirement to wait for first request to finish.
I totally agree with DoS protection as low as possible in the stack, but it seems to be sensible. Maybe instead of relying on a hardcoded constant we can derive the value based on the async backing parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that constraint on the responder side is more important, but having it in both places seems even better.
Even without this PR the risk seems exactly the same. We already were checking for the number of parallel requests being handled in the requester so even if there would be bugs in other clients they could surface there with or without this PR.
The only really bad scenario would be if for some reason we keep a PeerID marked as still being active when it really isn't effectively blacklisting that peer. If that would happen it means that the future connected to that PeerID is still in the
pending_responses
which would eventually brick the requester anyway since we cannotgo over the max limit ofMAX_PARALLEL_ATTESTED_CANDIDATE_REQUESTS
. Both situations are just as fatal, but the new system at least has extra protections against DoS attempts so it's a security gain.By adding the constraint on the requester side we limit the wasted effort and can potentially safely add reputation updates if we get those unsolicited requests which protects us from DoS even further. (Some honest requests might slip through so the rep update needs to be small.)
We potentially could but also I don't why this value would need to change a lot. Even if we change some async backing params we should be fine. It seems to be more sensitive to channel size as we want to make it hard to dominate that queue as a malicious node and the
MAX_PARALLEL_ATTESTED_CANDIDATE_REQUESTS
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need not choose a uniform distribution over recent relay parents either. Instead the full value on the first couple, then 1/2 and 1/4 for a few, and then 1 one for a while. You need to figure out if the selected distribution breaks collators, but maybe works.