-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Even smarter full availability recovery from backers #3216
Comments
Why? We'd expect fewer first line sources to be a pessimization. As a thought experement, imagine bittorrent being our transport: We'd artificially have a torrent with one file for each availability chunk and a local tracker populated by our knowledge of upon backing votes, availability votes, and approval votes. We'd employ partial torrent mode, initially select only the systematic chunks for download, and then flip to full torrent mode later once we have trouble. We only care here about the systematic chunks download phase for this conversation though. How does bittorrent determine from whom it downloads which bittorrent chunks? It'll maybe bias towards more useful peers, but overall it cares little so long as they arrive and the hashes match. It definitely does not exclude peers just because they only hold one chunk. We alraedy have open TLS+Noise connections between all validator pairs. Availability providers should hold systematic chunks in memory, exactly like backers do. It's possible this optimizes over weaknesses elsewhere in our protocol stack, but if so then maybe those places should simply be fixed? |
What I describe here is a cheap incremental optimization and tuning of parameters which is quite easy to implement.
This is somehwat similar to what we have now in the sense that we try systematic chunk holders first and go "full torrent" later. A similar discussion is also in #575
If you meant the network stack , that is being taken care of with litep2p. Otherwise there seem to be other implementations of reed solomon which outperform our current one. |
Why is it an optimization? Is it just because some choice is being made? Aka right now we're dong multiple things unecessarily or something? |
What Andrei is suggesting is to request the full unencoded POV from the backing group (trying them one at a time), even for larger POVs. We currently only try this first for small POVs (under 128Kib). Now, I'm not convinced it's a good idea to do this for all POV sizes. We should test this first |
Thanks @alindima, this is exactly what I mean.
Using @burdges I don't really believe the current systematic chunks approach is tractable in practice if we assume validators haven't got enough bandwidth. Your suggested approach makes the assumption that we'd be faster fetching partial torrent-style vs fetching full availability from a single guy. I agree that would be the case for huge PoVs, 100MB maybe, but for just 5MB, I'd doubt this make any sense with 500Mbit/s network. Systematic chunk recovery would work so much better if we'd have validators hold more than 1 chunk and also fetch them from approval checkers assuming we are later tranche or just slow and have seen other same tranche approve it. |
I see, thanks. Yes good idea. :)
Yeah, we maybe do not reequire the complexity here.
We tune later tranches to be rare-ish, like maybe 15% (?) of total downloads. If we did torrent-like systematic downloads then yes approval checkers could collaboratively download like bittorrent swarms do, even during tranch zero. We could've validators hold multiple chunks of course. Ignoring overhead our bandwidth looks like
It'd total increase availability bandwidht by 5-8 % if we sent one systematic chunk to the 2/3rd of validators not holding systematic chunks. You could also do a second faster erasure code for those other chunks, like sending these validators the xor of some random systematic chunks. As you say, our PoVs blocks remain small enough so that maybe all this looks like overkill. |
Currently we have a static threshold for deciding when we fetch full availability from backers as a first strategy: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/network/availability-recovery/src/lib.rs#L81
We can improve this choice by taking into account multiple parameters:
My suggestion would be to always fetch from backers first if backing group size is 4 or 5, no matter the size of the PoV when the context is
approval
. In a healthy network it should fine as we'd expect the validators to not be hammered by this (around only 48Mpbs which is only 10% of recommended specs )For disputes we might want to be more conservative while for smaller backing groups we should determine a higher threshold.
The text was updated successfully, but these errors were encountered: