-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assignment of availability-chunk indices to validators #47
Assignment of availability-chunk indices to validators #47
Conversation
Signed-off-by: alindima <alin@parity.io>
This message type will include the relay block hash where the candidate was included. This information will be used | ||
in order to query the runtime API and retrieve the core index that the candidate was occupying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a second thought: is this even possible? If the validator is not aware of the fork, how can it call a runtime API on that fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the validator hasn't seen that fork, it's can't call into runtime API of that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree we need such a mapping, but not convinced it needs to be a runtime API (the only benefit to me seems simplifying alternative client impls, but by not much).
Such a mapping doesn't enable systematic chunk recovery - it's possible already - but it spreads the load more evenly across validators (right now all systematic chunks are assigned to the same validators per session regardless of block or core id).
My suggestion is to use a hash of ParaId
for now instead of core_index
, which is readily available in the candidate receipt.
1. mitigates the problem of third-party libraries changing the implementations of the `ChaCha8Rng` or the `rand::shuffle` | ||
that could be introduced in further versions, which would stall parachains. This would be quite an "easy" attack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this argument. Are you talking about supply chain attacks? How is it specific to this RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's an example:
- all validators are running normally, using systematic recovery.
- a contributor on the
rand
crate (malicious or not) makes a perfectly valid change to the shuffling algorithm, that results in a different output forshuffle()
. - there's a polkadot node release that bumps rand to this new version.
- some of the validators upgrade their node.
- as a result, some/all parachains are stalled, because some validators have a differing view of the validator->chunk mapping. Also, the validators that upgraded stop receiving rewards
IMO this has a high chance of happening in the future.
I view this mapping in a similar way to the per-session validator shuffle we do in the runtime to choose the validator active set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now, we could implement our own version of shuffle
, which would mitigate this issue.
The thing I'm concerned about is implementing our own version of ChaCha8Rng
. How feasible is it to assume that it won't change in the future? CC: @burdges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a supply chain attack so much as rand having a history of churn & instability.
1. relatively quick to compute and resource-efficient. | ||
1. when considering the other params besides `validator_index` as fixed, | ||
the function should describe a random permutation of the chunk indices | ||
1. considering `session_index` and `block_number` as fixed arguments, the validators that map to the first N/3 chunk indices should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about my argument to use session_index
here, I think it's a very niche edge-case that's not worth worrying about. An alternative to (session_index
, block_number
) would be block hash which would also sidestep that issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fair. Another issue I just thought of with using block number is that for disputes hapenning on unimported forks, the ChainAPI call for getting the block number would also fail.
As an alternative to core_index, the `ParaId` could be used. It has the advantage of being readily available in the | ||
`CandidateReceipt`, which would enable the dispute communication protocol to not change and would simplify the | ||
implementation. | ||
However, in the context of [CoreJam](https://github.com/polkadot-fellows/RFCs/pull/31), `ParaId`s will no longer exist | ||
(at least not in their current form). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO we shouldn't worry too much about CoreJam compatibility. IIUC there might be a drop-in replacement for ParaId
(AuthId
), so we should first explore an avenue with ParaId
perhaps.
The (bigger) problem with ParaId
is that it's claimable by users, so in the theory one could create a collision attack where multiple paras have the same systematic chunks. In practice, I believe such an attack would be high cost and low benefit. And, perhaps, it could be mitigated by using a cryptographic hash of ParaId
(or AuthId
or core_index
in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the core index here probably, not the para id. I'd thought the core indexes could be spaced somewhat evenly, but with the actual sequence being random.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my view here above in #47 (comment)
It's probably more future proof if we do not require the core index here, so the map uses num_cores, num_validators, relay parent, and paraid. We've other options but this looks pretty future proof. CoreJam cannot really work without something like paraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using paraid was my first attempt, but it complicates things quite a bit.
As Andronik notes, it's claimable by a user. I don't think it's the biggest problem though, they can't get a different ParaId every day (can they?).
paraid
's biggest problem is that it doesn't form a strictly monontonically increasing sequence, so we'd have to do something else with it. My initial attempt was to seed a ChaCha8 RNG with it and generate one random number. Then this number was the offset into the chunk index vector.
But then if we complicate it so much, we may want to embed it into the runtime, so that future updates to the rand
crate don't break us (which creates all sorts of problems that could be avoided).
We've had this paraid -> core_index back and forth for a while now, it'd be great if we could all (the ones interested in this RFC) hop on a call or somehow reach a conclusion. There are pros and cons to both, AFAICT.
I think this remains the only bit that prevents this RFC from moving on (anybody correct me if I'm wrong)
This message type will include the relay block hash where the candidate was included. This information will be used | ||
in order to query the runtime API and retrieve the core index that the candidate was occupying. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the validator hasn't seen that fork, it's can't call into runtime API of that
Considering feedback and latest discoveries, I'll rewrite the proposal using ParaId instead of CoreIndex and using relay_parent_hash instead of (session_index, block_number) |
Rewritten the proposal. Have a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments.
I am still not convinced we need runtime API for this, but curious what others think. If we put the idea to an extreme, we could put most of the client code into the runtime, so it can be automatically and atomically upgraded. But do we want to?
mapping, this mitigates the problem of third-party libraries changing the implementations of the `ChaCha8Rng` or the `rand::shuffle` | ||
that could be introduced in further versions. This would stall parachains if only a portion of validators upgraded the node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should obviously not rely in consensus on something that is not specified to be deterministic, like https://polkadot.network/blog/a-polkadot-postmortem-24-05-2021#the-good. And try to reduce third-party dependencies in general.
I would assume that a library that implements ChaCha8Rng
adheres to the spec, otherwise it's a bug.
To mitigate supply-chain issues including bugs, we should probably use cargo-vet
or a similar tool, but again, this is out of scope of and not limited to this RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. What we would want is some version/spec identifier. That is only allowed to change at session boundaries. Then we can put a requirements on nodes to implement that spec and once enough clients did, we can do the switch.
While we are at it, this should probably take into account the used erasure coding itself as well. We should be able to swap it out for a better implementation if the need arises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is off-topic for this RFC, but as a heads up we already use ChaCha20
and shuffle
for the validators gossip topology: https://github.com/paritytech/polkadot-sdk/blob/2d09e83d0703ca6bf6aba773e80ea14576887ac7/polkadot/node/network/gossip-support/src/lib.rs#L601-L610
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. What we would want is some version/spec identifier. That is only allowed to change at session boundaries. Then we can put a requirements on nodes to implement that spec and once enough clients did, we can do the switch.
While we are at it, this should probably take into account the used erasure coding itself as well. We should be able to swap it out for a better implementation if the need arises.
Yes. We'll use the new NodeFeatures
runtime API for that: paritytech/polkadot-sdk#2177
If we'll make changes in the future to either the shuffling algorithm or the underlying reed-solomon algorithm, we can add a new feature bit there
Thanks for the reviews and suggestions Andronik! |
mapping, this mitigates the problem of third-party libraries changing the implementations of the `ChaCha8Rng` or the `rand::shuffle` | ||
that could be introduced in further versions. This would stall parachains if only a portion of validators upgraded the node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. What we would want is some version/spec identifier. That is only allowed to change at session boundaries. Then we can put a requirements on nodes to implement that spec and once enough clients did, we can do the switch.
While we are at it, this should probably take into account the used erasure coding itself as well. We should be able to swap it out for a better implementation if the need arises.
) -> ChunkIndex { | ||
let threshold = systematic_threshold(n_validators); // Roughly n_validators/3 | ||
let seed = derive_seed(relay_parent); | ||
let mut rng: ChaCha8Rng = SeedableRng::from_seed(seed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? Can we not deterministically arrive at an assignment based on block number and para ids, in a way that perfectly evens out load?
For example: We use block number % n_validators
as a starting index. Then we take threshold
validators starting from that position to be systemic chunk indices for the first para in that block. The next para starts at the additional offset threshold
, so block_number + threshold % n_validators
and so on.
We could shuffle validator indices before doing that, but I don't see how this gains us anything.
Now, using information that is not available as part of the candidate receipt was part of the problem we wanted to avoid. What is the "next" para, this information is not necessarily available in disputes. Essentially this means we are using the core number.
But:
- It usually is, most validators should have seen a block that got disputed, otherwise the candidates could never have gotten included.
- Disputes are not the hot path. They are an exceptional event, that should barely ever happen. It should not be an issue, if disputes are not able to use systemic chunks always or even ever.
There are other cases, e.g. collators recovering availability because the block author is censoring. But also those should be exceptional. If systemic chunk recovery would not be possible here, it would not be a huge problem either. On top of the fact that this should also not be a common case, the recovery here is also not done by validators - so worse recovery performance would not be an issue to the network.
Summary:
Systemic chunk recovery should only be important/relevant in approval voting, where we have to recover a whole lot of data every single relay chain block.
Therefore it would be good, if systemic chunks worked always, but I would consider it totally acceptable if it were an optimization that would only be supported in approval voting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid paraid here imho, not overly keen on relay parent either.
Instead use era/session, slot, and chain spec to define the validator sequence, and then core index to define the start position in the validator sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validators are already randomized at the beginning of the session. Core index for start position makes sense. What is wrong with using the block number in addition?
Reason, I would like to have it dependent on the block (could also be slot, I just don't see the benefit) is that by having a start position by core index, we ensure equal distribution of systemic chunks across a block, but paras are not all equal. Some could be heavier than others, hence it would be beneficial to change the mapping each block.
In my opinion we could also use the hash instead of the block number, here I think anything is likely better than static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want slot or block I think. We're not overly adversarial here, but you can manipulate block numbers easily, while slot numbers represent an opportunity for doing something, so you always pay 19 DOTs or more to chose another one. I'd say slot not block.
We randomize the validator list based upon the randomness two epochs/sessions ago? Cool. Any idea if we similarly randomize the map from paraids to cores per era/session too? If yes, then maybe we could just progress sequentially through them?
let k = num_validators / num_cores;
let fist_validator_index_for_core = ((core_id - slot) * k % num_validators) as u32;
We'd prefer the randomization of core_id for this because otherwise you could still make hot spots. We could've hot spots even with this scheme, but not so easy to make them. We'd avoid those if we randomize per slot.
Also this exactly computation does not work due to signed vs unsigned, and the fact that it suggests things progress backwards as time progresses time, which again tried to avoid hot spots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea if we similarly randomize the map from paraids to cores per era/session too?
AFAICT from the scheduler code, we don't (at least for the parachain auction model; for on-demand, I see more complicated logic which takes into account core affinities).
We're not overly adversarial here, but you can manipulate block numbers easily, while slot numbers represent an opportunity for doing something, so you always pay 19 DOTs or more to chose another one. I'd say slot not block.
I'm having a hard time understanding the advantage of slot number vs block number. This may be simply because I don't know that much about slots. AFAICT, slots are equally useful as block number for the mapping function (monotonically increasing by 1), except that there may be slots that go unoccupied (if the chain is stalled for example) and are therefore skipped. Is that correct?
so you always pay 19 DOTs or more to chose another one
what is this fee? where can I read more about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@burdges I don't get your arguments. How can block numbers be manipulated? They are always increasing by 1, you clearly must be talking about forks. So an adversary could create multiple forks, with all the same load distribution and tries to overload validators this way? If we create forks or reversions we already have a performance problem anyway.
Really not getting how slot numbers are better here. I also don't get your argument about hot spots, my proposal above was precisely to avoid hot spots (by not using randomness). What do you mean by hot spots and how would randomness help here?
A single validator not providing its systemic chunk would be enough to break systemic recovery. I don't see how randomization schemes help here. If we were smart we could somehow track the validators withholding systemic chunks and then make an assignment where they are all pooled together into one candidate. This way, at least only one systemic recovery fails. (We could equally well just remove them entirely.)
But honestly, I would not worry too much about this here. If we ever found that validators try to mess with this on purpose, the threat is low enough that social (calling them out) and governance would be adequate measures to deal with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the ideal would be to use the relay parent hash directly. Then we don't even need to be able to lookup the block number to determine systemic chunk indices. We will eventually have the core index in the candidate receipt - it would be really good to have this self contained at least once we have that.
Obviously hashes can be influenced trivially by block authors ... but is this a real issue? Assuming we have enough cores, load will be pretty much evenly distributed among validators no matter the hash. The only thing that changes is to which core one gets assigned. I don't mind too much if this can be influenced ... Are there any real concerns here?*)
*) As the system matures, I would assume (especially with CoreJam or similar developments) that candidates will be pretty evened out in load (maxed out). So a validator should not gain much by picking which parachain it wants to have a systemic chunk for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single validator not providing its systemic chunk would be enough to break systemic recovery
Slightly offtopic: In my WIP PR I added functionality to request up to 5 systematic chunks from the backing group as a backup solution, so that a couple of validators not returning their systematic chunks would not invalidate the entire procedure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could permit requesting them all from the backing group, but make the availability providers the first choice, meaning maybe: 1st, try all systemic chunk providers. 2nd, try remaining systemic chunks from backers. 3rd, fetch random non-systemic chunks. The concern is just that we overload the backers.
It'll also make rewards somewhat more fragile, but likely worth the difficulty for the performance.
(from backing to inclusion). | ||
|
||
Availability-recovery can currently be triggered by the following phases in the polkadot protocol: | ||
1. During the approval voting process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as recovery is possible in disputes, it should be fine if it can not always use systemic recovery. What is missing for me, is to understand why this needs to be an incompatible change. Shouldn't recovery be possible always, even if you don't know what the systemic chunks are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, when you request a chunk, you need to know which chunk to request. If you specify a wrong chunk, you'll get an empty response. So everyone should be onboard how chunks are shuffled in order for recovery to work, not just from systematic chunks. Unless we rely on backers to have all the chunks, which we can't in case of a dispute.
We could add an additional request type to request any chunk you have. That could probably be a reasonable fallback in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add an additional request type to request any chunk you have. That could probably be a reasonable fallback in this case.
I don't think this achieves the purpose of availability-recovery working with an arbitrary mix of upgraded & non-upgraded validators. Adding a new request type (or even changing the meaning of the existing one) would also mean a node upgrade (because responding to the new type of request is only useful for the nodes that haven't yet upgraded to use the new mapping). So validators might as well just upgrade to the version that uses the new shuffle. I'm not sure this has any benefit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant as a fallback in case we want to use core_id
s which might be not readily available in some cases of disputes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant as a fallback in case we want to use core_ids which might be not readily available in some cases of disputes.
I see. In this case, it would be a reasonable fallback. But this would mean giving up systematic recovery in those cases. I'm wondering if it's worth doing all of this just to replace the para-id thingy with the core_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh fine. We'll loose the mappings sometime before the data expires? We should probably avoid doing that. We can obviously ask everyone and account for the DLs differently, but I'd prefer to be more like bittorrent trackers here, so that parachains can fetch data this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll loose the mappings sometime before the data expires?
It can happen if a validator imports disputes statements for a dispute ongoing on some fork that the node has not imported yet.
Or even if a validator was just launched and has no data recorded about what core the block was occupying while pending availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. All validators will know the correct mapping, if they have seen (and still know) the relay parent. There is a good chance that this is the case even in disputes, but it is not guaranteed and recovery must work even in that case (but can be slower).
Using the core index should be fine, what makes that even more acceptable to me is that we actually want (and need to) include the core index in the candidate receipt at some point. So the moment we have this change, then the block availability will cease to matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the core index should be fine, what makes that even more acceptable to me is that we actually want (and need to) include the core index in the candidate receipt at some point.
That's needed for CoreJam I would assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elastic scaling also.
Thanks to everybody for the feedback!
Note that we'll keep asserting that enabling this feature needs to happen atomically between validators and collators. One thing that I've not decided on is the exact assignment function implementation. I think it could be (inspired by other comments): pub fn get_chunk_index(
n_validators: u32,
validator_index: ValidatorIndex,
block_number: BlockNumber,
core_index: CoreIndex
) -> ChunkIndex {
let threshold = systematic_threshold(n_validators); // Roughly n_validators/3
let core_start_pos = abs(core_index - block_number) * threshold;
(core_start_pos + validator_index) % n_validators
} |
Do you mean introducing v3 version of the req-resp protocol?
If the answer to the previous question is yes, I don't see how this is not a breaking change. If no, the receiving side is validating the chunk index matches the response, so I don't see how it's not a breaking change. I think it would be reasonable to introduce req-resp v3 and later after some time enable this assignment via the runtime feature flag. |
No, I didn't want to add a new version to the protocol. The idea was that we assume all validators have upgraded to have the same validator->index mapping and we also assume that two thirds of them are honest and responded with the right chunk they should be holding according to the mapping (but we don't check that they responded with the right chunk, because we may not know it beforehand).
That's a good point. Same as before. The validator would issue the request, and in this specific case also check the index. The idea is that validators will usually check the indices they get (during availability-distribution or systematic recovery during approval checking), but the regular recovery will also work when the entity (collator or validator participating in a dispute) does not know/check the indices against the canonical mapping.
you're right, it's a breaking change. I guess what I meant is that, after the upgrade, regular recovery will still work even if the collator does not have access to the canonical mapping. I'll reword my comment.
Experimenting a bit more with what I proposed above, I realised that we either bump the protocol or we do some other arguably hacky way of discovering the chunk index on the receveing side (iterating through keys after reconstructing the partial trie from the proof and checking which value matches the chunk hash). Considering that you called out the fact that even not bumping the protocol version would still be a breaking change to collators, I think it's best to bump the protocol version and add the |
This is actually not true with agile core time: A parachain bids specifically on one particular core. But it also should not matter much: We are rotating the mapping every block. On top of that, there is only so much a parachain can do here anyway.
This is not what we discussed. I suggested to keep requesting via validator index. Right now we also store in the av-store per validator index - if we don't change that, then a node can provide a chunk as requested by validator index, even if it does not know the mapping itself*). Hence everything would keep working with neither the requester nor the responder still knowing the mapping. *) Assuming the chunk data itself gets stored together with the chunk index. Once we have the core index in the receipt, this would strictly speaking no longer be necessary, but I also don't see much harm and if we really wanted to, we could change the network protocol again once we have this. Given that changing the candidate receipt is already a pita ... would not be that much of an additional annoyance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Ideally we could at least eventually make this self-contained, but block number or slot number of the including block would be a deal breaker. Either we figure out something better or we have to conclude that fully self contained mapping is not possible even with an updated candidate receipt.
I mean it is possible - worst thing that can happen is that load distribution is not optimal and that it can be influenced by carefully picking relay parents. 🤔
pub fn get_chunk_index( | ||
n_validators: u32, | ||
validator_index: ValidatorIndex, | ||
block_number: BlockNumber, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we used block hash (relay parent), then by putting the core index into the candidate receipt we would have everything needed for the lookup to be self contained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok using the relay parent of the candidate is at the very least not ideal because with async backing, the used relay parent could vary from candidate to candidate in the same block, which means that the equal load distribution might end up not being that equal after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, we could avoid using block hash or number entirely and just use the core index. Assuming all cores are occupied, it will just mean fixed mapping within a session from validators to bulk parachains chunks (on-demand would still rotate their mapping to cores I assume). That way it might be easier to screw (intentionally or not) the systematic recovery for a particular parachain for an entire session. OTOH, we need to handle one (or two) missing systematic recovery chunk in practice and fall back to using backers to some extent. Maybe worth mentioning the fallback bit in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, we could avoid using block hash or number entirely and just use the core index. Assuming all cores are occupied, it will just mean fixed mapping within a session from validators to bulk parachains chunks (on-demand would still rotate their mapping to cores I assume). That way it might be easier to screw (intentionally or not) the systematic recovery for a particular parachain for an entire session.
Yeah, as you said, I think it would be too easy to screw up. It could result in a not-so-even load distribution, because sessions can be quite long and some validators would be too unlucky to be assigned to a high-throughput para for several hours. We also don't know which validators are lazier or ill-intended and switching only once per session would make this visible for some parachains more than the others.
OTOH, we need to handle one (or two) missing systematic recovery chunk in practice and fall back to using backers to some extent. Maybe worth mentioning the fallback bit in the RFC.
Yes, I added this bit to the RFC. I suggest we request at most one chunk from each validator in the backing group as a fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is whether this matters? Yes indeed, validators could then consistently mess with the systemic recovery of one particular parachain, instead of messing with systemic recovery of multiple parachains.... so what?
- It should not make that much of a difference: They are slowing down other validators. This is also in availability recovery, not distribution: Therefore the only affect that such an attack can have is that finality is a bit slower or that validators get overloaded and fail to back things for example. But both are independent of a parachain - it does not matter for which parachain causes this.
- Backers are still rotating. So if some validators refuse to provide systemic chunks, we can still fetch them from the backers.
Now the only real argument for rotation every block in my opinion is, again to even out load. In this particular case it would make a difference, if some paras always fully fill their blocks, while others are keeping them mostly empty. But, I would argue that we we should solve this problem by incentivizing full block utilization and not worry about this here too much, at least not until it manifests in a real problem. In fact, we also have another way of solving this if it ever proves beneficial: We could rotate para id to core id assignments instead.
TL;DR: I like @ordian 's idea to just not rotate. There are downsides to this, but having the recovery be self contained is quite valuable. Let's start simple and go only more complex if it proves necessary?
@burdges am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what you mean here. We'll have the candidate reciept when fetching data, no?
Yes, and we'd like availability-recovery to be possible when having nothing more than the candidate receipt. The problem is that the receipt does not contain any info about the slot or block number.
For the moment, the core index isn't there either, but the plan is to add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you. We could only use some historical slot, not the slot where the candidate recipet gets backed.
All these options give some control of course, even the slot where backed, so relay parent hash works better than any slot choice, due to simplicity.
As I understand it @eskimor suggests our map depend upon only num_cores, num_validators, and core index, so each core has their systemic validators fixed for the session. It avoids bias except through selecting your core. I'd wager it worsens user experence, but only slightly.
We do however rotate backing groups and backers provide systemic chunks too, hence the slightly above. How do we determin the backers from the candidate recipet? Just because they signed the candidate recipet?
It's likely fine either way. We'll anyways have systemic reconstruction if each systemic chunk can be fetched from some backers or its one availability provider.
Actually who determines core index? We'd ideas where this occurs after the candidate reciept. We'd enable these if the map depends upon only num_cores, num_validators, relay parent, and paraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these options give some control of course, even the slot where backed, so relay parent hash works better than any slot choice, due to simplicity.
As @eskimor said, "using the relay parent of the candidate is at the very least not ideal because with async backing, the used relay parent could vary from candidate to candidate in the same block, which means that the equal load distribution might end up not being that equal after all."
As I understand it @eskimor suggests our map depend upon only num_cores, num_validators, and core index, so each core has their systemic validators fixed for the session. It avoids bias except through selecting your core. I'd wager it worsens user experence, but only slightly.
yes, that's the suggestion AFAIU.
How do we determin the backers from the candidate recipet? Just because they signed the candidate recipet?
No, we only use the backers currently during approval-voting, when we have access to the relay block and we can query the session's validator groups from the runtime. For complexity reasons, we don't even verify that validators in the group all backed the candidate. We just assume that to be true (in practice, it's mostly true).
Actually who determines core index? We'd ideas where this occurs after the candidate reciept. We'd enable these if the map depends upon only num_cores, num_validators, relay parent, and paraid.
Currently, for the existing lease parachains, there's a fixed assignment between the paraid and the core index. @eskimor you know more here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just go with only the core index, from the discussion I don't see any real problems with that and once we have the core index in the candidate receipt we are golden from the simplicity/robustness perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, I'll update the document
this will generally not be true. | ||
|
||
The requester will send the request to validator with index `V`. The responder will map the `V` validator index to the | ||
`C` chunk index and respond with the `C`-th chunk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The responder does not even need to do that, if we keep storing per validator index in the av store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does. It needs to supply the chunk index to the requester (for verification purposes and because the reconstruction algorithm seems to need it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have sworn, that I wrote somewhere that we would need to store the chunk index with the chunk in the av-store for this of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. yeah, that makes sense now 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a bit more.
The backing subsystem will issue a request to av-store to store all chunks. For that, we need to know the core index in order to store the ValidatorIndex -> Chunk as you suggest (so that we can compute the mapping).
Since we don't yet have the core index in the receipt, the backing subsystem needs to know the per-relay parent core_index assignment of the local validator.
From my knowledge, this would be just fine. When doing attestation, the availabiliy_cores runtime API already gets the core_index for us (but doesn't store it yet).
The slight caveat is that, when importing a statement, we may also have to call the availability_cores runtime API to see which core our para has been scheduled on. but it's no big deal, we need to have access to the relay parent anyway when doing candidate validation.
@eskimor please weigh in on my analysis. until elastic scaling, a para id couldn't be scheduled on multiple cores and the core assignment could only change on a relay-block boundary. And when we'll get elastic scaling, we'll have the core_index in the receipt anyway. so all good.
|
||
On the other hand, collators will not be required to upgrade, as regular chunk recovery will work as before, granted | ||
that version 1 of the networking protocol has been removed. However, they are encouraged to upgrade in order to take | ||
advantage of the faster systematic recovery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think collators have to upgrade as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
I think they just need to upgrade to the v2 networking protocol. Once that's done, they don't really need to upgrade to use systematic recovery. They'll request regular chunks as before.
That's why I added this bit only on the step 2 (which enables systematic recovery). Upgrade for step 1 will still be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly unrelated, but collators don't use recovery in a happy path. In can be needed in case there are malicious collators withholding the data. So it doesn't need to be optimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think they just need to upgrade to the v2 networking protocol.
Ok, to me that sounded like an upgrade. In other words: If you already have the requirement that they upgrade their network protocol, the rest does not sound like an issue anymore.
It's likely fine. We want the systemic chunks spread evenly enough to minimize other protocol manipulation. It's possible some parachains chose their cores to make frierndly validators more profitable, which overworks some other validators, and causes reconstructions. It's probably fine though. |
Noted 👍🏻
Yes, that's what I meant - requesting by validator index just as before. About storing in the DB by validator index - that would still require that one of the actors (requester/responder) needs to know the chunk index because the reconstruction algorithm needs the indices. As discussed, the requester may not know this, so that's why I suggested we return the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left small remarks, looking good otherwise
|
||
#### Step 1: Enabling new network protocol | ||
In the beginning, both `/polkadot/req_chunk/1` and `/polkadot/req_chunk/2` will be supported, until all validators and | ||
collators have upgraded to use the new version. V1 will be considered deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a heads up that parachains may take a very long time to upgrade their collators's polkadot-sdk branch, many of them are still based on 0.9.x versions
|
||
On the other hand, collators will not be required to upgrade, as regular chunk recovery will work as before, granted | ||
that version 1 of the networking protocol has been removed. However, they are encouraged to upgrade in order to take | ||
advantage of the faster systematic recovery. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly unrelated, but collators don't use recovery in a happy path. In can be needed in case there are malicious collators withholding the data. So it doesn't need to be optimized.
pub fn get_chunk_index( | ||
n_validators: u32, | ||
validator_index: ValidatorIndex, | ||
block_number: BlockNumber, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, we could avoid using block hash or number entirely and just use the core index. Assuming all cores are occupied, it will just mean fixed mapping within a session from validators to bulk parachains chunks (on-demand would still rotate their mapping to cores I assume). That way it might be easier to screw (intentionally or not) the systematic recovery for a particular parachain for an entire session. OTOH, we need to handle one (or two) missing systematic recovery chunk in practice and fall back to using backers to some extent. Maybe worth mentioning the fallback bit in the RFC.
/// The index of the chunk to fetch. | ||
pub index: ValidatorIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be stay the same IIUC in v2, but the meaning (and the doc comment) will be different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will stay the same, indeed. The meaning will change, in the sense that we cannot expect any more that the ValidatorIndex will be equal to the returned ChunkIndex. I hope I describe this in sufficient detail in the following paragraph
} | ||
} | ||
|
||
Decode::decode(&mut &systematic_bytes[..]).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this remove the end zero padding if odd length?
Anyways we should be careful about the boundary here, like maybe this should live in the erasure coding crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, scale decoding ignores the trailing zeros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to know the exact number of zeroed bytes added as padding, we need to know the size of the input data that was encoded.
Unfortunately, we don't have easy access to that in polkadot, unless we add it to the CandidateReceipt
.
But scale decoding it works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran some roundtrip quickcheck tests on regular chunk reconstruction. The regular reed-solomon code can have extra zeroed padding when reconstructing. So truncation to the expected size was already needed.
let mut threshold = (n_validators - 1) / 3; | ||
if !is_power_of_two(threshold) { | ||
threshold = next_lower_power_of_2(threshold); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, next lower power of two is correct here.
We encode 2^{x+1}
chunks where 2^x < n_validators <= 2^{x+1}
of course. We need threshold < (n_validators-epsilon)/3
though, so assuming epsilon=1
then we reconstruct from 2^{x-1}
chunks where 2^{x-1} <= (n_validators-1) / 3 < 2^x
.
If n_validators = 2^{x+1}
, so no wasted encoding, then we reconstruct from n_validators / 4
chunks, and our reconstruction threshold winds up (n_validators - 4) / 12
smaller than necessary. If otoh (n_validators-1)/3 = 2^{x-1}
, so optimal reconstruction threshold, then n_validators = 3 * 2^{x-1} + 1
unecessarily encodes 2^{x-1} - 1
extra chuinks.
I think that the RFC is in a form that can be proposed and voted on. Please give another review and approve it if you think so. |
Previously, it was only possible to retry the same request on a different protocol name that had the exact same binary payloads. Introduce a way of trying a different request on a different protocol if the first one fails with Unsupported protocol. This helps with adding new req-response versions in polkadot while preserving compatibility with unupgraded nodes. The way req-response protocols were bumped previously was that they were bundled with some other notifications protocol upgrade, like for async backing (but that is more complicated, especially if the feature does not require any changes to a notifications protocol). Will be needed for implementing polkadot-fellows/RFCs#47 TODO: - [x] add tests - [x] add guidance docs in polkadot about req-response protocol versioning
/rfc propose |
Hey @eskimor, here is a link you can use to create the referendum aiming to approve this RFC number 0047. Instructions
It is based on commit hash 4ae75296bfdeb1b2ca1e9d20f78c1a783475de47. The proposed remark text is: |
/rfc process |
Please provider a block hash where the referendum confirmation event is to be found.
|
/rfc process 0xe750513eb583410686c094e8e5f00c0b18dc33e0296c1c8572c2e83f021c03a5 |
The on-chain referendum has approved the RFC. |
Previously, it was only possible to retry the same request on a different protocol name that had the exact same binary payloads. Introduce a way of trying a different request on a different protocol if the first one fails with Unsupported protocol. This helps with adding new req-response versions in polkadot while preserving compatibility with unupgraded nodes. The way req-response protocols were bumped previously was that they were bundled with some other notifications protocol upgrade, like for async backing (but that is more complicated, especially if the feature does not require any changes to a notifications protocol). Will be needed for implementing polkadot-fellows/RFCs#47 TODO: - [x] add tests - [x] add guidance docs in polkadot about req-response protocol versioning
**Don't look at the commit history, it's confusing, as this branch is based on another branch that was merged** Fixes #598 Also implements [RFC #47](polkadot-fellows/RFCs#47) ## Description - Availability-recovery now first attempts to request the systematic chunks for large POVs (which are the first ~n/3 chunks, which can recover the full data without doing the costly reed-solomon decoding process). This has a fallback of recovering from all chunks, if for some reason the process fails. Additionally, backers are also used as a backup for requesting the systematic chunks if the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, to not overload them). - Quite obviously, recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster as measured on my apple M2 Pro). - Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is different for every core, in order to avoid only querying the first n/3 validators over and over again in the same session. The mapping is the one described in RFC 47. - The mapping is feature-gated by the [NodeFeatures runtime API](#2177) so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their client. If the feature is not enabled, the mapping will be the identity mapping and backwards-compatibility will be preserved. - Adds a new chunk request protocol version (v2), which adds the ChunkIndex to the response. This may or may not be checked against the expected chunk index. For av-distribution and systematic recovery, this will be checked, but for regular recovery, no. This is backwards compatible. First, a v2 request is attempted. If that fails during protocol negotiation, v1 is used. - Systematic recovery is only attempted during approval-voting, where we have easy access to the core_index. For disputes and collator pov_recovery, regular chunk requests are used, just as before. ## Performance results Some results from subsystem-bench: with regular chunk recovery: CPU usage per block 39.82s with recovery from backers: CPU usage per block 16.03s with systematic recovery: CPU usage per block 19.07s End-to-end results here: #598 (comment) #### TODO: - [x] [RFC #47](polkadot-fellows/RFCs#47) - [x] merge #2177 and rebase on top of those changes - [x] merge #2771 and rebase - [x] add tests - [x] preliminary performance measure on Versi: see #598 (comment) - [x] Rewrite the implementer's guide documentation - [x] #3065 - [x] paritytech/zombienet#1705 and fix zombienet tests - [x] security audit - [x] final versi test and performance measure --------- Signed-off-by: alindima <alin@parity.io> Co-authored-by: Javier Viola <javier@parity.io>
**Don't look at the commit history, it's confusing, as this branch is based on another branch that was merged** Fixes paritytech#598 Also implements [RFC paritytech#47](polkadot-fellows/RFCs#47) ## Description - Availability-recovery now first attempts to request the systematic chunks for large POVs (which are the first ~n/3 chunks, which can recover the full data without doing the costly reed-solomon decoding process). This has a fallback of recovering from all chunks, if for some reason the process fails. Additionally, backers are also used as a backup for requesting the systematic chunks if the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, to not overload them). - Quite obviously, recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster as measured on my apple M2 Pro). - Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is different for every core, in order to avoid only querying the first n/3 validators over and over again in the same session. The mapping is the one described in RFC 47. - The mapping is feature-gated by the [NodeFeatures runtime API](paritytech#2177) so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their client. If the feature is not enabled, the mapping will be the identity mapping and backwards-compatibility will be preserved. - Adds a new chunk request protocol version (v2), which adds the ChunkIndex to the response. This may or may not be checked against the expected chunk index. For av-distribution and systematic recovery, this will be checked, but for regular recovery, no. This is backwards compatible. First, a v2 request is attempted. If that fails during protocol negotiation, v1 is used. - Systematic recovery is only attempted during approval-voting, where we have easy access to the core_index. For disputes and collator pov_recovery, regular chunk requests are used, just as before. ## Performance results Some results from subsystem-bench: with regular chunk recovery: CPU usage per block 39.82s with recovery from backers: CPU usage per block 16.03s with systematic recovery: CPU usage per block 19.07s End-to-end results here: paritytech#598 (comment) #### TODO: - [x] [RFC paritytech#47](polkadot-fellows/RFCs#47) - [x] merge paritytech#2177 and rebase on top of those changes - [x] merge paritytech#2771 and rebase - [x] add tests - [x] preliminary performance measure on Versi: see paritytech#598 (comment) - [x] Rewrite the implementer's guide documentation - [x] paritytech#3065 - [x] paritytech/zombienet#1705 and fix zombienet tests - [x] security audit - [x] final versi test and performance measure --------- Signed-off-by: alindima <alin@parity.io> Co-authored-by: Javier Viola <javier@parity.io>
**Don't look at the commit history, it's confusing, as this branch is based on another branch that was merged** Fixes paritytech#598 Also implements [RFC paritytech#47](polkadot-fellows/RFCs#47) ## Description - Availability-recovery now first attempts to request the systematic chunks for large POVs (which are the first ~n/3 chunks, which can recover the full data without doing the costly reed-solomon decoding process). This has a fallback of recovering from all chunks, if for some reason the process fails. Additionally, backers are also used as a backup for requesting the systematic chunks if the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, to not overload them). - Quite obviously, recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster as measured on my apple M2 Pro). - Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is different for every core, in order to avoid only querying the first n/3 validators over and over again in the same session. The mapping is the one described in RFC 47. - The mapping is feature-gated by the [NodeFeatures runtime API](paritytech#2177) so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their client. If the feature is not enabled, the mapping will be the identity mapping and backwards-compatibility will be preserved. - Adds a new chunk request protocol version (v2), which adds the ChunkIndex to the response. This may or may not be checked against the expected chunk index. For av-distribution and systematic recovery, this will be checked, but for regular recovery, no. This is backwards compatible. First, a v2 request is attempted. If that fails during protocol negotiation, v1 is used. - Systematic recovery is only attempted during approval-voting, where we have easy access to the core_index. For disputes and collator pov_recovery, regular chunk requests are used, just as before. ## Performance results Some results from subsystem-bench: with regular chunk recovery: CPU usage per block 39.82s with recovery from backers: CPU usage per block 16.03s with systematic recovery: CPU usage per block 19.07s End-to-end results here: paritytech#598 (comment) #### TODO: - [x] [RFC paritytech#47](polkadot-fellows/RFCs#47) - [x] merge paritytech#2177 and rebase on top of those changes - [x] merge paritytech#2771 and rebase - [x] add tests - [x] preliminary performance measure on Versi: see paritytech#598 (comment) - [x] Rewrite the implementer's guide documentation - [x] paritytech#3065 - [x] paritytech/zombienet#1705 and fix zombienet tests - [x] security audit - [x] final versi test and performance measure --------- Signed-off-by: alindima <alin@parity.io> Co-authored-by: Javier Viola <javier@parity.io>
Rendered
This RFC proposes a way of permuting the availability chunk indices assigned to validators for a given core and relay chain block, in the context of recovering available data from systematic chunks.