Initial guide text for approvals and especially approvals assignments #1518
Conversation
### Future work

We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to fine-tune the assignments "no show" system, but long enough "no show" delays probably suffice.
This should avoid the political problems with validator operators wanting everything to be in a remote signer.
- **Assignments** ensures that each candidate receives enough random checkers, while reducing adversaries' odds of obtaining enough checkers and limiting adversaries' foreknowledge. It tracks approval votes to identify "no shows", where an approval check takes suspiciously long, perhaps indicating the node is under attack, and assigns more checks in this case. It tracks relay chain equivocations to determine when adversaries possibly gained foreknowledge about assignments, and adds additional checks in this case.

- **Approval checks** listens to the assignments subsystem for outgoing assignment notices that we shall check specific candidates. It then performs these checks by first invoking the reconstruction subsystem to obtain the candidate, second invoking the candidate validity utility subsystem upon the candidate, and finally sending out an approval vote, or perhaps initiating a dispute.
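The no-show escalation in the **Assignments** bullet above might look roughly like the following sketch. All names and numbers here are hypothetical illustrations, not taken from any actual implementation.

```rust
// Illustrative sketch of no-show escalation: assign more checkers when
// some assigned checkers take suspiciously long to return votes.

/// Number of approval checkers we target per candidate (illustrative).
const NEEDED_APPROVALS: usize = 30;

/// How many further checkers a candidate still needs, given approvals
/// received so far and the number of assigned checkers that "no showed".
fn additional_checkers_needed(approvals_received: usize, no_shows: usize) -> usize {
    // Cover the remaining shortfall, plus one replacement checker per
    // no-show, since a no-show may indicate the checker is under attack.
    let shortfall = NEEDED_APPROVALS.saturating_sub(approvals_received);
    shortfall + no_shows
}
```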
This references an assignments subsystem which hasn't been defined.
What are "outgoing assignment notices"? Are these notifications from some other piece of code that we need to be checking some particular candidate?
Great that this references reconstruction & candidate validity : ) - that's exactly how this will be implemented.
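For concreteness, that pipeline could be sketched as below. The closures stand in for the reconstruction and candidate validity subsystems; none of these names come from the real codebase.

```rust
// Hedged sketch of the approval check pipeline quoted above:
// reconstruct the candidate, validate it, then approve or dispute.

#[derive(Debug, PartialEq)]
enum CheckOutcome {
    Approve,
    Dispute,
}

/// Run one approval check. Returns `None` when reconstruction fails,
/// e.g. because not enough availability chunks could be fetched.
fn approval_check(
    reconstruct: impl Fn() -> Option<Vec<u8>>,
    validate: impl Fn(&[u8]) -> bool,
) -> Option<CheckOutcome> {
    // First: recover the candidate via the reconstruction subsystem.
    let candidate = reconstruct()?;
    // Second: invoke the candidate validity utility subsystem.
    if validate(&candidate) {
        // Finally: send out an approval vote.
        Some(CheckOutcome::Approve)
    } else {
        // Or perhaps initiate a dispute.
        Some(CheckOutcome::Dispute)
    }
}
```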
### Approval keys
We need two separate keys for the approval subsystem.
Oh, so this implies yet another session key?
I'm not sure we have yet migrated a Substrate chain to add more session keys, but it should be doable. However, there is a bootstrapping concern here: when migrating the chain, nobody will have registered the extra key yet, but we can't just throw out the validator set.
So I think the process would be that we'd have to add the session key, make an announcement that everyone should rotate session keys, and then enable parachains.
Or is there some way that both of these keys can be the same? It would make the practicalities of upgrading the relay-chain much simpler
We could make the approval vote key be the grandpa key. We could always separate them if some distant future super-sentry node iteration would support consensus running on a separate machine from candidate worker nodes or whatever.
We need either the assignments key to be some new key immune to large slashes, or else some validators would ask that assignments be run in a remote signer, which sounds absolutely nightmarish.
I see, that makes a lot of sense. Expounding on the "risk" these keys have as the reason for the separation would make sense in this section. I'm fine with reusing the GRANDPA key for that, although we will be upgrading that to BLS at some point. Maybe makes most sense to just upgrade and add the extra key type, put out an announcement that every validator should rotate their session keys, and then enable parachains a few weeks later
We actually cannot wholly replace Schnorr with BLS because BLS verification seems just too slow. We'll likely add a BLS12-381 G1 point with a Schnorr proof-of-possession as either the only or as a second GRANDPA public key, but then we have two choices:

1. We could sign GRANDPA messages first with this BLS public key and then sign that signed message with the Ed25519 public key. We could even replace Ed25519 with Rabin-Williams here, which gets shockingly fast. We'll need slashing conditions for when something messy happens.
2. We sign GRANDPA messages with a Schnorr VRF using this BLS public key, so verification runs much slower than Ed25519, but still vastly faster than BLS signatures. At this point the VRF pre-output actually is a BLS signature, however, so we can transition smoothly to BLS verification whenever we gain enough signatures for aggregation to help, and doing this avoids any slashing conditions.
I'd wager 1 sounds easiest, so maybe Ed25519 would stick around for quite a while.
I now just say to use ed25519 for approval vote keys, which along with the previous tweaks seemingly finishes this one. Anything else?
We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to fine-tune the assignments "no show" system, but long enough "no show" delays probably suffice.

We shall develop more practical experience with UDP once the availability system works using direct UDP connections. In this, we should discover whether reconstruction performs adequately with a complete graph or benefits from topology restrictions. At that point, an assignment notice could implicitly request pieces from a random 1/3rd, perhaps topology restricted, which saves one gossip round. If this preliminary fast reconstruction fails, then nodes request alternative pieces directly. There is an interesting design space in how this overlaps with "slow availability" claims.
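Picking a random 1/3rd of validators to request pieces from could be sketched as follows. A toy linear congruential generator stands in for expanding the assignment VRF output; the real scheme would derive this deterministically from the VRF, and everything here is illustrative only.

```rust
// Sketch: deterministically pick roughly one third of validator
// indices, shuffled by a seed (e.g. derived from the assignment VRF).

fn pick_random_third(n_validators: usize, seed: u64) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..n_validators).collect();
    let mut state = seed;
    // Fisher-Yates shuffle driven by a toy LCG standing in for proper
    // VRF output expansion.
    for i in (1..indices.len()).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (state % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
    // Keep only the first third of the shuffled indices.
    indices.truncate(n_validators / 3);
    indices
}
```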
oh, that's cool. cc @infinity0
We liberate availability cores when their candidate becomes available, of course, but one approval assignment criterion continues associating each candidate with the core number it occupied when it became available.
Assignment operates in loosely timed rounds determined by these `DelayTranche`s, which proceed roughly 12 times faster than six-second block production, assuming half-second gossip times. If a candidate `C` needs more approval checkers by the time we reach round `t`, then any validators with an assignment to `C` in delay tranche `t` gossip their assignment notice for `C`. We continue until all candidates have enough approval checkers assigned. We take entire tranches together if we do not yet have enough, so we expect strictly more than enough checkers. We also take later tranches if some checkers return their approval votes too slowly (see no shows below).
Another point on delay-tranches. It seems that there is no consensus on which delay-tranches should be used.
For reconstruction and gossip, this seems important. If I receive a reconstruction request, I want it to be legitimized by an assignment proof.
And as I gossip assignments, I will only want to gossip assignments from tranches that I believe should be active. However, how are my peers supposed to know what I accept and what I drop?
The common thread here is to make sure that there is no way for a single validator to create an unbounded amount of assignment proofs that other nodes are forced to circulate or respond to for some reason.
> Another point on delay-tranches. It seems that there is no consensus on which delay-tranches should be used.
It's one tranche every k seconds after the relay chain block's slot. I've two numbers in the code: delay tranches start from zero with the relay chain block's slot, while AnV slots are `12 * relay_chain_slot + delay_tranche`, giving an absolute clock. I'll let someone else figure out which should be more or less exposed in the interface, etc.
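A minimal sketch of those two clocks, assuming the half-second tranche length implied by "12 times faster than six second block production" in the guide text; the names and constants are mine, not the code's:

```rust
// Two clocks: delay tranches count from zero at the relay chain
// block's slot, while AnV slots form an absolute clock.

/// Tranches per six-second relay chain slot (illustrative).
const TRANCHES_PER_SLOT: u64 = 12;
/// Length of one delay tranche in milliseconds (6000 ms / 12).
const TRANCHE_MS: u64 = 500;

/// Delay tranche elapsed since the relay chain block's slot began.
fn delay_tranche(slot_start_ms: u64, now_ms: u64) -> u64 {
    now_ms.saturating_sub(slot_start_ms) / TRANCHE_MS
}

/// Absolute AnV slot: 12 * relay_chain_slot + delay_tranche.
fn anv_slot(relay_chain_slot: u64, delay_tranche: u64) -> u64 {
    TRANCHES_PER_SLOT * relay_chain_slot + delay_tranche
}
```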
> For reconstruction and gossip, this seems important. If I receive a reconstruction request, I want it to be legitimized by an assignment proof.
Yes and no, we could let validators reconstruct anything, but prioritize approval assignments.
> And as I gossip assignments, I will only want to gossip assignments from tranches that I believe should be active. However, how are my peers supposed to know what I accept and what I drop?
You need not drop anything:
- Approval votes are a huge deal, so gossip them always.
- Assignment notices are inherently somewhat limited in number due to being VRFs, so merely save them, and regossip them only when you believe they become viable.
We still need politeness for relay chain block knowledge of course.
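The two bullets above could be condensed into a gossip policy like the following sketch; the types are invented purely for this illustration.

```rust
// Sketch of the politeness rule: approval votes always go out, while
// assignment notices are saved and regossiped only once we believe
// their tranche has become viable.

enum ApprovalMessage {
    /// Approval votes are a huge deal, so gossip them always.
    ApprovalVote,
    /// Assignment notices are limited in number by being VRFs; hold
    /// them back until their tranche looks active.
    AssignmentNotice { tranche: u64 },
}

fn should_gossip_now(msg: &ApprovalMessage, current_tranche: u64) -> bool {
    match msg {
        ApprovalMessage::ApprovalVote => true,
        ApprovalMessage::AssignmentNotice { tranche } => *tranche <= current_tranche,
    }
}
```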
Added notes on parameters in 2b2e4f9
Added draft code PR in #1558 :)
We've added a discussion in ecfce2b about this scenario that came up in chat with @pepyakin: A validator with a tranche zero (or other low) assignment never makes their announcement, perhaps because they postponed their work (which is allowed). Yet they then make this announcement later, right around finality. If this announcement gets on-chain (also allowed), then yes, it delays finality. If it does not get on-chain, then we have one announcement that the off-chain consensus system says is valid, but the chain says was too slow. In this case, the chain wins, I'd think. Yet if the chain wins here, then this requires imposing some annoying universal delay upon finality. :( We could prevent nodes from delaying announcing their assignments by too much, I think, but I'm not sure about the parameters yet.
@rphmeier We should chat about the equivocation symmetry: If X and Y are equivocations that differ in parachain rho, then the included candidates X[rho] and Y[rho] differ. We risk some subsystem deciding X does not warrant work because Y looks better, but maybe X is an attack, X[rho] is invalid, and Y exists to distract from X. We could say all inclusions get checked, meaning no subsystem could decide X does not warrant work. We might need this for other chain distractions, like maybe X and Y are not even equivocations, but not necessarily. We could alternatively say the candidate equivocations X[rho] and Y[rho] should always be checked, even if we give up on X for other reasons.
* master:
  - Companion for Substrate #6815 (Dynamic Whitelist) (#1612)
  - Candidate backing respects scheduled collator (#1613)
  - implementers-guide: in TOC move collators before backing, to match protocol pipeline (#1611)
  - Initial guide text for approvals and especially approvals assignments (#1518)
  - Implement validation data refactor (#1585)
  - Implementer's Guide: Make HRMP use upward message kinds (#1591)