This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Initial guide text for approvals and especially approvals assignments #1518

Merged
merged 30 commits into from
Aug 18, 2020

Contributor

@burdges burdges commented Aug 1, 2020

No description provided.


cla-bot-2020 bot commented Aug 1, 2020

@burdges it looks like you have not signed our contributor license agreement yet. Please visit this link to sign our agreement. This pull request cannot be merged until the agreement is signed.


### Future work

We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to fine-tune the assignments "no show" system, but long enough "no show" delays probably suffice.
Contributor Author

This should avoid the political problems with validator operators wanting everything to be a remote signer.

- **Assignments** ensures that each candidate receives enough random checkers, while reducing adversaries' odds of obtaining enough checkers and limiting adversaries' foreknowledge. It tracks approval votes to identify "no shows", meaning an approval check that takes suspiciously long, perhaps indicating that the node is under attack, and assigns more checks in this case. It tracks relay chain equivocations to determine when adversaries possibly gained foreknowledge about assignments, and adds additional checks in this case.

- **Approval checks** listens to the assignments subsystem for outgoing assignment notices that we shall check specific candidates. It then performs these checks by first invoking the reconstruction subsystem to obtain the candidate, second invoking the candidate validity utility subsystem upon the candidate, and finally sending out an approval vote, or perhaps initiating a dispute.
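
The three steps of an approval check described above can be sketched in Rust roughly as follows. This is only an illustration: `CheckOutcome`, `run_approval_check`, and the two closures standing in for the reconstruction and candidate validity subsystems are hypothetical names, and the `Unavailable` outcome is an assumption about what happens when reconstruction fails, which the text does not specify.

```rust
/// Hypothetical result of a single approval check.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckOutcome {
    /// The candidate validated, so we send out an approval vote.
    ApprovalVote,
    /// The candidate failed validation, so we initiate a dispute.
    Dispute,
    /// Assumption: reconstruction produced no data, so no vote is cast yet.
    Unavailable,
}

/// Runs one approval check for the candidate identified by `candidate_hash`.
///
/// `reconstruct` stands in for the reconstruction subsystem and `validate`
/// for the candidate validity utility subsystem; both are placeholders.
pub fn run_approval_check<R, V>(
    candidate_hash: [u8; 32],
    reconstruct: R,
    validate: V,
) -> CheckOutcome
where
    R: Fn(&[u8; 32]) -> Option<Vec<u8>>,
    V: Fn(&[u8]) -> bool,
{
    // Step 1: obtain the candidate data via reconstruction.
    let data = match reconstruct(&candidate_hash) {
        Some(data) => data,
        None => return CheckOutcome::Unavailable,
    };
    // Step 2: validate the candidate.
    // Step 3: approve it, or escalate to a dispute.
    if validate(data.as_slice()) {
        CheckOutcome::ApprovalVote
    } else {
        CheckOutcome::Dispute
    }
}
```

A caller would pass real subsystem handles instead of closures; the closures merely keep the sketch self-contained.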
Contributor

This references an assignments subsystem which hasn't been defined.

What are "outgoing assignment notices"? Are these notifications from some other piece of code that we need to be checking some particular candidate?

Great that this references reconstruction & candidate validity : ) - that's exactly how this will be implemented.


### Approval keys

We need two separate keys for the approval subsystem
Contributor

Oh, so this implies yet another session key?

Not sure if we have yet migrated a Substrate chain to add more session keys, but it should be doable. However, there is a bootstrapping concern here. When migrating the chain, nobody will have yet registered the extra key, but we can't just throw out the validator set.

So I think the process would be that we'd have to add the session key, make an announcement that everyone should rotate session keys, and then enable parachains.

Or is there some way that both of these keys can be the same? It would make the practicalities of upgrading the relay-chain much simpler.

Contributor Author

We could make the approval vote key be the grandpa key. We could always separate them if some distant future super-sentry node iteration would support consensus running on a separate machine from candidate worker nodes or whatever.

We need either the assignments key to be some new key immune to large slashes, or else some validators would ask that assignments be run in a remote signer, which sounds absolutely nightmarish.

Contributor

I see, that makes a lot of sense. Expounding on the "risk" these keys have as the reason for the separation would make sense in this section. I'm fine with reusing the GRANDPA key for that, although we will be upgrading that to BLS at some point. Maybe it makes most sense to just upgrade and add the extra key type, put out an announcement that every validator should rotate their session keys, and then enable parachains a few weeks later.

Contributor Author

@burdges burdges Aug 6, 2020

We actually cannot wholly replace Schnorr with BLS because BLS verification seems just too slow. We'll likely add a BLS12-381 G1 point with a Schnorr proof-of-possession as either the only or as a second GRANDPA public key, but we've then two choices:

  1. We could sign GRANDPA messages first with this BLS public key and then sign that signed message with the Ed25519 public key. We could even replace Ed25519 with Rabin-Williams here, which gets shockingly fast. We'll need a slashing condition for when something messy happens.

  2. We sign GRANDPA messages with a Schnorr VRF using this BLS public key, so verification runs much slower than Ed25519, but still vastly faster than BLS signatures. At this point the VRF pre-output actually is a BLS signature however, so we can transition smoothly to BLS verification whenever we gain enough signatures for aggregation to help, but doing this avoids any slashing conditions.

I'd wager 1 sounds easiest, so maybe Ed25519 would stick around for quite a while.

Contributor Author

I just say to use ed25519 for approval vote keys now, which along with previous tweaks seemingly finishes this one. Anything else?

We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to fine-tune the assignments "no show" system, but long enough "no show" delays probably suffice.

We shall develop more practical experience with UDP once the availability system works using direct UDP connections. In this, we should discover if reconstruction performs adequately with a complete graph or benefits from topology restrictions. At this point, assignment notices could implicitly request pieces from a random 1/3rd, perhaps topology restricted, which saves one gossip round. If this preliminary fast reconstruction fails, then nodes request alternative pieces directly. There is an interesting design space in how this overlaps with "slow availability" claims.
Contributor

oh, that's cool. cc @infinity0


We liberate availability cores when their candidate becomes available of course, but one approval assignment criterion continues associating each candidate with the core number it occupied when it became available.

Assignment operates in loosely timed rounds determined by these `DelayTranche`s, which proceed roughly 12 times faster than six-second block production, assuming half-second gossip times. If a candidate `C` needs more approval checkers by the time we reach round `t`, then any validators with an assignment to `C` in delay tranche `t` gossip their assignment notice for `C`. We continue until all candidates have enough approval checkers assigned. We take entire tranches together if we do not yet have enough, so we expect strictly more than enough checkers. We also take later tranches if some checkers return their approval votes too slowly (see no shows below).
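
As a rough illustration of the "take entire tranches together" rule, the following Rust sketch counts how many leading tranches must be taken before a candidate has enough checkers. The function name `tranches_to_take`, the slice-based input, and the `needed` parameter are assumptions made for this example; no shows are deliberately not modelled here.

```rust
/// Delay tranche index, counted from zero. The alias is assumed here
/// purely for the sake of the example.
pub type DelayTranche = u32;

/// Given the number of assignment notices seen in each tranche
/// (`per_tranche[t]` for tranche `t`), returns the last tranche that must be
/// taken and the total number of checkers assigned once it has been taken.
///
/// Tranches are always taken whole, so the total may exceed `needed`.
/// Returns `None` if even all known tranches do not supply enough checkers.
pub fn tranches_to_take(
    per_tranche: &[usize],
    needed: usize,
) -> Option<(DelayTranche, usize)> {
    let mut assigned = 0usize;
    for (t, count) in per_tranche.iter().enumerate() {
        // Take the whole tranche, never a part of it.
        assigned += count;
        if assigned >= needed {
            return Some((t as DelayTranche, assigned));
        }
    }
    None
}
```

For instance, with two checkers in tranche 0, three in tranche 1, and a target of four, the sketch stops at tranche 1 with five checkers assigned, one more than strictly required.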
Contributor

Another point on delay-tranches. It seems that there is no consensus on which delay-tranches should be used.

For reconstruction and gossip, this seems important. If I receive a reconstruction request, I want it to be legitimized by an assignment proof.

And as I gossip assignments, I will only want to gossip assignments from tranches that I believe should be active. However, how are my peers supposed to know what I accept and what I drop?

The common thread here is to make sure that there is no way for a single validator to create an unbounded amount of assignment proofs that other nodes are forced to circulate or respond to for some reason.

Contributor Author

> Another point on delay-tranches. It seems that there is no consensus on which delay-tranches should be used.

It's one tranche every k seconds after the relay chain block's slot. I've two numbers in the code: delay tranches start from zero with the relay chain block's slot, while AnV slots, computed as `12 * relay_chain_slot + delay_tranche`, give an absolute clock. I'll let someone else figure out which should be more or less exposed in the interface, etc.
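
The relation between the two numbers can be sketched in Rust as below. The function names, the integer widths, and the constant `TRANCHES_PER_SLOT` are assumptions chosen for illustration; only the formula `12 * relay_chain_slot + delay_tranche` comes from the comment above.

```rust
use std::convert::TryFrom;

/// Number of delay tranches per relay chain slot, per the formula above.
pub const TRANCHES_PER_SLOT: u64 = 12;

/// Absolute AnV slot for a delay tranche relative to a relay chain slot.
/// Plain arithmetic is used, so absurdly large slots would overflow.
pub fn anv_slot(relay_chain_slot: u64, delay_tranche: u32) -> u64 {
    TRANCHES_PER_SLOT * relay_chain_slot + u64::from(delay_tranche)
}

/// Inverse direction: the delay tranche that an absolute AnV slot represents
/// for a block produced in `relay_chain_slot`.
///
/// Returns `None` if the AnV slot lies before the block's own slot or the
/// difference does not fit into a `u32`.
pub fn delay_tranche(anv_slot_now: u64, relay_chain_slot: u64) -> Option<u32> {
    let start = TRANCHES_PER_SLOT.checked_mul(relay_chain_slot)?;
    let diff = anv_slot_now.checked_sub(start)?;
    u32::try_from(diff).ok()
}
```

Exposing only one of the two in an interface would then be a matter of calling the matching conversion at the boundary.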

> For reconstruction and gossip, this seems important. If I receive a reconstruction request, I want it to be legitimized by an assignment proof.

Yes and no, we could let validators reconstruct anything, but prioritize approval assignments.

> And as I gossip assignments, I will only want to gossip assignments from tranches that I believe should be active. However, how are my peers supposed to know what I accept and what I drop?

You need not drop anything:

  • Approval votes are a huge deal, so gossip them always.
  • Assignment notices are inherently somewhat limited in number due to being VRFs, so merely save them, and regossip them only when you believe they become viable.

We still need politeness for relay chain block knowledge of course.
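
A minimal Rust sketch of this "save, then regossip when viable" policy might look as follows. All names (`Message`, `GossipBuffer`, `on_message`, `release`) are hypothetical, and "viable" is simplified here to mean that the notice's tranche is no later than the current tranche, which ignores whether the candidate still needs checkers and ignores the politeness rule about relay chain block knowledge.

```rust
use std::collections::BTreeMap;

/// Hypothetical gossip message kinds; payloads are opaque bytes here.
pub enum Message {
    ApprovalVote(Vec<u8>),
    AssignmentNotice { tranche: u32, payload: Vec<u8> },
}

/// Holds assignment notices whose tranche is not yet considered viable.
#[derive(Default)]
pub struct GossipBuffer {
    held: BTreeMap<u32, Vec<Vec<u8>>>,
}

impl GossipBuffer {
    /// Handles an incoming message and returns the payloads to gossip now.
    /// Nothing is ever dropped: notices are either forwarded or saved.
    pub fn on_message(&mut self, msg: Message, current_tranche: u32) -> Vec<Vec<u8>> {
        match msg {
            // Approval votes are always gossiped.
            Message::ApprovalVote(payload) => vec![payload],
            // Notices from tranches already reached are gossiped at once.
            Message::AssignmentNotice { tranche, payload } if tranche <= current_tranche => {
                vec![payload]
            }
            // Notices from later tranches are saved for later.
            Message::AssignmentNotice { tranche, payload } => {
                self.held.entry(tranche).or_default().push(payload);
                Vec::new()
            }
        }
    }

    /// Releases every saved notice whose tranche is now at most `current_tranche`.
    pub fn release(&mut self, current_tranche: u32) -> Vec<Vec<u8>> {
        // Keep the notices from strictly later tranches in the buffer.
        let later = match current_tranche.checked_add(1) {
            Some(next) => self.held.split_off(&next),
            None => BTreeMap::new(),
        };
        let ready = std::mem::replace(&mut self.held, later);
        ready.into_values().flatten().collect()
    }
}
```

Because assignment notices are bounded in number by the VRF criteria, the buffer in such a design stays small without any eviction rule.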

Contributor Author

burdges commented Aug 7, 2020

Added notes on parameters in 2b2e4f9

@burdges burdges marked this pull request as ready for review August 7, 2020 02:05
Contributor Author

burdges commented Aug 10, 2020

Added draft code PR in #1558 :)

Contributor Author

burdges commented Aug 10, 2020

We've added a discussion in ecfce2b about this scenario that came up in chat with @pepyakin :

A validator with a tranche zero (or other low) assignment never makes their announcement, for example because they postponed their work (which is allowed). Yet they then make this announcement later, right around finality. If this announcement gets on-chain (also allowed), then yes, it delays finality. If it does not get on-chain, then yes, we've one announcement that the off-chain consensus system says is valid, but the chain says was too slow.

In this case, the chain wins I'd think. Yet, if the chain wins here then this requires imposing some annoying universal delay upon finality. :( We could prevent nodes from delaying announcing their assignments by too much I think, but not sure about the parameters yet.


cla-bot-2020 bot commented Aug 10, 2020

@burdges, Your signature has been received.

Contributor Author

burdges commented Aug 15, 2020

@rphmeier We should chat about the equivocation symmetry: If X and Y are equivocations that differ in parachain rho, then the included candidates X[rho] and Y[rho] differ. We risk some subsystem deciding X does not warrant work because Y looks better, but maybe X is an attack, X[rho] is invalid, and Y exists to distract from X. We could say all inclusions get checked, meaning no subsystem could decide X does not warrant work. We might need this for other chain distraction, like maybe X and Y are not even equivocations, but not necessarily. We could alternatively say the candidate equivocations X[rho] and Y[rho] should always be checked, even if we give up on X for other reasons.

@rphmeier rphmeier added A8-mergeoncegreen B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Aug 18, 2020
@rphmeier rphmeier merged commit 7fcefb8 into paritytech:master Aug 18, 2020
ordian added a commit that referenced this pull request Aug 20, 2020
* master:
  Companion for Substrate #6815 (Dynamic Whitelist) (#1612)
  Candidate backing respects scheduled collator (#1613)
  implementers-guide: in TOC move collators before backing, to match protocol pipeline (#1611)
  Initial guide text for approvals and especially approvals assignments  (#1518)
  Implement validation data refactor (#1585)
  Implementer's Guide: Make HRMP use upward message kinds (#1591)