Get parachain runtime/code upgrades off chain #971

Open
eskimor opened this issue May 5, 2021 · 7 comments
Labels
I5-enhancement An additional feature request. I6-meta A specific issue for grouping tasks or bugs of a specific category. I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@eskimor
Member

eskimor commented May 5, 2021

Right now we distribute parachain runtime upgrades via the relay chain and statement distribution. That cannot really work long term: statement distribution is time critical and parachain runtime upgrades can be large. To make matters worse, that large data is transferred twice in a time-critical path: once in statements and then again in the distributed relay chain block. Storing large amounts of data in chain storage is also not good for performance.

Plan for resolving this:

  • Don't distribute code changes in statements, but only the hash.
  • Have a separate asynchronous gossip protocol with messages "I have code" - this message should be signed by the validator.
  • Other validators can request code updates from validators who sent them "I have code" messages (just as we do in statement distribution right now).
  • Validators keep track of the "I have code" messages they receive and check their signatures.
  • Once enough signed "I have code" messages have been gathered on chain, a runtime upgrade is marked as good to go.
  • Nodes just store the code on disk with some pruning after getting updates.
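
A minimal sketch of the bookkeeping the plan above implies, assuming a node-side tracker. Every name here (`HaveCode`, `HaveCodeTracker`, `note`, `providers`, `good_to_go`) is hypothetical and not taken from the Polkadot codebase. Signature checking is passed in as a closure so the sketch stays free of any real crypto API, and the threshold is a plain parameter because the issue only says "enough".

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in types for illustration only.
pub type CodeHash = [u8; 32];
pub type ValidatorIndex = u32;

/// A signed "I have code" announcement as it might travel over gossip.
pub struct HaveCode {
    pub code_hash: CodeHash,
    pub validator: ValidatorIndex,
    pub signature: Vec<u8>,
}

/// Tracks which validators have announced which code hash.
pub struct HaveCodeTracker<V> {
    verify: V,        // assumed signature check, supplied by the caller
    threshold: usize, // assumed meaning of "enough" announcements
    announcers: HashMap<CodeHash, HashSet<ValidatorIndex>>,
}

impl<V: Fn(&HaveCode) -> bool> HaveCodeTracker<V> {
    pub fn new(verify: V, threshold: usize) -> Self {
        Self { verify, threshold, announcers: HashMap::new() }
    }

    /// Records an announcement. Returns false if the signature check fails.
    pub fn note(&mut self, msg: &HaveCode) -> bool {
        if !(self.verify)(msg) {
            return false;
        }
        // A set per hash means repeated announcements count only once.
        self.announcers.entry(msg.code_hash).or_default().insert(msg.validator);
        true
    }

    /// Validators we could request the code from, in ascending order.
    pub fn providers(&self, code_hash: &CodeHash) -> Vec<ValidatorIndex> {
        let mut out = self
            .announcers
            .get(code_hash)
            .map(|set| set.iter().copied().collect::<Vec<ValidatorIndex>>())
            .unwrap_or_default();
        out.sort_unstable();
        out
    }

    /// True once at least `threshold` distinct validators have announced.
    pub fn good_to_go(&self, code_hash: &CodeHash) -> bool {
        self.announcers
            .get(code_hash)
            .map_or(false, |set| set.len() >= self.threshold)
    }
}
```

In the plan the "good to go" decision is taken on chain; the same counting logic would apply there, with the tracker above only covering what a single node sees over gossip.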

Reasoning:

  • All validators need the code, so the current optimized distribution via statements and requests is actually a well suited architecture. The only issue is the timing constraints in statement distribution and that we don't want the data on chain.
  • By notification/requesting/queue management we get nice load balancing and efficient distribution, which can continue in the background for as long as it takes.
  • Also collators can be part of that gossip and request code from validators who claim to have it.
  • No incentivisation should be needed, as load will be evenly distributed and we assume that a significant number of validators have an interest in the network working.

Open questions:

  • Apparently we might also want to be able to push a parachain upgrade from the relay chain to the parachains. I guess that can work in a similar way, by just storing the hash on chain and having the system of "I have code" messages do the rest.
  • We don't check whether collators actually have the code update, which is probably fine when the code upgrade originated from the parachain, but might be an issue if it is triggered by the relay chain.
  • In that vein: We might want a variant of the "I have code" message protocol for full nodes, where those messages are not signed, so runtime upgrades can also be propagated by any non-validator node on the network (reducing load on validators). Validators would not even need to be part of that gossip, so this could be implemented in a strictly load-reducing way.
@eskimor
Member Author

eskimor commented May 5, 2021

@burdges

burdges commented May 5, 2021

"I have code" is not enough, although it's maybe a valid optimization.

A malicious chain can upgrade its code hash but then not give any honest validators the code. At this point, the malicious chain can attempt repeatedly to sneak through a bad block: either such blocks survive due to early DoS attacks on approval checkers, or else the approval checkers all no-show because they lack the code and we abandon the block. It's similarly problematic if WASM crashes when building the code.

We need to put code into the availability system, and eventually recover code from availability. It follows that code should be distributed as parachain blocks, with backers and approval checkers doing builds. All this ensures some validator claims the code builds correctly.

@eskimor
Member Author

eskimor commented May 6, 2021

Not sure I can follow. If not enough validators sign off that they have the code, the code update will never go live, so a malicious actor not providing the code can do no harm at all. The system described above is actually rather similar to the availability system, except that we don't chunk the data as all validators need the full data anyways.

So in summary, code gets distributed and only when enough validators said (with a signed message) that they have the code (similar to bitfields), the update can go live at some point, if not, it can not. In that case it can either be abandoned after some timeout or just overwritten by another attempt.
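
The lifecycle summarised above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual design. The names `PendingUpgrade`, `Outcome`, `mark` and `outcome` are hypothetical, and the "more than two thirds" rule is an assumption: the comment only says "enough validators".

```rust
// Hypothetical stand-in types for illustration only.
pub type CodeHash = [u8; 32];
pub type BlockNumber = u32;

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Still collecting "I have code" bits.
    Pending,
    /// Enough validators hold the code; the upgrade may go live.
    GoLive,
    /// Timed out without enough validators holding the code.
    Abandoned,
}

/// One upgrade attempt with a bitfield-like record of who has the code.
pub struct PendingUpgrade {
    code_hash: CodeHash,
    announced_at: BlockNumber,
    have_code: Vec<bool>, // one entry per validator
}

impl PendingUpgrade {
    pub fn new(code_hash: CodeHash, announced_at: BlockNumber, n_validators: usize) -> Self {
        Self { code_hash, announced_at, have_code: vec![false; n_validators] }
    }

    pub fn code_hash(&self) -> &CodeHash {
        &self.code_hash
    }

    /// Records a verified, signed "I have code" from `validator`.
    /// Out-of-range indices are ignored.
    pub fn mark(&mut self, validator: usize) {
        if let Some(bit) = self.have_code.get_mut(validator) {
            *bit = true;
        }
    }

    /// Assumed rule: go live with more than 2/3 of validators, else
    /// abandon once `timeout` blocks have passed since the announcement.
    pub fn outcome(&self, now: BlockNumber, timeout: BlockNumber) -> Outcome {
        let have = self.have_code.iter().filter(|bit| **bit).count();
        if 3 * have > 2 * self.have_code.len() {
            Outcome::GoLive
        } else if now.saturating_sub(self.announced_at) >= timeout {
            Outcome::Abandoned
        } else {
            Outcome::Pending
        }
    }
}
```

Overwriting by another attempt then amounts to replacing the stored `PendingUpgrade` with a fresh one for the new hash, which resets both the bitfield and the timeout.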

About the second problem: The code distributed is already wasm, so it does not need to be built - just compiled for performance reasons maybe. If that code is faulty, then after the upgrade the parachain won't be able to make any more progress. This is no different to what we have now though - we also just put the code on chain, without any further verification (that I am aware of). I am also not sure if there is a meaningful way of verifying the code ahead of time (apart from sanity checks). On the other hand, it should not be necessary, as the code comes from execution of the PVF by validators, which we trust to do sane stuff if executed faithfully (which we do check).

@burdges

burdges commented May 6, 2021

Yes, all validators having the full code before voting works fine for parachains. A priori, we'll likely want erasure-coded distribution for parathreads, however, since some validators never run some parathreads, except that we'll likely run parathreads in large pools all with identical code.

@rphmeier
Contributor

rphmeier commented May 7, 2021

Some discussion of validator-set hand-offs (session changes) is also required.

@eskimor
Member Author

eskimor commented Nov 29, 2021

With contextual execution, statements are out of the "hot path" - they will no longer need to propagate within 2 seconds, thus this issue becomes mostly about reducing load on chain storage.

@rphmeier
Contributor

rphmeier commented Jun 2, 2022

In my opinion, maybe the best way to do this is the following:

  • Have the candidate receipt commit to the new code hash
  • Have some on-chain logic in the paras module where anyone can post the code corresponding to the upcoming code hash for free. This precedes other checks like the mandatory waiting time and the PVF pre-checking logic.
  • Once it's posted, we do other checks and eventually give an UpgradeGoAhead::GoAhead after other checks have passed.
  • If nobody posts the code within a certain amount of time (configurable by governance, somewhere like an hour or a day), then UpgradeGoAhead::Abort is given.
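
The steps above can be sketched as a small state machine. Only the `UpgradeGoAhead::GoAhead` and `UpgradeGoAhead::Abort` names come from the comment; `PendingCode`, `post_code`, `signal` and `code_hash` are hypothetical, and this is not the paras module's actual logic. The hash below uses `DefaultHasher` purely as a stand-in so the example runs with the standard library alone; it is not cryptographic and a real implementation would need a proper code hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub type BlockNumber = u32;

/// The two signals named in the proposal.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum UpgradeGoAhead {
    Abort,
    GoAhead,
}

/// Stand-in hash for illustration only; NOT cryptographic.
pub fn code_hash(code: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    code.hash(&mut hasher);
    hasher.finish()
}

/// An upgrade whose hash was committed to by a candidate receipt.
pub struct PendingCode {
    expected_hash: u64,
    committed_at: BlockNumber,
    posted: Option<Vec<u8>>,
}

impl PendingCode {
    pub fn new(expected_hash: u64, committed_at: BlockNumber) -> Self {
        Self { expected_hash, committed_at, posted: None }
    }

    /// Anyone may post the code; it is accepted only if it is the
    /// preimage of the committed hash and nothing was accepted before.
    pub fn post_code(&mut self, code: Vec<u8>) -> bool {
        if self.posted.is_none() && code_hash(&code) == self.expected_hash {
            self.posted = Some(code);
            true
        } else {
            false
        }
    }

    /// `other_checks_passed` stands in for the mandatory waiting time and
    /// PVF pre-checking. Returns None while no signal is due yet.
    pub fn signal(
        &self,
        now: BlockNumber,
        posting_deadline: BlockNumber,
        other_checks_passed: bool,
    ) -> Option<UpgradeGoAhead> {
        match &self.posted {
            Some(_) if other_checks_passed => Some(UpgradeGoAhead::GoAhead),
            Some(_) => None,
            None if now.saturating_sub(self.committed_at) >= posting_deadline => {
                Some(UpgradeGoAhead::Abort)
            }
            None => None,
        }
    }
}
```

The sketch simplifies one point: once code is posted, failing later checks simply delays the signal here, whereas a real implementation would presumably also abort in that case.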

This doesn't impact parachain liveness, as the parachain continues to operate with the old code until the new code is posted on-chain (and other checks pass and the UpgradeGoAhead::GoAhead is given by the relay-chain logic). The parachain PVF is responsible for only outputting hashes which it's reasonably certain someone in the community can post on the relay chain. Parachains can even give rewards for doing so. If a parachain outputs a code upgrade hash that doesn't have a preimage posted on-chain, then it's just shooting itself in the foot by preventing an upgrade for some amount of time. Cost to the relay-chain is minimal.

This is a very simple solution because it punts the problem of actually delivering the code to the relay chain completely off-chain. Parachain operators can manually post it but would eventually have bots to do it for them. And we don't need to implement a complex gossip protocol.

With asynchronous backing (formerly contextual execution) this issue is also about reducing statement-distribution code complexity as we could completely kill the large-statement codepaths as a result. That's a major net benefit.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. I5-enhancement An additional feature request. and removed I10-optimisation labels Aug 25, 2023
@the-right-joyce the-right-joyce added the I6-meta A specific issue for grouping tasks or bugs of a specific category. label Oct 11, 2023
claravanstaden added a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
paritytech#971)

* reward relayer from asset hub sovereign account that matches asset hub

* fix tests

* updates sovereign account

* refactor

* fixes

* updates template parachain sovereign account

---------

Co-authored-by: claravanstaden <Cats 4 life!>
Projects
Status: Backlog
Development

No branches or pull requests

4 participants