Get parachain runtime/code upgrades off chain #971

eskimor · 2021-05-05T09:03:57Z

Right now we distribute a parachain runtime upgrade via the relay chain and statement distribution. This cannot really work long term, as statement distribution is time critical and parachain runtime upgrades can be large. To make matters worse, that large data is transferred twice in a time-critical path: once in statements and then again in the distributed relay chain block. Storing large amounts of data in chain storage is also bad for performance.

Plan for resolving this:

Don't distribute code changes in statements, but only the hash.
Have a separate asynchronous gossip protocol with messages "I have code" - this message should be signed by the validator.
Other validators can request code updates from validators who sent them "I have code" messages (just as we do in statement distribution right now).
Validators keep track of the "I have code" messages they receive and check their signatures.
Once enough signed "I have code" messages have been gathered on chain, a runtime upgrade is marked as good to go.
Nodes just store the code on disk with some pruning after getting updates.
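
The bookkeeping in the plan above can be sketched as follows. This is a minimal illustration, not an existing implementation: `CodeAttestations`, `note_have_code`, `good_to_go`, the type aliases and the numeric threshold are all hypothetical names and assumptions (the plan only says "enough"), and the signature check is assumed to happen elsewhere and is passed in as a boolean.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; the thread does not name any of these types.
type CodeHash = [u8; 32];
type ValidatorIndex = u32;

/// Tracks which validators have claimed, via a signed "I have code"
/// message, to hold the code for a given hash.
struct CodeAttestations {
    /// Assumed threshold; the plan only says "enough".
    threshold: usize,
    seen: HashMap<CodeHash, HashSet<ValidatorIndex>>,
}

impl CodeAttestations {
    fn new(threshold: usize) -> Self {
        Self { threshold, seen: HashMap::new() }
    }

    /// Records one message. `signature_valid` is the result of a signature
    /// check performed elsewhere; messages that fail it are dropped.
    /// Returns true only for a new, valid attestation.
    fn note_have_code(
        &mut self,
        hash: CodeHash,
        from: ValidatorIndex,
        signature_valid: bool,
    ) -> bool {
        if !signature_valid {
            return false;
        }
        // `insert` returns false if this validator was already counted.
        self.seen.entry(hash).or_default().insert(from)
    }

    /// True once enough distinct validators have attested to `hash`.
    fn good_to_go(&self, hash: &CodeHash) -> bool {
        self.seen.get(hash).map_or(false, |v| v.len() >= self.threshold)
    }
}
```

Counting distinct validator indices rather than raw messages means a validator repeating its announcement cannot push an upgrade over the threshold on its own.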

Reasoning:

All validators need the code, so the current optimized distribution via statements and requests is actually a well-suited architecture. The only issues are the timing constraints in statement distribution and that we don't want the data on chain.
By notification/requesting/queue management we get nice load balancing and efficient distribution, which can continue in the background for as long as it takes.
Collators can also be part of that gossip and request code from validators who claim to have it.
No incentivisation should be needed, as load will be evenly distributed and we assume that a significant number of validators have an interest in the network working.
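
One way the requesting side could spread load is sketched below. The selection rule (ask the announcer with the fewest requests currently in flight) is an assumption made for illustration; the reasoning above only says that notification, requesting and queue management give load balancing. `pick_provider` and `ValidatorIndex` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical identifier type; not taken from the thread.
type ValidatorIndex = u32;

/// Picks, among the validators that announced "I have code", the one with
/// the fewest requests currently in flight. Ties go to the lower index so
/// the choice is deterministic. Returns None if nobody has announced yet.
fn pick_provider(
    announcers: &[ValidatorIndex],
    in_flight: &HashMap<ValidatorIndex, usize>,
) -> Option<ValidatorIndex> {
    announcers
        .iter()
        .copied()
        // Validators without an entry have no outstanding requests.
        .min_by_key(|v| (in_flight.get(v).copied().unwrap_or(0), *v))
}
```

A caller would increment the chosen validator's counter when sending a request and decrement it on response or timeout, so slow peers naturally receive fewer new requests.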

Open questions:

Apparently we might also want to be able to push a parachain upgrade from the relay chain to the parachains. I guess that can work in a similar way, by just storing the hash on chain and having the system of "I have code" messages do the rest.
We don't check whether collators actually have the code update, which is probably fine when the code upgrade originated from the parachain, but might be an issue if it is triggered by the relay chain.
In that vein: we might want a variant of the "I have code" message protocol for full nodes, where those messages are not signed, so runtime upgrades can also be propagated by any non-validator node on the network (reducing load on validators). Validators would not even need to be part of that gossip, so this could be implemented in a strictly load-reducing way.

eskimor · 2021-05-05T09:05:48Z

@burdges @FatemeShirazi @rphmeier

burdges · 2021-05-05T18:11:46Z

"I have code" is not enough, although it's maybe a valid optimization.

A malicious chain can upgrade its code hash but then not give any honest validators the code. At this point, the malicious chain can repeatedly attempt to sneak through a bad block: either the blocks survive due to early DoS attacks on approval checkers, or else the approval checkers all no-show because they lack the code and we abandon the block. It's similarly problematic if WASM crashes when building the code.

We need to put code into the availability system, and eventually recover code from availability. It follows that code should be distributed as parachain blocks, with backers and approval checkers doing builds. All this ensures that some validator claims the code builds correctly.

eskimor · 2021-05-06T07:24:12Z

Not sure I can follow. If not enough validators sign off that they have the code, the code update will never go live, so a malicious actor not providing the code can do no harm at all. The system described above is actually rather similar to the availability system, except that we don't chunk the data as all validators need the full data anyways.

So in summary, code gets distributed, and only when enough validators have said (with a signed message) that they have the code (similar to bitfields) can the update go live at some point; otherwise it cannot. In that case it can either be abandoned after some timeout or just overwritten by another attempt.

About the second problem: the code distributed is already wasm, so it does not need to be built, just compiled, maybe, for performance reasons. If that code is faulty, then after the upgrade the parachain won't be able to make any more progress. This is no different from what we have now though: we also just put the code on chain, without any further verification (that I am aware of). I am also not sure if there is a meaningful way of verifying the code ahead of time (apart from sanity checks). On the other hand, it should not be necessary, as the code comes from execution of the PVF by validators, which we trust to do sane stuff if executed faithfully (which we do check).

burdges · 2021-05-06T08:38:02Z

Yes, all validators having the full code before voting works fine for parachains. A priori, however, we'll likely want erasure-coded distribution for parathreads, since some validators never run some parathreads, except that we'll likely run parathreads in large pools all with identical code.

rphmeier · 2021-05-07T17:04:35Z

Some discussion of validator-set hand-offs (session changes) is also required.

eskimor · 2021-11-29T11:15:00Z

With contextual execution, statements are out of the "hot path": they will no longer need to propagate within 2 seconds, so this issue becomes mostly about reducing load on chain storage.

rphmeier · 2022-06-02T19:52:10Z

In my opinion, maybe the best way to do this is the following:

Have the candidate receipt commit to the new code hash.
Have some on-chain logic in the paras module where anyone can post the code corresponding to the upcoming code hash for free. This precedes other checks like the mandatory waiting time and the PVF pre-checking logic.
Once it's posted, we do the other checks and eventually give an UpgradeGoAhead::GoAhead after they have passed.
If nobody posts the code within a certain amount of time (configurable by governance, somewhere around an hour or a day), then UpgradeGoAhead::Abort is given.
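
The steps above can be sketched as a small state holder. This is only an illustration under stated assumptions, not the paras module's actual code: `PendingUpgrade`, `post_code`, `signal` and `code_hash` are hypothetical names, the local `UpgradeGoAhead` enum merely mirrors the two signals named above, time is modelled as a plain block number, and the standard library's `DefaultHasher` stands in for a real cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Local enum mirroring the two signals named in the proposal.
#[derive(Debug, PartialEq)]
enum UpgradeGoAhead {
    Abort,
    GoAhead,
}

/// Stand-in for a cryptographic hash. NOT collision resistant; a real
/// chain would use its own hashing function here.
fn code_hash(code: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    code.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical record of an upgrade whose code has not been posted yet.
struct PendingUpgrade {
    expected_hash: u64,
    /// Last block number at which the code may still be posted.
    deadline: u64,
    code: Option<Vec<u8>>,
}

impl PendingUpgrade {
    fn new(expected_hash: u64, now: u64, timeout: u64) -> Self {
        Self { expected_hash, deadline: now + timeout, code: None }
    }

    /// Callable by anyone. Accepts the code only if it arrives before the
    /// deadline, nothing was posted yet, and it hashes to the committed value.
    fn post_code(&mut self, code: Vec<u8>, now: u64) -> bool {
        if now > self.deadline
            || self.code.is_some()
            || code_hash(&code) != self.expected_hash
        {
            return false;
        }
        self.code = Some(code);
        true
    }

    /// `other_checks_passed` stands for the waiting time and PVF
    /// pre-checking. Returns None while the upgrade is still undecided;
    /// what happens when code is posted but checks fail is left open here.
    fn signal(&self, now: u64, other_checks_passed: bool) -> Option<UpgradeGoAhead> {
        match (&self.code, other_checks_passed) {
            (Some(_), true) => Some(UpgradeGoAhead::GoAhead),
            (None, _) if now > self.deadline => Some(UpgradeGoAhead::Abort),
            _ => None,
        }
    }
}
```

Because `post_code` verifies the hash itself, the poster needs no special permissions, which is what lets anyone (an operator or a bot) deliver the code.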

This doesn't impact parachain liveness, as the parachain continues to operate with the old code until the new code is posted on-chain (and other checks pass and the UpgradeGoAhead::GoAhead is given by the relay-chain logic). The parachain PVF is responsible for only outputting hashes which it's reasonably certain someone in the community can post on the relay chain. Parachains can even give rewards for doing so. If a parachain outputs a code upgrade hash that doesn't have a preimage posted on-chain, then it's just shooting itself in the foot by preventing an upgrade for some amount of time. Cost to the relay-chain is minimal.

This is a very simple solution because it punts the problem of actually delivering the code to the relay chain completely off-chain. Parachain operators can manually post it but would eventually have bots to do it for them. And we don't need to implement a complex gossip protocol.

With asynchronous backing (formerly contextual execution) this issue is also about reducing statement-distribution code complexity as we could completely kill the large-statement codepaths as a result. That's a major net benefit.

eskimor added J0-enhancement labels May 5, 2021

eskimor mentioned this issue May 17, 2021

Fetch parachain runtime code from the relay chain #78

Open

rphmeier mentioned this issue Jun 6, 2022

PVF should accept data based on preimages #811

Open

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

the-right-joyce added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. I5-enhancement An additional feature request. and removed I10-optimisation labels Aug 25, 2023

the-right-joyce added the I6-meta A specific issue for grouping tasks or bugs of a specific category. label Oct 11, 2023

the-right-joyce added this to parachains team board Oct 11, 2023

the-right-joyce moved this to Backlog in parachains team board Oct 11, 2023

bkchr pushed a commit that referenced this issue Apr 10, 2024

in auto-relays keep trying to connect to nodes until connection is es…

f8f8f42

…tablished (#971)
