diff --git a/roadmap/implementers-guide/src/SUMMARY.md b/roadmap/implementers-guide/src/SUMMARY.md index 7d3d9138a793..bcf87aad8a49 100644 --- a/roadmap/implementers-guide/src/SUMMARY.md +++ b/roadmap/implementers-guide/src/SUMMARY.md @@ -10,6 +10,7 @@ - [Chain Selection and Finalization](protocol-chain-selection.md) - [Architecture Overview](architecture.md) - [Messaging Overview](messaging.md) + - [PVF Pre-checking](pvf-prechecking.md) - [Runtime Architecture](runtime/README.md) - [`Initializer` Module](runtime/initializer.md) - [`Configuration` Module](runtime/configuration.md) @@ -34,6 +35,7 @@ - [Candidate Events](runtime-api/candidate-events.md) - [Disputes Info](runtime-api/disputes-info.md) - [Candidates Included](runtime-api/candidates-included.md) + - [PVF Pre-checking](runtime-api/pvf-prechecking.md) - [Node Architecture](node/README.md) - [Subsystems and Jobs](node/subsystems-and-jobs.md) - [Overseer](node/overseer.md) @@ -66,6 +68,7 @@ - [Runtime API Requests](node/utility/runtime-api.md) - [Chain API Requests](node/utility/chain-api.md) - [Chain Selection Request](node/utility/chain-selection.md) + - [PVF Pre-Checking](node/utility/pvf-prechecker.md) - [Data Structures and Types](types/README.md) - [Candidate](types/candidate.md) - [Backing](types/backing.md) @@ -77,6 +80,7 @@ - [Network](types/network.md) - [Approvals](types/approval.md) - [Disputes](types/disputes.md) + - [PVF Pre-checking](types/pvf-prechecking.md) [Glossary](glossary.md) [Further Reading](further-reading.md) diff --git a/roadmap/implementers-guide/src/node/utility/candidate-validation.md b/roadmap/implementers-guide/src/node/utility/candidate-validation.md index c34672368c32..5393368c5c6b 100644 --- a/roadmap/implementers-guide/src/node/utility/candidate-validation.md +++ b/roadmap/implementers-guide/src/node/utility/candidate-validation.md @@ -12,15 +12,19 @@ Output: Validation result via the provided response side-channel. 
## Functionality -This subsystem answers two types of requests: one which draws out validation data from the state, and another which accepts all validation data exhaustively. The goal of both request types is to validate a candidate. There are three possible outputs of validation: either the candidate is valid, the candidate is invalid, or an internal error occurred. Whatever the end result is, it will be returned on the response channel to the requestor. +This subsystem groups the requests it handles in two categories: *candidate validation* and *PVF pre-checking*. -Parachain candidates are validated against their validation function: A piece of Wasm code that is describes the state-transition of the parachain. Validation function execution is not metered. This means that an execution which is an infinite loop or simply takes too long must be forcibly exited by some other means. For this reason, we recommend dispatching candidate validation to be done on subprocesses which can be killed if they time-out. +The first category can be further subdivided in two request types: one which draws out validation data from the state, and another which accepts all validation data exhaustively. Validation returns three possible outcomes on the response channel: the candidate is valid, the candidate is invalid, or an internal error occurred. + +Parachain candidates are validated against their validation function: A piece of Wasm code that describes the state-transition of the parachain. Validation function execution is not metered. This means that an execution which is an infinite loop or simply takes too long must be forcibly exited by some other means. For this reason, we recommend dispatching candidate validation to be done on subprocesses which can be killed if they time-out. Upon receiving a validation request, the first thing the candidate validation subsystem should do is make sure it has all the necessary parameters to the validation function. 
These are: * The Validation Function itself. * The [`CandidateDescriptor`](../../types/candidate.md#candidatedescriptor). * The [`ValidationData`](../../types/candidate.md#validationdata). * The [`PoV`](../../types/availability.md#proofofvalidity). + +The second category is for PVF pre-checking. This is primarily used by the [PVF pre-checker](pvf-prechecker.md) subsystem. ### Determining Parameters diff --git a/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md b/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md new file mode 100644 index 000000000000..5b46bcf30e27 --- /dev/null +++ b/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md @@ -0,0 +1,17 @@ +# PVF Pre-checker + +The PVF pre-checker is a subsystem that is responsible for watching the relay chain for new PVFs that require pre-checking. Head over to [overview] for the PVF pre-checking process overview. + +## Protocol + +There is no dedicated input mechanism for the PVF pre-checker. Instead, the PVF pre-checker watches the `ActiveLeavesUpdate` event stream for work. + +This subsystem does not produce any output messages either. The subsystem will, however, send messages to the [Runtime API] subsystem to query for the pending PVFs and to submit votes. In addition to that, it will also communicate with the [Candidate Validation] subsystem to request a PVF pre-check. + +## Functionality + +TODO: Write up the description of the functionality of the PVF pre-checker. https://github.com/paritytech/polkadot/issues/4611 + +[overview]: ../../pvf-prechecking.md +[Runtime API]: runtime-api.md +[Candidate Validation]: candidate-validation.md diff --git a/roadmap/implementers-guide/src/pvf-prechecking.md b/roadmap/implementers-guide/src/pvf-prechecking.md new file mode 100644 index 000000000000..8cbc9294d172 --- /dev/null +++ b/roadmap/implementers-guide/src/pvf-prechecking.md @@ -0,0 +1,50 @@ +# PVF Pre-checking Overview + +> ⚠️ This discusses a mechanism that is currently under development.
Follow the progress under [#3211]. + +## Motivation + +Parachains' and parathreads' validation function is described by a wasm module that we refer to as a PVF. Since it's a wasm module the typical way of executing it is to compile it to machine code. Typically an optimizing compiler consists of algorithms that are able to optimize the resulting machine code heavily. However, while those algorithms perform quite well for a typical wasm code produced by standard toolchains (e.g. rustc/LLVM), those algorithms can be abused to consume a lot of resources. Moreover, since those algorithms are rather complex there is a lot of room for a bug that can crash the compiler. + +If compilation of a Parachain Validation Function (PVF) takes too long or uses too much memory, this can leave a node in limbo as to whether a candidate of that parachain is valid or not. + +The amount of time that a PVF takes to compile is a subjective resource limit and as such PVFs may be maliciously crafted so that there is e.g. a 50/50 split of validators which can and cannot compile and execute the PVF. + +This has the following implications: +- In backing, inclusion may be slow due to backing groups being unable to execute the block +- In approval checking, there may be many no-shows, leading to slow finality +- In disputes, neither side may reach supermajority. Nobody will get slashed and the chain will not be reverted or finalized. + +As a result of this issue we need a fairly hard guarantee that the PVFs of registered parachains/threads can be compiled within a reasonable amount of time. + +## Solution + +The problem is solved by having a pre-checking process which is run when a new validation code is included in the chain. A new PVF can be added in two cases: + +- A new parachain or parathread is registered. +- An existing parachain or parathread signalled an upgrade of its validation code. + +Before any of those processes finish, the PVF pre-checking vote is initiated. 
The PVF pre-checking vote is identified by the PVF code hash that is being voted on. If a PVF pre-checking process is already running for that code hash, then no +new PVF pre-checking vote will be started. Instead, the process just subscribes to the existing vote. + +The pre-checking vote concludes either when a supermajority is reached or when it expires. + +Each validator checks the list of PVFs available for voting. The vote is binary, i.e. accept or reject a given PVF. As soon as a supermajority of votes is collected for one of the sides of the vote, the voting is concluded in that direction and the effects of the voting are enacted. + +Only validators from the active set can participate in the vote. The set of active validators can change each session. That's why we reset the votes each session. A vote that has observed a certain number of sessions will be rejected. + +The effects of accepting a PVF depend on the operations that requested it: + +1. All onboardings subscribed to the approved PVF pre-checking process will get scheduled and after passing 2 session boundaries they will be onboarded. +1. All upgrades subscribed to the approved PVF pre-checking process will get scheduled very similarly to the existing process. Upgrades with pre-checking are really the same process that is just delayed by the time required for pre-checking voting. In case of instant approval the mechanism is exactly the same. + +If the PVF pre-checking process concludes with rejection, all the requesting operations are cancelled. For onboarding, this means the para makes no progress: the lifecycle of such a parachain is terminated in the `Onboarding` state and after rejection the lifecycle is none. That in turn means that the caller can attempt registering the parachain once more. For upgrading, this means that the upgrade process is aborted: the go-ahead signal is raised with the `Abort` flag. Rejection leads to removing the allegedly bad validation code from the chain storage.
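The voting rule described above can be sketched as follows. This is only an illustration, not the [paras] module's implementation: the names `VoteOutcome`, `has_supermajority` and `tally` are hypothetical, and the threshold (strictly more than two thirds of the active validators) is an assumption, since the text does not fix the exact definition of supermajority.

```rust
/// Hypothetical outcome of tallying one PVF pre-checking vote.
#[derive(Debug, PartialEq, Eq)]
enum VoteOutcome {
    Accepted,
    Rejected,
    Pending,
}

/// Assumption: "supermajority" means strictly more than two thirds of the
/// active validators. The surrounding text does not state the threshold.
fn has_supermajority(votes: usize, validators: usize) -> bool {
    votes * 3 > validators * 2
}

/// Tallies the accept/reject ballots of the current session.
/// Each slice holds one flag per validator in the active set.
fn tally(accept: &[bool], reject: &[bool]) -> VoteOutcome {
    let validators = accept.len();
    let accepts = accept.iter().filter(|&&v| v).count();
    let rejects = reject.iter().filter(|&&v| v).count();
    if has_supermajority(accepts, validators) {
        VoteOutcome::Accepted
    } else if has_supermajority(rejects, validators) {
        VoteOutcome::Rejected
    } else {
        VoteOutcome::Pending
    }
}
```

With four validators, three matching ballots conclude the vote under this assumed threshold, while a 2/1 split leaves it pending until more votes arrive or the vote expires.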
+ +The logic described above is implemented by the [paras] module. + +On the node-side, there is a PVF pre-checking [subsystem][pvf-prechecker-subsystem] that scans the chain for new PVFs using [runtime APIs][pvf-runtime-api]. Upon finding a new PVF, the subsystem will initiate a PVF pre-checking request and wait for the result. Whenever the result is obtained, the subsystem will use the [runtime API][pvf-runtime-api] to submit a vote for the PVF. The vote is an unsigned transaction. The vote will be distributed via the gossip similarly to a normal transaction. Eventually a block producer will include the vote into the block where it will be handled by the [runtime][paras]. + +[#3211]: https://github.com/paritytech/polkadot/issues/3211 +[paras]: runtime/paras.md +[pvf-runtime-api]: runtime-api/pvf-prechecking.md +[pvf-prechecker-subsystem]: node/utility/pvf-prechecker.md diff --git a/roadmap/implementers-guide/src/runtime-api/pvf-prechecking.md b/roadmap/implementers-guide/src/runtime-api/pvf-prechecking.md new file mode 100644 index 000000000000..c74232367bff --- /dev/null +++ b/roadmap/implementers-guide/src/runtime-api/pvf-prechecking.md @@ -0,0 +1,22 @@ +# PVF Pre-checking + +> ⚠️ This runtime API was added in v2. + +There are two main runtime APIs to work with PVF pre-checking. + +The first runtime API is designed to fetch all PVFs that require pre-checking voting. The PVFs are +identified by their code hashes. As soon as the PVF gains the required support, the runtime API will +not return the PVF anymore. + +```rust +fn pvfs_require_precheck() -> Vec<ValidationCodeHash>; +``` + +The second runtime API is needed to submit the judgement for a PVF, whether it is approved or not. +The voting process uses unsigned transactions. The [`PvfCheckStatement`](../types/pvf-prechecking.md) is circulated through the network via gossip similar to a normal transaction. At some point the validator +will include the statement in the block, where it will be processed by the runtime.
If that was the +last vote before gaining the super-majority, this PVF will not be returned by `pvfs_require_precheck` anymore. + +```rust +fn submit_pvf_check_statement(stmt: PvfCheckStatement, signature: ValidatorSignature); +``` diff --git a/roadmap/implementers-guide/src/runtime/configuration.md b/roadmap/implementers-guide/src/runtime/configuration.md index 37e5202429e1..739352b202b3 100644 --- a/roadmap/implementers-guide/src/runtime/configuration.md +++ b/roadmap/implementers-guide/src/runtime/configuration.md @@ -12,29 +12,51 @@ The configuration module is responsible for two main pieces of storage. /// The current configuration to be used. Configuration: HostConfiguration; /// A pending configuration to be applied on session change. -PendingConfiguration: Option<HostConfiguration>; +PendingConfigs: Vec<(SessionIndex, HostConfiguration)>; +/// A flag that says if the consistency checks should be omitted. +BypassConsistencyCheck: bool; ``` ## Session change -The session change routine for the Configuration module is simple. If the `PendingConfiguration` is `Some`, take its value and set `Configuration` to be equal to it. Reset `PendingConfiguration` to `None`. +The session change routine works as follows: + +- If there are no pending configurations, then return early. +- Take all pending configurations whose session index is less than or equal to the current session index. + - Get the pending configuration with the highest session index and apply it to the current configuration. Discard the earlier ones if any. ## Routines ```rust +enum InconsistentError { + // ... +} + +impl HostConfiguration { + fn check_consistency(&self) -> Result<(), InconsistentError> { /* ... */ } +} + /// Get the host configuration. pub fn configuration() -> HostConfiguration { Configuration::get() } -/// Updating the pending configuration to be applied later.
-fn update_configuration(f: impl FnOnce(&mut HostConfiguration)) { - PendingConfiguration::mutate(|pending| { - let mut x = pending.unwrap_or_else(Self::configuration); - f(&mut x); - *pending = Some(x); - }) -} +/// Schedules updating the host configuration. The update is given by the `updater` closure. The +/// closure takes the current version of the configuration and returns the new version. +/// Returns an `Err` if the closure returns a broken configuration. However, there are a couple of +/// exceptions: +/// +/// - if the configuration that was passed in the closure is already broken, then it will pass the +/// update: you cannot break something that is already broken. +/// - If the `BypassConsistencyCheck` flag is set, then the checks will be skipped. +/// +/// The changes made by this function will always be scheduled at session X, where X is the current session index + 2. +/// If there is already a pending update for X, then the closure will receive the already pending configuration for +/// session X. +/// +/// If there is already a pending update for the current session index + 1, then it won't be touched. Otherwise, +/// that would violate the promise of this function that changes will be applied on the second session change (cur + 2). +fn schedule_config_update(updater: impl FnOnce(&mut HostConfiguration)) -> DispatchResult ``` ## Entry-points diff --git a/roadmap/implementers-guide/src/runtime/paras.md b/roadmap/implementers-guide/src/runtime/paras.md index 1bdc38684acd..c2e3a2dceec6 100644 --- a/roadmap/implementers-guide/src/runtime/paras.md +++ b/roadmap/implementers-guide/src/runtime/paras.md @@ -5,8 +5,11 @@ parachains and parathreads cannot change except at session boundaries and after session has passed. This is primarily to ensure that the number and meaning of bits required for the availability bitfields does not change except at session boundaries. 
-It's also responsible for managing parachain validation code upgrades as well as maintaining -availability of old parachain code and its pruning. +It's also responsible for: + +- managing parachain validation code upgrades as well as maintaining availability of old parachain +code and its pruning. +- vetting PVFs by means of the PVF pre-checking mechanism. ## Storage @@ -38,13 +41,6 @@ pub struct ParaPastCodeMeta { last_pruned: Option, } -enum UseCodeAt { - // Use the current code. - Current, - // Use the code that was replaced at the given block number. - ReplacedAt(BlockNumber), -} - struct ParaGenesisArgs { /// The initial head-data to use. genesis_head: HeadData, @@ -71,18 +67,49 @@ pub enum ParaLifecycle { /// Parachain is being offboarded. OutgoingParachain, } + +enum PvfCheckCause { + /// PVF vote was initiated by the initial onboarding process of the given para. + Onboarding(ParaId), + /// PVF vote was initiated by signalling of an upgrade by the given para. + Upgrade { + /// The ID of the parachain that initiated or is waiting for the conclusion of pre-checking. + id: ParaId, + /// The relay-chain block number that was used as the relay-parent for the parablock that + /// initiated the upgrade. + relay_parent_number: BlockNumber, + }, +} + +struct PvfCheckActiveVoteState { + // The two following vectors have their length equal to the number of validators in the active + // set. They start with all zeroes. A 1 is set at an index when the validator at that index + // casts a vote. Once a 1 is set for either of the vectors, that validator cannot vote anymore. + // Since the active validator set changes each session, the bit vectors are reinitialized as + // well: zeroed and resized so that each validator gets its own bit. + votes_accept: BitVec, + votes_reject: BitVec, + + /// The number of session changes this PVF vote has observed. Therefore, this number is + /// increased at each session boundary. When created, it is initialized with 0.
+ age: SessionIndex, + /// The block number at which this PVF vote was created. + created_at: BlockNumber, + /// A list of causes for this PVF pre-checking. Has at least one. + causes: Vec<PvfCheckCause>, +} ``` #### Para Lifecycle -Because the state of parachains and parathreads are delayed by a session, we track the specific -state of the para using the `ParaLifecycle` enum. +Because the state changes of parachains and parathreads are delayed, we track the specific state of +the para using the `ParaLifecycle` enum. ``` None Parathread Parachain + + + | | | - | (2 Session Delay) | | + | (≈2 Session Delay) | | | | | +----------------------->+ | | Onboarding | | @@ -105,11 +132,21 @@ None Parathread Parachain + + + ``` +Note that if PVF pre-checking is enabled, onboarding of a para may potentially be delayed. This can +happen due to PVF pre-checking voting concluding late. + During the transition period, the para object is still considered in its existing state. ### Storage Layout ```rust +/// All currently active PVF pre-checking votes. +/// +/// Invariant: +/// - There are no PVF pre-checking votes that exist in the list but not in the map, and vice versa. +PvfActiveVoteMap: map ValidationCodeHash => PvfCheckActiveVoteState; +/// The list of all currently active PVF votes. Auxiliary to `PvfActiveVoteMap`. +PvfActiveVoteList: Vec<ValidationCodeHash>; /// All parachains. Ordered ascending by ParaId. Parathreads are not included. Parachains: Vec<ParaId>, /// The current lifecycle state of all known Para Ids. @@ -169,6 +206,9 @@ UpcomingUpgrades: Vec<(ParaId, T::BlockNumber)>; /// The actions to perform during the start of a specific session index. ActionsQueue: map SessionIndex => Vec; /// Upcoming paras instantiation arguments. +/// +/// NOTE that after PVF pre-checking is enabled the para genesis arg will have its code set +/// to empty. Instead, the code will be saved into the storage right away via `CodeByHash`.
UpcomingParasGenesis: map ParaId => Option; /// The number of references on the validation code in `CodeByHash` storage. CodeByHashRefs: map ValidationCodeHash => u32; @@ -194,8 +234,13 @@ CodeByHash: map ValidationCodeHash => Option `ParaLifecycle`. 1. Downgrade all parachains that should become parathreads, updating the `Parachains` list and `ParaLifecycle`. - 1. Return list of outgoing paras to the initializer for use by other modules. - + 1. (Deferred) Return list of outgoing paras to the initializer for use by other modules. +1. Go over all active PVF pre-checking votes: + 1. Increment `age` of the vote. + 1. If `age` reached `cfg.pvf_voting_ttl`, then enact PVF rejection and remove the vote from the active list. + 1. Otherwise, reinitialize the ballots. + 1. Resize the `votes_accept`/`votes_reject` to have the same length as the incoming validator set. + 1. Zero all the votes. ## Initialization 1. Do pruning based on all entries in `PastCodePruning` with `BlockNumber <= now`. Update the @@ -211,9 +256,10 @@ CodeByHash: map ValidationCodeHash => Option * `schedule_para_cleanup(ParaId)`: Schedule a para to be cleaned up after the next full session. * `schedule_parathread_upgrade(ParaId)`: Schedule a parathread to be upgraded to a parachain. * `schedule_parachain_downgrade(ParaId)`: Schedule a parachain to be downgraded to a parathread. -* `schedule_code_upgrade(ParaId, CurrentCode, relay_parent: BlockNumber, HostConfiguration)`: Schedule a future code - upgrade of the given parachain, to be applied after inclusion of a block of the same parachain +* `schedule_code_upgrade(ParaId, new_code, relay_parent: BlockNumber, HostConfiguration)`: Schedule a future code + upgrade of the given parachain. 
In case the PVF pre-checking is disabled, or the new code is already present in the storage, the upgrade will be applied after inclusion of a block of the same parachain executed in the context of a relay-chain block with number >= `relay_parent + config.validation_upgrade_delay`. If the upgrade is scheduled, `UpgradeRestrictionSignal` is set and it will remain set until `relay_parent + config.validation_upgrade_frequency`. +In case the PVF pre-checking is enabled and the new code is not already present in the storage, the PVF pre-checking run will be scheduled for that validation code. If the pre-checking concludes with rejection, then the upgrade is canceled. Otherwise, after pre-checking is concluded the upgrade will be scheduled and be enacted as described above. * `note_new_head(ParaId, HeadData, BlockNumber)`: note that a para has progressed to a new head, where the new head was executed in the context of a relay-chain block with given number. This will apply pending code upgrades based on the block number provided. If an upgrade took place it will clear the `UpgradeGoAheadSignal`. @@ -225,6 +271,7 @@ CodeByHash: map ValidationCodeHash => Option * `is_valid_para(ParaId) -> bool`: Returns true if the para ID references either a live parathread or live parachain. * `can_upgrade_validation_code(ParaId) -> bool`: Returns true if the given para can signal code upgrade right now. +* `pvfs_require_prechecking() -> Vec<ValidationCodeHash>`: Returns the list of PVF validation code hashes that require PVF pre-checking votes.
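The branching inside `schedule_code_upgrade` described in the list above can be sketched as below. This is a simplified illustration under stated assumptions, not the module's code: `UpgradeAction` and `plan_code_upgrade` are hypothetical names, block numbers are modelled as plain `u32`, and storage access is reduced to two boolean inputs.

```rust
/// Hypothetical result of requesting a code upgrade.
#[derive(Debug, PartialEq, Eq)]
enum UpgradeAction {
    /// Schedule the upgrade directly. The payload is the earliest relay-chain
    /// block number in whose context the upgrade may be applied.
    ScheduleAt(u32),
    /// Start (or subscribe to) a PVF pre-checking vote for the new code first.
    PreCheck,
}

/// Decides how an upgrade request is handled.
/// Assumption: both flags are looked up by the caller from configuration and storage.
fn plan_code_upgrade(
    precheck_enabled: bool,
    code_in_storage: bool,
    relay_parent: u32,
    validation_upgrade_delay: u32,
) -> UpgradeAction {
    if !precheck_enabled || code_in_storage {
        // No vote is needed: apply after the configured delay.
        UpgradeAction::ScheduleAt(relay_parent + validation_upgrade_delay)
    } else {
        // The code is new and pre-checking is on: a vote has to pass first.
        UpgradeAction::PreCheck
    }
}
```

In this sketch an accepted vote would later lead to the same `ScheduleAt` path, which matches the statement that upgrades with pre-checking are the same process delayed by the voting time.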
## Finalization diff --git a/roadmap/implementers-guide/src/types/overseer-protocol.md b/roadmap/implementers-guide/src/types/overseer-protocol.md index 61a874697835..0030cc1786f9 100644 --- a/roadmap/implementers-guide/src/types/overseer-protocol.md +++ b/roadmap/implementers-guide/src/types/overseer-protocol.md @@ -70,6 +70,7 @@ enum AllMessages { GossipSupport(GossipSupportMessage), DisputeCoordinator(DisputeCoordinatorMessage), ChainSelection(ChainSelectionMessage), + PvfChecker(PvfCheckerMessage), } ``` @@ -751,6 +752,25 @@ Various modules request that the [Candidate Validation subsystem](../node/utilit ```rust +/// The outcome of the candidate-validation's PVF pre-check request. +pub enum PreCheckOutcome { + /// The PVF has been compiled successfully within the given constraints. + Valid, + /// The PVF could not be compiled. This variant is used when the candidate-validation subsystem + /// can be sure that the PVF is invalid. To give a couple of examples: a PVF that cannot be + /// decompressed or that does not represent a structurally valid WebAssembly file. + Invalid, + /// This variant is used when the PVF cannot be compiled but for other reasons that are not + /// included into [`PreCheckOutcome::Invalid`]. This variant can indicate that the PVF in + /// question is invalid, however a PVF that received this judgement is not necessarily + /// invalid. + /// + /// For example, if during compilation the preparation worker was killed we cannot be sure why + /// it happened: because the PVF was malicious and made the worker use too much memory, or + /// because the host machine was under severe memory pressure and decided to kill the worker. + Failed, +} + /// Result of the validation of the candidate. enum ValidationResult { /// Candidate is valid, and here are the outputs and the validation data used to form inputs. @@ -805,9 +825,29 @@ pub enum CandidateValidationMessage { Duration, // Execution timeout.
oneshot::Sender>, ), + /// Try to compile the given validation code and send back + /// the outcome. + /// + /// The validation code is specified by the hash and will be queried from the runtime API at the + /// given relay-parent. + PreCheck( + // Relay-parent + Hash, + ValidationCodeHash, + oneshot::Sender<PreCheckOutcome>, + ), } ``` +## PVF Pre-checker Message + +Currently, the PVF pre-checker subsystem receives no specific messages. + +```rust +/// Non-instantiable message type +pub enum PvfCheckerMessage { } +``` + [NBE]: ../network.md#network-bridge-event [AvailabilityDistributionV1NetworkMessage]: network.md#availability-distribution-v1 [BitfieldDistributionV1NetworkMessage]: network.md#bitfield-distribution-v1 diff --git a/roadmap/implementers-guide/src/types/pvf-prechecking.md b/roadmap/implementers-guide/src/types/pvf-prechecking.md new file mode 100644 index 000000000000..ec560a7f584f --- /dev/null +++ b/roadmap/implementers-guide/src/types/pvf-prechecking.md @@ -0,0 +1,22 @@ +# PVF Pre-checking types + +> ⚠️ This type was added in v2. + +One of the main units of information on which PVF pre-checking voting is built is the `PvfCheckStatement`. + +This is a statement by the validator who ran the pre-checking process for a PVF. A PVF is identified by the `ValidationCodeHash`. + +The statement is valid only during a single session, specified in the `session_index`. + +```rust +struct PvfCheckStatement { + /// `true` if the subject passed pre-checking and `false` otherwise. + pub accept: bool, + /// The validation code hash that was checked. + pub subject: ValidationCodeHash, + /// The index of a session during which this statement is considered valid. + pub session_index: SessionIndex, + /// The index of the validator from which this statement originates.
+ pub validator_index: ValidatorIndex, +} +``` diff --git a/roadmap/implementers-guide/src/types/runtime.md b/roadmap/implementers-guide/src/types/runtime.md index 345f1902d3b3..f72f902e36dc 100644 --- a/roadmap/implementers-guide/src/types/runtime.md +++ b/roadmap/implementers-guide/src/types/runtime.md @@ -127,6 +127,6 @@ struct ParaInherentData { bitfields: Bitfields, backed_candidates: BackedCandidates, dispute_statements: MultiDisputeStatementSet, - parent_header: Header + parent_header: Header } ```