From 01495327a117089401043174ebfb8e442437770a Mon Sep 17 00:00:00 2001 From: Marcin S Date: Tue, 15 Nov 2022 09:48:25 -0500 Subject: [PATCH 1/9] Add PVF module documentation TODO (once the PRs land): - [ ] Document executor parametrization. - [ ] Document CPU time measurement of timeouts. --- node/core/pvf/src/executor_intf.rs | 2 +- node/core/pvf/src/lib.rs | 57 ++++++++++++++++++---- node/core/pvf/src/priority.rs | 2 +- roadmap/implementers-guide/src/glossary.md | 1 - 4 files changed, 49 insertions(+), 13 deletions(-) diff --git a/node/core/pvf/src/executor_intf.rs b/node/core/pvf/src/executor_intf.rs index 6827fb636eac..bbeb6195e1dc 100644 --- a/node/core/pvf/src/executor_intf.rs +++ b/node/core/pvf/src/executor_intf.rs @@ -96,7 +96,7 @@ pub fn prevalidate(code: &[u8]) -> Result Result, sc_executor_common::error::WasmError> { sc_executor_wasmtime::prepare_runtime_artifact(blob, &CONFIG.semantics) } diff --git a/node/core/pvf/src/lib.rs b/node/core/pvf/src/lib.rs index ef5f31889237..4742d7789cea 100644 --- a/node/core/pvf/src/lib.rs +++ b/node/core/pvf/src/lib.rs @@ -18,16 +18,21 @@ //! A crate that implements PVF validation host. //! +//! # Entrypoint +//! //! This crate provides a simple API. You first [`start`] the validation host, which gives you the //! [handle][`ValidationHost`] and the future you need to poll. //! -//! Then using the handle the client can send two types of requests: +//! Then using the handle the client can send three types of requests: +//! +//! (a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it (verify and +//! compile) in order to pre-check its validity. //! -//! (a) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`] +//! (b) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`] //! and the PVF [code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF //! with the `params`. //! -//! (b) Heads up. 
This request allows to signal that the given PVF may be needed soon and that it +//! (c) Heads up. This request allows to signal that the given PVF may be needed soon and that it //! should be prepared for execution. //! //! The preparation results are cached for some time after they either used or was signaled in heads up. @@ -39,15 +44,43 @@ //! PVF execution requests can specify the [priority][`Priority`] with which the given request should //! be handled. Different priority levels have different effects. This is discussed below. //! -//! Preparation started by a heads up signal always starts in with the background priority. If there +//! Preparation started by a heads up signal always starts with the background priority. If there //! is already a request for that PVF preparation under way the priority is inherited. If after heads //! up, a new PVF execution request comes in with a higher priority, then the original task's priority //! will be adjusted to match the new one if it's larger. //! //! Priority can never go down, only up. //! +//! # Mitigating disputes +//! +//! ## Retrying execution requests +//! +//! If the execution request fails during **preparation**, we will retry if it is possible that the +//! preparation error was transient (i.e. it was of type [`PrepareError::Panic`], +//! [`PrepareError::TimedOut`], or [`PrepareError::DidNotMakeIt`]). We will only retry preparation +//! if another requests comes in after 15 minutes, to ensure any potential transient conditions had +//! time to be resolved. We will retry up to 5 times. See `can_retry_prepare_after_failure`. +//! +//! If the actual **execution** of the artifact fails, we will retry once if it was an +//! [`InvalidCandidate::AmbiguousWorkerDeath`] error, after a 1 second delay to allow any potential +//! transient conditions to clear. This occurs outside this module, in the Candidate Validation +//! subsystem. +//! +//! ## Preparation timeouts +//! +//! 
We use a timeout for preparation to limit the amount of time it can take. As the time for +//! preparation can vary depending on the machine and load on the machine, this can potentially lead +//! to disputes where some validators are able to execute a PVF and others aren't. +//! +//! One mitigation we have in place is a more lenient timeout for preparation during execution than +//! during pre-checking. The rationale is that the PVF has already passed pre-checking, so we know +//! it should be valid, and we allow it to take longer than expected as this is likely due to an +//! issue with the machine and not the PVF. +//! //! # Under the hood //! +//! ## The flow +//! //! Under the hood, the validation host is built using a bunch of communicating processes, not //! dissimilar to actors. Each of such "processes" is a future task that contains an event loop that //! processes incoming messages, potentially delegating sub-tasks to other "processes". @@ -55,11 +88,13 @@ //! Two of these processes are queues. The first one is for preparation jobs and the second one is for //! execution. Both of the queues are backed by separate pools of workers of different kind. //! -//! Preparation workers handle preparation requests by preverifying and instrumenting PVF wasm code, +//! Preparation workers handle preparation requests by prevalidating and instrumenting PVF wasm code, //! and then passing it into the compiler, to prepare the artifact. //! -//! Artifact is a final product of preparation. If the preparation succeeded, then the artifact will -//! contain the compiled code usable for quick execution by a worker later on. +//! ## Artifacts +//! +//! An artifact is the final product of preparation. If the preparation succeeded, then the artifact +//! will contain the compiled code usable for quick execution by a worker later on. //! //! If the preparation failed, then the worker will still write the artifact with the error message. //! 
We save the artifact with the error so that we don't try to prepare the artifacts that are broken @@ -68,12 +103,14 @@ //! The artifact is saved on disk and is also tracked by an in memory table. This in memory table //! doesn't contain the artifact contents though, only a flag that the given artifact is compiled. //! +//! Each fixed interval of time a pruning task will run. This task will remove all artifacts that +//! weren't used or received a heads up signal for a while. +//! +//! ## Execution +//! //! The execute workers will be fed by the requests from the execution queue, which is basically a //! combination of a path to the compiled artifact and the //! [`params`][`polkadot_parachain::primitives::ValidationParams`]. -//! -//! Each fixed interval of time a pruning task will run. This task will remove all artifacts that -//! weren't used or received a heads up signal for a while. mod artifacts; mod error; diff --git a/node/core/pvf/src/priority.rs b/node/core/pvf/src/priority.rs index de169be0696b..b80c9195832a 100644 --- a/node/core/pvf/src/priority.rs +++ b/node/core/pvf/src/priority.rs @@ -24,7 +24,7 @@ pub enum Priority { Normal, /// This priority is used for requests that are required to be processed as soon as possible. /// - /// For example, backing is on critical path and require execution as soon as possible. + /// For example, backing is on a critical path and requires execution as soon as possible. Critical, } diff --git a/roadmap/implementers-guide/src/glossary.md b/roadmap/implementers-guide/src/glossary.md index 8612d8834cb8..ed6a358be6da 100644 --- a/roadmap/implementers-guide/src/glossary.md +++ b/roadmap/implementers-guide/src/glossary.md @@ -48,4 +48,3 @@ exactly one downward message queue. Also of use is the [Substrate Glossary](https://substrate.dev/docs/en/knowledgebase/getting-started/glossary). 
[0]: https://wiki.polkadot.network/docs/learn-consensus -[1]: #pvf From 82beedc8084b1a3d0b9bf44f4b1da47f43a8406f Mon Sep 17 00:00:00 2001 From: Marcin S Date: Tue, 15 Nov 2022 10:52:15 -0500 Subject: [PATCH 2/9] Update node/core/pvf/src/lib.rs Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com> --- node/core/pvf/src/lib.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/node/core/pvf/src/lib.rs b/node/core/pvf/src/lib.rs index 4742d7789cea..59f455fedee6 100644 --- a/node/core/pvf/src/lib.rs +++ b/node/core/pvf/src/lib.rs @@ -103,7 +103,7 @@ //! The artifact is saved on disk and is also tracked by an in memory table. This in memory table //! doesn't contain the artifact contents though, only a flag that the given artifact is compiled. //! -//! Each fixed interval of time a pruning task will run. This task will remove all artifacts that +//! A pruning task will run at a fixed interval of time. This task will remove all artifacts that //! weren't used or received a heads up signal for a while. //! //! ## Execution From 79e319639c068c4e21212cb9ab1bac3204016f98 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Wed, 16 Nov 2022 08:59:27 -0500 Subject: [PATCH 3/9] Clarify meaning of PVF acronym --- node/core/pvf/src/lib.rs | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/node/core/pvf/src/lib.rs b/node/core/pvf/src/lib.rs index 59f455fedee6..9c8455f4b3e4 100644 --- a/node/core/pvf/src/lib.rs +++ b/node/core/pvf/src/lib.rs @@ -16,7 +16,10 @@ #![warn(missing_docs)] -//! A crate that implements PVF validation host. +//! A crate that implements the PVF (Parachain Validation Function) validation host. +//! +//! For more background to PVFs, refer to the [Implementer's Guide: PVF +//! Pre-checking](https://paritytech.github.io/polkadot/book/pvf-prechecking.html). //! //! # Entrypoint //! 
From c8f7b53314360b93473bce5f68817038e33a6e57 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Thu, 17 Nov 2022 07:19:02 -0500 Subject: [PATCH 4/9] Move PVF doc to implementer's guide --- node/core/pvf/src/lib.rs | 99 +--------------- roadmap/implementers-guide/src/SUMMARY.md | 1 + .../src/node/utility/pvf.md | 110 ++++++++++++++++++ .../src/types/overseer-protocol.md | 4 +- 4 files changed, 114 insertions(+), 100 deletions(-) create mode 100644 roadmap/implementers-guide/src/node/utility/pvf.md diff --git a/node/core/pvf/src/lib.rs b/node/core/pvf/src/lib.rs index 9c8455f4b3e4..5767d2b5820e 100644 --- a/node/core/pvf/src/lib.rs +++ b/node/core/pvf/src/lib.rs @@ -16,104 +16,9 @@ #![warn(missing_docs)] -//! A crate that implements the PVF (Parachain Validation Function) validation host. +//! A crate that implements the PVF validation host. //! -//! For more background to PVFs, refer to the [Implementer's Guide: PVF -//! Pre-checking](https://paritytech.github.io/polkadot/book/pvf-prechecking.html). -//! -//! # Entrypoint -//! -//! This crate provides a simple API. You first [`start`] the validation host, which gives you the -//! [handle][`ValidationHost`] and the future you need to poll. -//! -//! Then using the handle the client can send three types of requests: -//! -//! (a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it (verify and -//! compile) in order to pre-check its validity. -//! -//! (b) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`] -//! and the PVF [code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF -//! with the `params`. -//! -//! (c) Heads up. This request allows to signal that the given PVF may be needed soon and that it -//! should be prepared for execution. -//! -//! The preparation results are cached for some time after they either used or was signaled in heads up. -//! 
All requests that depends on preparation of the same PVF are bundled together and will be executed -//! as soon as the artifact is prepared. -//! -//! # Priority -//! -//! PVF execution requests can specify the [priority][`Priority`] with which the given request should -//! be handled. Different priority levels have different effects. This is discussed below. -//! -//! Preparation started by a heads up signal always starts with the background priority. If there -//! is already a request for that PVF preparation under way the priority is inherited. If after heads -//! up, a new PVF execution request comes in with a higher priority, then the original task's priority -//! will be adjusted to match the new one if it's larger. -//! -//! Priority can never go down, only up. -//! -//! # Mitigating disputes -//! -//! ## Retrying execution requests -//! -//! If the execution request fails during **preparation**, we will retry if it is possible that the -//! preparation error was transient (i.e. it was of type [`PrepareError::Panic`], -//! [`PrepareError::TimedOut`], or [`PrepareError::DidNotMakeIt`]). We will only retry preparation -//! if another requests comes in after 15 minutes, to ensure any potential transient conditions had -//! time to be resolved. We will retry up to 5 times. See `can_retry_prepare_after_failure`. -//! -//! If the actual **execution** of the artifact fails, we will retry once if it was an -//! [`InvalidCandidate::AmbiguousWorkerDeath`] error, after a 1 second delay to allow any potential -//! transient conditions to clear. This occurs outside this module, in the Candidate Validation -//! subsystem. -//! -//! ## Preparation timeouts -//! -//! We use a timeout for preparation to limit the amount of time it can take. As the time for -//! preparation can vary depending on the machine and load on the machine, this can potentially lead -//! to disputes where some validators are able to execute a PVF and others aren't. -//! -//! 
One mitigation we have in place is a more lenient timeout for preparation during execution than -//! during pre-checking. The rationale is that the PVF has already passed pre-checking, so we know -//! it should be valid, and we allow it to take longer than expected as this is likely due to an -//! issue with the machine and not the PVF. -//! -//! # Under the hood -//! -//! ## The flow -//! -//! Under the hood, the validation host is built using a bunch of communicating processes, not -//! dissimilar to actors. Each of such "processes" is a future task that contains an event loop that -//! processes incoming messages, potentially delegating sub-tasks to other "processes". -//! -//! Two of these processes are queues. The first one is for preparation jobs and the second one is for -//! execution. Both of the queues are backed by separate pools of workers of different kind. -//! -//! Preparation workers handle preparation requests by prevalidating and instrumenting PVF wasm code, -//! and then passing it into the compiler, to prepare the artifact. -//! -//! ## Artifacts -//! -//! An artifact is the final product of preparation. If the preparation succeeded, then the artifact -//! will contain the compiled code usable for quick execution by a worker later on. -//! -//! If the preparation failed, then the worker will still write the artifact with the error message. -//! We save the artifact with the error so that we don't try to prepare the artifacts that are broken -//! repeatedly. -//! -//! The artifact is saved on disk and is also tracked by an in memory table. This in memory table -//! doesn't contain the artifact contents though, only a flag that the given artifact is compiled. -//! -//! A pruning task will run at a fixed interval of time. This task will remove all artifacts that -//! weren't used or received a heads up signal for a while. -//! -//! ## Execution -//! -//! 
The execute workers will be fed by the requests from the execution queue, which is basically a -//! combination of a path to the compiled artifact and the -//! [`params`][`polkadot_parachain::primitives::ValidationParams`]. +//! This is responsible for handling requests to prepare and execute PVF code blobs. mod artifacts; mod error; diff --git a/roadmap/implementers-guide/src/SUMMARY.md b/roadmap/implementers-guide/src/SUMMARY.md index bcf87aad8a49..ca7be6e805f0 100644 --- a/roadmap/implementers-guide/src/SUMMARY.md +++ b/roadmap/implementers-guide/src/SUMMARY.md @@ -60,6 +60,7 @@ - [Utility Subsystems](node/utility/README.md) - [Availability Store](node/utility/availability-store.md) - [Candidate Validation](node/utility/candidate-validation.md) + - [PVF](node/utility/pvf.md) - [Provisioner](node/utility/provisioner.md) - [Network Bridge](node/utility/network-bridge.md) - [Gossip Support](node/utility/gossip-support.md) diff --git a/roadmap/implementers-guide/src/node/utility/pvf.md b/roadmap/implementers-guide/src/node/utility/pvf.md new file mode 100644 index 000000000000..201f0ef02309 --- /dev/null +++ b/roadmap/implementers-guide/src/node/utility/pvf.md @@ -0,0 +1,110 @@ +# PVF + +The `pvf` module is responsible for handling preparation and execution subtasks +for PVF code blobs. + +## Entrypoint + +This crate provides a simple API. You first [`start`] the validation host, which +gives you the [handle][`ValidationHost`] and the future you need to poll. + +Then using the handle the client can send three types of requests: + +(a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it +(verify and compile) in order to pre-check its validity. + +(b) PVF execution. This accepts the PVF +[`params`][`polkadot_parachain::primitives::ValidationParams`] and the PVF +[code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF +with the `params`. + +(c) Heads up. 
This request allows to signal that the given PVF may be needed +soon and that it should be prepared for execution. + +The preparation results are cached for some time after they either used or was +signaled in heads up. All requests that depends on preparation of the same PVF +are bundled together and will be executed as soon as the artifact is prepared. + +## Priority + +PVF execution requests can specify the [priority][`Priority`] with which the +given request should be handled. Different priority levels have different +effects. This is discussed below. + +Preparation started by a heads up signal always starts with the background +priority. If there is already a request for that PVF preparation under way the +priority is inherited. If after heads up, a new PVF execution request comes in +with a higher priority, then the original task's priority will be adjusted to +match the new one if it's larger. + +Priority can never go down, only up. + +## Mitigating disputes + +### Retrying execution requests + +If the execution request fails during **preparation**, we will retry if it is +possible that the preparation error was transient (i.e. it was of type +[`PrepareError::Panic`], [`PrepareError::TimedOut`], or +[`PrepareError::DidNotMakeIt`]). We will only retry preparation if another +requests comes in after 15 minutes, to ensure any potential transient conditions +had time to be resolved. We will retry up to 5 times. See +`can_retry_prepare_after_failure`. + +If the actual **execution** of the artifact fails, we will retry once if it was +an [`InvalidCandidate::AmbiguousWorkerDeath`] error, after a 1 second delay to +allow any potential transient conditions to clear. This occurs outside this +module, in the Candidate Validation subsystem. + +### Preparation timeouts + +We use a timeout for preparation to limit the amount of time it can take. 
As the +time for preparation can vary depending on the machine and load on the machine, +this can potentially lead to disputes where some validators are able to execute +a PVF and others aren't. + +One mitigation we have in place is a more lenient timeout for preparation during +execution than during pre-checking. The rationale is that the PVF has already +passed pre-checking, so we know it should be valid, and we allow it to take +longer than expected as this is likely due to an issue with the machine and not +the PVF. + +## Under the hood + +### The flow + +Under the hood, the validation host is built using a bunch of communicating +processes, not dissimilar to actors. Each of such "processes" is a future task +that contains an event loop that processes incoming messages, potentially +delegating sub-tasks to other "processes". + +Two of these processes are queues. The first one is for preparation jobs and the +second one is for execution. Both of the queues are backed by separate pools of +workers of different kind. + +Preparation workers handle preparation requests by prevalidating and +instrumenting PVF wasm code, and then passing it into the compiler, to prepare +the artifact. + +### Artifacts + +An artifact is the final product of preparation. If the preparation succeeded, +then the artifact will contain the compiled code usable for quick execution by a +worker later on. + +If the preparation failed, then the worker will still write the artifact with +the error message. We save the artifact with the error so that we don't try to +prepare the artifacts that are broken repeatedly. + +The artifact is saved on disk and is also tracked by an in memory table. This in +memory table doesn't contain the artifact contents though, only a flag that the +given artifact is compiled. + +A pruning task will run at a fixed interval of time. This task will remove all +artifacts that weren't used or received a heads up signal for a while. 
+ +### Execution + +The execute workers will be fed by the requests from the execution queue, which +is basically a combination of a path to the compiled artifact and the +[`params`][`polkadot_parachain::primitives::ValidationParams`]. diff --git a/roadmap/implementers-guide/src/types/overseer-protocol.md b/roadmap/implementers-guide/src/types/overseer-protocol.md index 4b9dc97c27e2..adf04d74b2b7 100644 --- a/roadmap/implementers-guide/src/types/overseer-protocol.md +++ b/roadmap/implementers-guide/src/types/overseer-protocol.md @@ -681,9 +681,7 @@ enum ProvisionerMessage { The Runtime API subsystem is responsible for providing an interface to the state of the chain's runtime. -This is fueled by an auxiliary type encapsulating all request types defined in the Runtime API section of the guide. - -> To do: link to the Runtime API section. Not possible currently because of https://github.com/Michael-F-Bryan/mdbook-linkcheck/issues/25. Once v0.7.1 is released it will work. +This is fueled by an auxiliary type encapsulating all request types defined in the [Runtime API section](../../runtime-api) of the guide. 
```rust enum RuntimeApiRequest { From 55c11839a5b91bbe0dc9b18f6e2c3bee8341d416 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Thu, 17 Nov 2022 08:10:33 -0500 Subject: [PATCH 5/9] Clean up implementer's guide a bit --- roadmap/implementers-guide/README.md | 1 - .../implementers-guide/src/node/utility/pvf-prechecker.md | 6 +++--- roadmap/implementers-guide/src/types/chain.md | 2 ++ roadmap/implementers-guide/src/types/overseer-protocol.md | 2 +- roadmap/implementers-guide/src/types/pvf-prechecking.md | 2 ++ 5 files changed, 8 insertions(+), 5 deletions(-) diff --git a/roadmap/implementers-guide/README.md b/roadmap/implementers-guide/README.md index 7f3f8cef7e63..da31cf363e04 100644 --- a/roadmap/implementers-guide/README.md +++ b/roadmap/implementers-guide/README.md @@ -22,7 +22,6 @@ Then install and build the book: ```sh cargo install mdbook mdbook-linkcheck mdbook-graphviz mdbook-mermaid mdbook-last-changed mdbook serve roadmap/implementers-guide -open http://localhost:3000 ``` ## Specification diff --git a/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md b/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md index b0f58346da99..fd75ce9e3804 100644 --- a/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md +++ b/roadmap/implementers-guide/src/node/utility/pvf-prechecker.md @@ -12,11 +12,11 @@ This subsytem does not produce any output messages either. The subsystem will, h If the node is running in a collator mode, this subsystem will be disabled. The PVF pre-checker subsystem keeps track of the PVFs that are relevant for the subsystem. -To be relevant for the subsystem, a PVF must be returned by `pvfs_require_precheck` [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant. 
+To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant. When a PVF just becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking for the pre-check. -Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements. +Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements. Since a vote only is valid during [one session][overview], the subsystem will have to resign and submit the statements for for the new session. The new session is assumed to be started if at least one of the leaves has a greater session index that was previously observed in any of the leaves. 
@@ -28,4 +28,4 @@ If the node is not in the active validator set, it will still perform all the ch [Runtime API]: runtime-api.md [PVF pre-checking runtime API]: ../../runtime-api/pvf-prechecking.md [Candidate Validation]: candidate-validation.md -[`PvfCheckStatement`]: ../../types/pvf-prechecking.md +[PvfCheckStatement]: ../../types/pvf-prechecking.md#pvfcheckstatement diff --git a/roadmap/implementers-guide/src/types/chain.md b/roadmap/implementers-guide/src/types/chain.md index e8ec6cea8f4a..f5a9093bca9e 100644 --- a/roadmap/implementers-guide/src/types/chain.md +++ b/roadmap/implementers-guide/src/types/chain.md @@ -2,6 +2,8 @@ Types pertaining to the relay-chain - events, structures, etc. +TODO: These no longer exist. + ## Block Import Event ```rust diff --git a/roadmap/implementers-guide/src/types/overseer-protocol.md b/roadmap/implementers-guide/src/types/overseer-protocol.md index adf04d74b2b7..ad66d0132788 100644 --- a/roadmap/implementers-guide/src/types/overseer-protocol.md +++ b/roadmap/implementers-guide/src/types/overseer-protocol.md @@ -681,7 +681,7 @@ enum ProvisionerMessage { The Runtime API subsystem is responsible for providing an interface to the state of the chain's runtime. -This is fueled by an auxiliary type encapsulating all request types defined in the [Runtime API section](../../runtime-api) of the guide. +This is fueled by an auxiliary type encapsulating all request types defined in the [Runtime API section](../runtime-api) of the guide. ```rust enum RuntimeApiRequest { diff --git a/roadmap/implementers-guide/src/types/pvf-prechecking.md b/roadmap/implementers-guide/src/types/pvf-prechecking.md index 331429bd1fc5..f68f1e60feee 100644 --- a/roadmap/implementers-guide/src/types/pvf-prechecking.md +++ b/roadmap/implementers-guide/src/types/pvf-prechecking.md @@ -1,5 +1,7 @@ # PVF Pre-checking types +## `PvfCheckStatement` + > ⚠️ This type was added in v2. 
One of the main units of information on which PVF pre-checking voting is build is the `PvfCheckStatement`. From 8b1f3b72c45d4c73bc12581f87e51c0091660fb1 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Thu, 17 Nov 2022 08:59:12 -0500 Subject: [PATCH 6/9] Add page for PVF types --- roadmap/implementers-guide/src/SUMMARY.md | 1 + .../src/node/utility/pvf.md | 26 +++++++------ .../implementers-guide/src/types/candidate.md | 16 ++++++++ roadmap/implementers-guide/src/types/pvf.md | 37 +++++++++++++++++++ 4 files changed, 69 insertions(+), 11 deletions(-) create mode 100644 roadmap/implementers-guide/src/types/pvf.md diff --git a/roadmap/implementers-guide/src/SUMMARY.md b/roadmap/implementers-guide/src/SUMMARY.md index ca7be6e805f0..c2552c12b8d3 100644 --- a/roadmap/implementers-guide/src/SUMMARY.md +++ b/roadmap/implementers-guide/src/SUMMARY.md @@ -72,6 +72,7 @@ - [PVF Pre-Checking](node/utility/pvf-prechecker.md) - [Data Structures and Types](types/README.md) - [Candidate](types/candidate.md) + - [PVF](types/pvf.md) - [Backing](types/backing.md) - [Availability](types/availability.md) - [Overseer and Subsystem Protocol](types/overseer-protocol.md) diff --git a/roadmap/implementers-guide/src/node/utility/pvf.md b/roadmap/implementers-guide/src/node/utility/pvf.md index 201f0ef02309..8a4f2aef944e 100644 --- a/roadmap/implementers-guide/src/node/utility/pvf.md +++ b/roadmap/implementers-guide/src/node/utility/pvf.md @@ -5,17 +5,16 @@ for PVF code blobs. ## Entrypoint -This crate provides a simple API. You first [`start`] the validation host, which -gives you the [handle][`ValidationHost`] and the future you need to poll. +This crate provides a simple API. You first `start` the validation host, which +gives you the [handle][ValidationHost] and the future you need to poll. Then using the handle the client can send three types of requests: -(a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it +(a) PVF pre-checking. 
This takes the PVF [code][Pvf] and tries to prepare it (verify and compile) in order to pre-check its validity. -(b) PVF execution. This accepts the PVF -[`params`][`polkadot_parachain::primitives::ValidationParams`] and the PVF -[code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF +(b) PVF execution. This accepts the PVF [`params`][ValidationParams] and the PVF +[code][Pvf], prepares (verifies and compiles) the code, and then executes PVF with the `params`. (c) Heads up. This request allows to signal that the given PVF may be needed @@ -27,7 +26,7 @@ are bundled together and will be executed as soon as the artifact is prepared. ## Priority -PVF execution requests can specify the [priority][`Priority`] with which the +PVF execution requests can specify the [priority][Priority] with which the given request should be handled. Different priority levels have different effects. This is discussed below. @@ -45,14 +44,14 @@ Priority can never go down, only up. If the execution request fails during **preparation**, we will retry if it is possible that the preparation error was transient (i.e. it was of type -[`PrepareError::Panic`], [`PrepareError::TimedOut`], or -[`PrepareError::DidNotMakeIt`]). We will only retry preparation if another +`PrepareError::Panic`, `PrepareError::TimedOut`, or +`PrepareError::DidNotMakeIt`). We will only retry preparation if another requests comes in after 15 minutes, to ensure any potential transient conditions had time to be resolved. We will retry up to 5 times. See `can_retry_prepare_after_failure`. If the actual **execution** of the artifact fails, we will retry once if it was -an [`InvalidCandidate::AmbiguousWorkerDeath`] error, after a 1 second delay to +an `InvalidCandidate::AmbiguousWorkerDeath` error, after a 1 second delay to allow any potential transient conditions to clear. This occurs outside this module, in the Candidate Validation subsystem. 
@@ -107,4 +106,9 @@ artifacts that weren't used or received a heads up signal for a while. The execute workers will be fed by the requests from the execution queue, which is basically a combination of a path to the compiled artifact and the -[`params`][`polkadot_parachain::primitives::ValidationParams`]. +[`params`][ValidationParams]. + +[ValidationHost]: ../../types/pvf.md#validationhost +[Pvf]: ../../types/pvf.md#pvf +[ValidationParams]: ../../types/candidate.md#validationparams +[Priority]: ../../types/pvf.md#priority diff --git a/roadmap/implementers-guide/src/types/candidate.md b/roadmap/implementers-guide/src/types/candidate.md index b9d5900b7f17..baad5b07e6cd 100644 --- a/roadmap/implementers-guide/src/types/candidate.md +++ b/roadmap/implementers-guide/src/types/candidate.md @@ -92,6 +92,22 @@ struct CandidateDescriptor { } ``` +## `ValidationParams` + +```rust +/// Validation parameters for evaluating the parachain validity function. +pub struct ValidationParams { + /// Previous head-data. + pub parent_head: HeadData, + /// The collation body. + pub block_data: BlockData, + /// The current relay-chain block number. + pub relay_parent_number: RelayChainBlockNumber, + /// The relay-chain block's storage root. + pub relay_parent_storage_root: Hash, +} +``` + ## `PersistedValidationData` The validation data provides information about how to create the inputs for validation of a candidate. This information is derived from the chain state and will vary from para to para, although some of the fields may be the same for every para. diff --git a/roadmap/implementers-guide/src/types/pvf.md b/roadmap/implementers-guide/src/types/pvf.md new file mode 100644 index 000000000000..08197963a371 --- /dev/null +++ b/roadmap/implementers-guide/src/types/pvf.md @@ -0,0 +1,37 @@ +# PVF Types + +## `ValidationHost` + +```rust +/// A handle to the async process serving the validation host requests. 
+pub struct ValidationHost { + to_host_tx: mpsc::Sender, +} +``` + +## `Pvf` + +```rust +/// A struct that carries code of a parachain validation function and its hash. +pub struct Pvf { + pub(crate) code: Arc>, + pub(crate) code_hash: ValidationCodeHash, +} +``` + +## `Priority` + +```rust +/// A priority assigned to execution of a PVF. +pub enum Priority { + /// Normal priority for things that do not require immediate response, but still need to be + /// done pretty quick. + /// + /// Approvals and disputes fall into this category. + Normal, + /// This priority is used for requests that are required to be processed as soon as possible. + /// + /// For example, backing is on a critical path and requires execution as soon as possible. + Critical, +} +``` From 175504ec215d2a1b26bb8cd14fb9d7472fa39e78 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Tue, 22 Nov 2022 16:41:52 -0500 Subject: [PATCH 7/9] pvf: Better separation between crate docs and implementer's guide --- node/core/pvf/src/lib.rs | 72 ++++++++++- roadmap/implementers-guide/README.md | 6 + roadmap/implementers-guide/src/SUMMARY.md | 2 - .../src/node/utility/candidate-validation.md | 35 ++++++ .../src/node/utility/pvf.md | 114 ------------------ roadmap/implementers-guide/src/types/pvf.md | 37 ------ 6 files changed, 112 insertions(+), 154 deletions(-) delete mode 100644 roadmap/implementers-guide/src/node/utility/pvf.md delete mode 100644 roadmap/implementers-guide/src/types/pvf.md diff --git a/node/core/pvf/src/lib.rs b/node/core/pvf/src/lib.rs index 5767d2b5820e..1aabb1100437 100644 --- a/node/core/pvf/src/lib.rs +++ b/node/core/pvf/src/lib.rs @@ -18,7 +18,77 @@ //! A crate that implements the PVF validation host. //! -//! This is responsible for handling requests to prepare and execute PVF code blobs. +//! For more background, refer to the Implementer's Guide: [PVF +//! Pre-checking](https://paritytech.github.io/polkadot/book/pvf-prechecking.html) and [Candidate +//! 
Validation](https://paritytech.github.io/polkadot/book/node/utility/candidate-validation.html#pvf-host). +//! +//! # Entrypoint +//! +//! This crate provides a simple API. You first [`start`] the validation host, which gives you the +//! [handle][`ValidationHost`] and the future you need to poll. +//! +//! Then using the handle the client can send three types of requests: +//! +//! (a) PVF pre-checking. This takes the PVF [code][`Pvf`] and tries to prepare it (verify and +//! compile) in order to pre-check its validity. +//! +//! (b) PVF execution. This accepts the PVF [`params`][`polkadot_parachain::primitives::ValidationParams`] +//! and the PVF [code][`Pvf`], prepares (verifies and compiles) the code, and then executes PVF +//! with the `params`. +//! +//! (c) Heads up. This request signals that the given PVF may be needed soon and that it +//! should be prepared for execution. +//! +//! The preparation results are cached for some time after they are either used or signaled in a +//! heads up. All requests that depend on preparation of the same PVF are bundled together and will +//! be executed as soon as the artifact is prepared. +//! +//! # Priority +//! +//! PVF execution requests can specify the [priority][`Priority`] with which the given request should +//! be handled. Different priority levels have different effects. This is discussed below. +//! +//! Preparation started by a heads up signal always starts with the background priority. If there +//! is already a request for that PVF preparation under way the priority is inherited. If after heads +//! up, a new PVF execution request comes in with a higher priority, then the original task's priority +//! will be adjusted to match the new one if it's larger. +//! +//! Priority can never go down, only up. +//! +//! # Under the hood +//! +//! ## The flow +//! +//! Under the hood, the validation host is built using a bunch of communicating processes, not +//! dissimilar to actors. 
Each such "process" is a future task that contains an event loop that +//! processes incoming messages, potentially delegating sub-tasks to other "processes". +//! +//! Two of these processes are queues. The first one is for preparation jobs and the second one is for +//! execution. Both of the queues are backed by separate pools of workers of different kinds. +//! +//! Preparation workers handle preparation requests by prevalidating and instrumenting PVF wasm code, +//! and then passing it into the compiler, to prepare the artifact. +//! +//! ## Artifacts +//! +//! An artifact is the final product of preparation. If the preparation succeeded, then the artifact +//! will contain the compiled code usable for quick execution by a worker later on. +//! +//! If the preparation failed, then the worker will still write the artifact with the error message. +//! We save the artifact with the error so that we don't repeatedly try to prepare artifacts that +//! are broken. +//! +//! The artifact is saved on disk and is also tracked by an in-memory table. This in-memory table +//! doesn't contain the artifact contents though, only a flag that the given artifact is compiled. +//! +//! A pruning task will run at a fixed interval of time. This task will remove all artifacts that +//! haven't been used or received a heads up signal for a while. +//! +//! ## Execution +//! +//! The execute workers will be fed by the requests from the execution queue, which is basically a +//! combination of a path to the compiled artifact and the +//! [`params`][`polkadot_parachain::primitives::ValidationParams`]. 
mod artifacts; mod error; diff --git a/roadmap/implementers-guide/README.md b/roadmap/implementers-guide/README.md index da31cf363e04..996041f176bb 100644 --- a/roadmap/implementers-guide/README.md +++ b/roadmap/implementers-guide/README.md @@ -24,6 +24,12 @@ cargo install mdbook mdbook-linkcheck mdbook-graphviz mdbook-mermaid mdbook-last mdbook serve roadmap/implementers-guide ``` +and in a second terminal window run: + +```sh +open http://localhost:3000 +``` + ## Specification See also the Polkadot specification [hosted](https://spec.polkadot.network/), and its [source](https://github.com/w3f/polkadot-spec). diff --git a/roadmap/implementers-guide/src/SUMMARY.md b/roadmap/implementers-guide/src/SUMMARY.md index c2552c12b8d3..bcf87aad8a49 100644 --- a/roadmap/implementers-guide/src/SUMMARY.md +++ b/roadmap/implementers-guide/src/SUMMARY.md @@ -60,7 +60,6 @@ - [Utility Subsystems](node/utility/README.md) - [Availability Store](node/utility/availability-store.md) - [Candidate Validation](node/utility/candidate-validation.md) - - [PVF](node/utility/pvf.md) - [Provisioner](node/utility/provisioner.md) - [Network Bridge](node/utility/network-bridge.md) - [Gossip Support](node/utility/gossip-support.md) @@ -72,7 +71,6 @@ - [PVF Pre-Checking](node/utility/pvf-prechecker.md) - [Data Structures and Types](types/README.md) - [Candidate](types/candidate.md) - - [PVF](types/pvf.md) - [Backing](types/backing.md) - [Availability](types/availability.md) - [Overseer and Subsystem Protocol](types/overseer-protocol.md) diff --git a/roadmap/implementers-guide/src/node/utility/candidate-validation.md b/roadmap/implementers-guide/src/node/utility/candidate-validation.md index 5393368c5c6b..07d7c09bf2f2 100644 --- a/roadmap/implementers-guide/src/node/utility/candidate-validation.md +++ b/roadmap/implementers-guide/src/node/utility/candidate-validation.md @@ -48,4 +48,39 @@ Once we have all parameters, we can spin up a background task to perform the val If we can assume the presence of 
the relay-chain state (that is, during processing [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`) we can run all the checks that the relay-chain would run at the inclusion time thus confirming that the candidate will be accepted. +### PVF Host + +The PVF host is responsible for handling requests to prepare and execute PVF +code blobs. + +One high-level goal is to make PVF operations as deterministic as possible, to +reduce the rate of disputes. Disputes can happen due to e.g. a job timing out on +one machine, but not another. While we do not yet have full determinism, there +are some dispute reduction mechanisms in place right now. + +#### Retrying execution requests + +If the execution request fails during **preparation**, we will retry if it is +possible that the preparation error was transient (e.g. if the error was a panic +or timeout). We will only retry preparation if another request comes in after +15 minutes, to ensure any potential transient conditions had time to be +resolved. We will retry up to 5 times. + +If the actual **execution** of the artifact fails, we will retry once if it was +an ambiguous error after a 1 second delay, to allow any potential transient +conditions to clear. + +#### Preparation timeouts + +We use timeouts for both preparation and execution jobs to limit the amount of +time they can take. As the time for a job can vary depending on the machine and +load on the machine, this can potentially lead to disputes where some validators +successfully execute a PVF and others don't. + +One mitigation we have in place is a more lenient timeout for preparation during +execution than during pre-checking. The rationale is that the PVF has already +passed pre-checking, so we know it should be valid, and we allow it to take +longer than expected, as this is likely due to an issue with the machine and not +the PVF. 
+ [CVM]: ../../types/overseer-protocol.md#validationrequesttype diff --git a/roadmap/implementers-guide/src/node/utility/pvf.md b/roadmap/implementers-guide/src/node/utility/pvf.md deleted file mode 100644 index 8a4f2aef944e..000000000000 --- a/roadmap/implementers-guide/src/node/utility/pvf.md +++ /dev/null @@ -1,114 +0,0 @@ -# PVF - -The `pvf` module is responsible for handling preparation and execution subtasks -for PVF code blobs. - -## Entrypoint - -This crate provides a simple API. You first `start` the validation host, which -gives you the [handle][ValidationHost] and the future you need to poll. - -Then using the handle the client can send three types of requests: - -(a) PVF pre-checking. This takes the PVF [code][Pvf] and tries to prepare it -(verify and compile) in order to pre-check its validity. - -(b) PVF execution. This accepts the PVF [`params`][ValidationParams] and the PVF -[code][Pvf], prepares (verifies and compiles) the code, and then executes PVF -with the `params`. - -(c) Heads up. This request allows to signal that the given PVF may be needed -soon and that it should be prepared for execution. - -The preparation results are cached for some time after they either used or was -signaled in heads up. All requests that depends on preparation of the same PVF -are bundled together and will be executed as soon as the artifact is prepared. - -## Priority - -PVF execution requests can specify the [priority][Priority] with which the -given request should be handled. Different priority levels have different -effects. This is discussed below. - -Preparation started by a heads up signal always starts with the background -priority. If there is already a request for that PVF preparation under way the -priority is inherited. If after heads up, a new PVF execution request comes in -with a higher priority, then the original task's priority will be adjusted to -match the new one if it's larger. - -Priority can never go down, only up. 
- -## Mitigating disputes - -### Retrying execution requests - -If the execution request fails during **preparation**, we will retry if it is -possible that the preparation error was transient (i.e. it was of type -`PrepareError::Panic`, `PrepareError::TimedOut`, or -`PrepareError::DidNotMakeIt`). We will only retry preparation if another -requests comes in after 15 minutes, to ensure any potential transient conditions -had time to be resolved. We will retry up to 5 times. See -`can_retry_prepare_after_failure`. - -If the actual **execution** of the artifact fails, we will retry once if it was -an `InvalidCandidate::AmbiguousWorkerDeath` error, after a 1 second delay to -allow any potential transient conditions to clear. This occurs outside this -module, in the Candidate Validation subsystem. - -### Preparation timeouts - -We use a timeout for preparation to limit the amount of time it can take. As the -time for preparation can vary depending on the machine and load on the machine, -this can potentially lead to disputes where some validators are able to execute -a PVF and others aren't. - -One mitigation we have in place is a more lenient timeout for preparation during -execution than during pre-checking. The rationale is that the PVF has already -passed pre-checking, so we know it should be valid, and we allow it to take -longer than expected as this is likely due to an issue with the machine and not -the PVF. - -## Under the hood - -### The flow - -Under the hood, the validation host is built using a bunch of communicating -processes, not dissimilar to actors. Each of such "processes" is a future task -that contains an event loop that processes incoming messages, potentially -delegating sub-tasks to other "processes". - -Two of these processes are queues. The first one is for preparation jobs and the -second one is for execution. Both of the queues are backed by separate pools of -workers of different kind. 
- -Preparation workers handle preparation requests by prevalidating and -instrumenting PVF wasm code, and then passing it into the compiler, to prepare -the artifact. - -### Artifacts - -An artifact is the final product of preparation. If the preparation succeeded, -then the artifact will contain the compiled code usable for quick execution by a -worker later on. - -If the preparation failed, then the worker will still write the artifact with -the error message. We save the artifact with the error so that we don't try to -prepare the artifacts that are broken repeatedly. - -The artifact is saved on disk and is also tracked by an in memory table. This in -memory table doesn't contain the artifact contents though, only a flag that the -given artifact is compiled. - -A pruning task will run at a fixed interval of time. This task will remove all -artifacts that weren't used or received a heads up signal for a while. - -### Execution - -The execute workers will be fed by the requests from the execution queue, which -is basically a combination of a path to the compiled artifact and the -[`params`][ValidationParams]. - -[ValidationHost]: ../../types/pvf.md#validationhost -[Pvf]: ../../types/pvf.md#pvf -[ValidationParams]: ../../types/candidate.md#validationparams -[Priority]: ../../types/pvf.md#priority diff --git a/roadmap/implementers-guide/src/types/pvf.md b/roadmap/implementers-guide/src/types/pvf.md deleted file mode 100644 index 08197963a371..000000000000 --- a/roadmap/implementers-guide/src/types/pvf.md +++ /dev/null @@ -1,37 +0,0 @@ -# PVF Types - -## `ValidationHost` - -```rust -/// A handle to the async process serving the validation host requests. -pub struct ValidationHost { - to_host_tx: mpsc::Sender, -} -``` - -## `Pvf` - -```rust -/// A struct that carries code of a parachain validation function and its hash. 
-pub struct Pvf { - pub(crate) code: Arc>, - pub(crate) code_hash: ValidationCodeHash, -} -``` - -## `Priority` - -```rust -/// A priority assigned to execution of a PVF. -pub enum Priority { - /// Normal priority for things that do not require immediate response, but still need to be - /// done pretty quick. - /// - /// Approvals and disputes fall into this category. - Normal, - /// This priority is used for requests that are required to be processed as soon as possible. - /// - /// For example, backing is on a critical path and requires execution as soon as possible. - Critical, -} -``` From e90c68cc53f13f74efdfb05876176f4ef24390e0 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Tue, 22 Nov 2022 16:51:27 -0500 Subject: [PATCH 8/9] ci: Add "prevalidating" to the dictionary --- scripts/ci/gitlab/lingua.dic | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/ci/gitlab/lingua.dic b/scripts/ci/gitlab/lingua.dic index 3a19233a8fb9..a2f64af1cbd6 100644 --- a/scripts/ci/gitlab/lingua.dic +++ b/scripts/ci/gitlab/lingua.dic @@ -209,6 +209,7 @@ preconfigured preimage/MS preopen prepend/G +prevalidating prevalidation preverify/G programmatically From 4c0042bf19ffd042c48bad834f3f3313bc6f0081 Mon Sep 17 00:00:00 2001 From: Marcin S Date: Wed, 23 Nov 2022 05:36:47 -0500 Subject: [PATCH 9/9] ig: Remove types/chain.md The types contained therein did not exist and the file was not referenced anywhere. 
--- roadmap/implementers-guide/src/SUMMARY.md | 1 - roadmap/implementers-guide/src/types/chain.md | 32 ------------------- 2 files changed, 33 deletions(-) delete mode 100644 roadmap/implementers-guide/src/types/chain.md diff --git a/roadmap/implementers-guide/src/SUMMARY.md b/roadmap/implementers-guide/src/SUMMARY.md index bcf87aad8a49..c504b9ac1923 100644 --- a/roadmap/implementers-guide/src/SUMMARY.md +++ b/roadmap/implementers-guide/src/SUMMARY.md @@ -75,7 +75,6 @@ - [Availability](types/availability.md) - [Overseer and Subsystem Protocol](types/overseer-protocol.md) - [Runtime](types/runtime.md) - - [Chain](types/chain.md) - [Messages](types/messages.md) - [Network](types/network.md) - [Approvals](types/approval.md) diff --git a/roadmap/implementers-guide/src/types/chain.md b/roadmap/implementers-guide/src/types/chain.md deleted file mode 100644 index f5a9093bca9e..000000000000 --- a/roadmap/implementers-guide/src/types/chain.md +++ /dev/null @@ -1,32 +0,0 @@ -# Chain - -Types pertaining to the relay-chain - events, structures, etc. - -TODO: These no longer exist. - -## Block Import Event - -```rust -/// Indicates that a new block has been added to the blockchain. -struct BlockImportEvent { - /// The block header-hash. - hash: Hash, - /// The header itself. - header: Header, - /// Whether this block is considered the head of the best chain according to the - /// event emitter's fork-choice rule. - new_best: bool, -} -``` - -## Block Finalization Event - -```rust -/// Indicates that a new block has been finalized. -struct BlockFinalizationEvent { - /// The block header-hash. - hash: Hash, - /// The header of the finalized block. - header: Header, -} -```