-
Notifications
You must be signed in to change notification settings - Fork 1.6k
PVF: Instantiate wasm in pre-checking #7246
PVF: Instantiate wasm in pre-checking #7246
Conversation
|
||
Pre-checking mostly consists of attempting to prepare (compile) the PVF WASM blob. We use more strict limits (e.g. timeouts) here compared to regular preparation for execution. This way errors during preparation later are likely unrelated to the PVF itself, as it already passed pre-checking. We can treat such errors as local node issues. | ||
|
||
We also have an additional step where we attempt to instantiate the WASM runtime without running it. This is unrelated to preparation so we don't time it, but it does help us catch more issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we now treat runtime construction errors in execution as internal errors (abstain from voting)? I didn't do that because the PVF can still be un-instantiable on certain machines/configurations while making it through the pre-check vote, making attacks against those environments possible.
Or can we abstain from voting and still be sure that finality won't stall due to this PVF, since it passed the pre-checking quorum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, I think we have to have nodes abstain from voting in that case, anything else would be unfair: They voted against the PVF in pre-checking, but later we slash them because they fail validating a candidate? Does not sound fair.
On the other hand if it passed pre-checking, we know there is a super majority being able to validate the candidate, so it should be "fine" if some nodes don't vote.
I do think we have to time the instantiation - it could be possible to make this take a very long time?
At the same time, I would assume in the honest case for it to be very quick, so it should be ok to make this part of the preparation timeout in pre-checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think we have to time the instantiation - it could be possible to make this take a very long time?
At the same time, I would assume in the honest case for it to be very quick, so it should be ok to make this part of > the preparation timeout in pre-checking.
We do have the long host-side timeout, but yeah that's not ideal. Lumping it in with the preparation timeout makes it harder to reason about as we are no longer comparing the same operations in prechecking vs. preparation.
Best would be to have a separate timeout for instantiation, as well as for instantiation in execution where it would be more lenient. This way we actually make sure that if it passes in prechecking it should pass in execution. And having it in a separate thread here as well as in execution means we can do more fine-grained sandboxing later. Or am I overcomplicating it? I can also just do your suggestion for now and file an issue for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that our pre-checking timeout is already way lower than the preparation timeout, it seems fine to include the instantiation - assuming we make sure the timeout is such that for honest PVFs it should not trigger. So indeed, I think we can keep it simple. It does not have to be comparable - it has to be "lower enough". Given that execution timeout is in the seconds, which includes instantiation I can hardly imagine how this simplification could be causing issues. It should be clearly documented though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we need the compiled artifact on-disk to run instantiation, and by that point the timeout has been stopped. We need to implement paritytech/substrate#13861. Then we could check that the artifact is instantiable before we actually write it to disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd still publish VRFs but then no-show I think. In principle a vote against the PVF could mean "no-show early" but afaik there is no reason forthat extra code complexity, probably you just announce your VRF and then abandon it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we need the compiled artifact on-disk to run instantiation, and by that point the timeout has been stopped. We need to implement paritytech/substrate#13861. Then we could check that the artifact is instantiable before we actually write it to disk.
Let's just make sure there is some timeout. The rest is: "What is easiest to implement" - it should not matter too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eskimor Done. Once we have sandboxing I think this will need its own thread. For now it should be good.
/// | ||
/// # Safety | ||
/// | ||
/// See `execute` for safety considerations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
[`execute`]
to create a doc-link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to pvf/common
since Executor
was now used in prechecking for runtime construction!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @mrcnski !
sp_io::trie::HostFunctions, | ||
); | ||
|
||
/// The validation externalities that will panic on any storage related access. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear here, why we would want that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already there, I just moved some stuff around in this PR. But, it is a good question. I always assumed it was because validation shouldn't ever need to call these functions, though maybe there is a stronger reason. We should confirm the reasoning and document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is off-topic but I feel like we should make the historical PRs more discoverable. Especially now that I borked the blame history with all my refactors. 🙈 Maybe a HISTORY.md
would be nice, for context and for linking out to important PRs. And why is this project closed I wonder? 🤔
bot merge |
PULL REQUEST
Overview
Do an additional instantiation check to catch more issues.
Relevant issues
Closes #7218