Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

validation host is too eager to deem a candidate invalid #3192

Closed
pepyakin opened this issue Jun 7, 2021 · 5 comments · Fixed by #3418
Closed

validation host is too eager to deem a candidate invalid #3192

pepyakin opened this issue Jun 7, 2021 · 5 comments · Fixed by #3418
Labels
I3-bug Fails to follow expected behavior.

Comments

@pepyakin
Copy link
Contributor

pepyakin commented Jun 7, 2021

Recently we observed that the validation host may fail to communicate with the child process that is responsible for compiling the PVF. This now leads to validation host thinking that the PVF that is broken. Therefore, all following execution requests will fail with "invalid candidate".

The logic there is wrong because failing to communicate with the child process should not lead to claiming that the candidate is invalid.

Besides that, it is not clear to me if we should deem the candidate invalid if the PVF itself is invalid.

@pepyakin pepyakin added the I3-bug Fails to follow expected behavior. label Jun 7, 2021
@rphmeier
Copy link
Contributor

rphmeier commented Jun 7, 2021

I think that from the perspective of the high-level code the 'validate candidate' API should make a best effort to validate the candidate and if that fails then to be clear that there has been an internal failure. We must make sure that malicious code cannot cause issues presenting as an internal failure. When internal issues are encountered the candidate validation subsystem might retry a few times to present an API which the high level code can use simply to answer the question of validity

@burdges
Copy link
Contributor

burdges commented Jun 7, 2021

Yes exactly, we should report how we fail though when starting a dispute. We should relevant logs related to disputes too.

@pepyakin
Copy link
Contributor Author

pepyakin commented Jun 9, 2021

I was thinking about this and I am not sure if I agree.

Imagine a malicious PVF crafted so that it would OOM on 50% of machines during compilation. This would make one side of the chasm to think the candidates for that PVF are valid and the other side would think it is invalid.

Is that really worse than if the second side thought that this is an internal failure?

@burdges
Copy link
Contributor

burdges commented Jun 9, 2021

Yes, we prefer if compilation errors are separate from block errors. At first blush, it sounds easy for parachain code if validators compile core ahead of time, but actually yes we likely need a dispute procedure for parachain code even there eventually.

In parathreads, we do not want most validators to ever build most parachain code, so I suggested that code updates be handled like a special block, meaning approval checkers build the code and then we conclude the code is valid. If we expect 16 honest approval checkers then that's p^16 for code that fails build on a proportion p of machines. If p > 2/3 then initially we slash those claiming it builds 100%. If p < 1/3 then initially we slash those claiming it fails maybe 10%. If 1/3 < p < 2/3 then we slash nobody but declare the code invalid. In both slashing cases, we need governance toi perform forensics to revert the slashes, but importantly since forensics concerns only a bad build it can afford to be rather generous.

I suppose parachains could adopt the same code upgrade procedure too, so no separate dispute procedure. As Adam Langley said, "When you design a protocol have one joint and keep it well oiled" or similarly "do one thing and do it well". I'm kinda ambivalent what happens in the shorter term, but longer term once parachains work properly, including #1656, then they should become out well oiled joint, even though they're crazy complex.

@rphmeier
Copy link
Contributor

Imagine a malicious PVF crafted so that it would OOM on 50% of machines during compilation. This would make one side of the chasm to think the candidates for that PVF are valid and the other side would think it is invalid.

Can't this be averted somehow? We must have some kind of threat model for parachains where the compilation can succeed on all machines

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I3-bug Fails to follow expected behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants