validation host is too eager to deem a candidate invalid #3192

pepyakin · 2021-06-07T14:48:34Z

Recently we observed that the validation host may fail to communicate with the child process that is responsible for compiling the PVF. This now leads to validation host thinking that the PVF that is broken. Therefore, all following execution requests will fail with "invalid candidate".

The logic there is wrong because failing to communicate with the child process should not lead to claiming that the candidate is invalid.

Besides that, it is not clear to me if we should deem the candidate invalid if the PVF itself is invalid.

rphmeier · 2021-06-07T14:58:02Z

I think that from the perspective of the high-level code the 'validate candidate' API should make a best effort to validate the candidate and if that fails then to be clear that there has been an internal failure. We must make sure that malicious code cannot cause issues presenting as an internal failure. When internal issues are encountered the candidate validation subsystem might retry a few times to present an API which the high level code can use simply to answer the question of validity

burdges · 2021-06-07T16:56:31Z

Yes exactly, we should report how we fail though when starting a dispute. We should relevant logs related to disputes too.

pepyakin · 2021-06-09T16:51:52Z

I was thinking about this and I am not sure if I agree.

Imagine a malicious PVF crafted so that it would OOM on 50% of machines during compilation. This would make one side of the chasm to think the candidates for that PVF are valid and the other side would think it is invalid.

Is that really worse than if the second side thought that this is an internal failure?

burdges · 2021-06-09T18:47:13Z

Yes, we prefer if compilation errors are separate from block errors. At first blush, it sounds easy for parachain code if validators compile core ahead of time, but actually yes we likely need a dispute procedure for parachain code even there eventually.

In parathreads, we do not want most validators to ever build most parachain code, so I suggested that code updates be handled like a special block, meaning approval checkers build the code and then we conclude the code is valid. If we expect 16 honest approval checkers then that's p^16 for code that fails build on a proportion p of machines. If p > 2/3 then initially we slash those claiming it builds 100%. If p < 1/3 then initially we slash those claiming it fails maybe 10%. If 1/3 < p < 2/3 then we slash nobody but declare the code invalid. In both slashing cases, we need governance toi perform forensics to revert the slashes, but importantly since forensics concerns only a bad build it can afford to be rather generous.

I suppose parachains could adopt the same code upgrade procedure too, so no separate dispute procedure. As Adam Langley said, "When you design a protocol have one joint and keep it well oiled" or similarly "do one thing and do it well". I'm kinda ambivalent what happens in the shorter term, but longer term once parachains work properly, including #1656, then they should become out well oiled joint, even though they're crazy complex.

rphmeier · 2021-06-10T17:05:58Z

Imagine a malicious PVF crafted so that it would OOM on 50% of machines during compilation. This would make one side of the chasm to think the candidates for that PVF are valid and the other side would think it is invalid.

Can't this be averted somehow? We must have some kind of threat model for parachains where the compilation can succeed on all machines

pepyakin added the I3-bug Fails to follow expected behavior. label Jun 7, 2021

pepyakin mentioned this issue Jul 6, 2021

PVF: unresponsive worker doesn't mean the candidate is bad #3418

Merged

pepyakin closed this as completed in #3418 Jul 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

validation host is too eager to deem a candidate invalid #3192

validation host is too eager to deem a candidate invalid #3192

pepyakin commented Jun 7, 2021

rphmeier commented Jun 7, 2021

burdges commented Jun 7, 2021

pepyakin commented Jun 9, 2021

burdges commented Jun 9, 2021 •

edited

Loading

rphmeier commented Jun 10, 2021

validation host is too eager to deem a candidate invalid #3192

validation host is too eager to deem a candidate invalid #3192

Comments

pepyakin commented Jun 7, 2021

rphmeier commented Jun 7, 2021

burdges commented Jun 7, 2021

pepyakin commented Jun 9, 2021

burdges commented Jun 9, 2021 • edited Loading

rphmeier commented Jun 10, 2021

burdges commented Jun 9, 2021 •

edited

Loading