
Stricter checks for ExecutorParams soundness #1704

Closed
s0me0ne-unkn0wn opened this issue Sep 25, 2023 · 7 comments · Fixed by #1774

@s0me0ne-unkn0wn
Contributor

As rightly noted by @mrcnski, people make mistakes, and governance consists of people. Although the ExecutorParams restrictions are properly documented, a situation may arise where they are not entirely sound. Some probable concerns are:

  • Inclusion of multiple occurrences of the same ExecutorParam variant, in which case only the first one will take effect;
  • Inclusion of a parameter value that exceeds limits imposed by the underlying code (e.g., PrecheckingMaxMemory is u64, but if its value is over isize::MAX, a panic will occur);
  • Inclusion of senseless values (e.g., a backing timeout longer than the block time).

ExecutorParams are currently set as part of HostConfiguration, and there's already a mechanism that allows refusing an unsound configuration: the consistency check. It should be complemented with appropriate checks for ExecutorParams soundness.
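As a rough illustration, the duplicate-variant check could be as simple as comparing enum discriminants. A minimal sketch, using an illustrative subset of the enum rather than the actual runtime code:

```rust
use std::mem::discriminant;

// Illustrative subset of the real ExecutorParam enum.
#[allow(dead_code)]
enum ExecutorParam {
    MaxMemoryPages(u32),
    PrecheckingMaxMemory(u64),
}

/// Reject parameter sets containing the same variant twice,
/// since only the first occurrence would take effect.
fn check_no_duplicates(params: &[ExecutorParam]) -> Result<(), &'static str> {
    for (i, p) in params.iter().enumerate() {
        if params[i + 1..].iter().any(|q| discriminant(q) == discriminant(p)) {
            return Err("duplicate ExecutorParam variant");
        }
    }
    Ok(())
}

fn main() {
    let params = [
        ExecutorParam::MaxMemoryPages(2048),
        ExecutorParam::MaxMemoryPages(4096), // would silently be ignored today
    ];
    assert!(check_no_duplicates(&params).is_err());
}
```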

@s0me0ne-unkn0wn s0me0ne-unkn0wn added the C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. label Sep 25, 2023
@mrcnski
Contributor

mrcnski commented Sep 26, 2023

Thanks for following up on this. Agreed with the mentor tag; this would be a good one for new contributors!

@eagr
Contributor

eagr commented Sep 30, 2023

I wanna help out if it’s ok. A couple of questions.

Is there a reason why the ExecutorParams should be based on Vec? I mean, by turning it into a struct of Options, we could automatically rule out the 1st issue mentioned above.

And why PrecheckingMaxMemory(u64) instead of PrecheckingMaxMemory(usize)? Likewise, the type system could catch the 2nd issue.

@s0me0ne-unkn0wn
Contributor Author

I wanna help out if it’s ok.

That's great!

Is there a reason why the ExecutorParams should be based on Vec?

Yes, there's a long story behind that architectural decision. We want the params to be easily extensible, as we cannot figure out beforehand which parameters we'll need, and new parameters will keep appearing in Wasmtime and in other executors as other implementations arrive. If something kept on-chain is a struct, then every change to that struct requires a storage migration, and an old node cannot decode the new struct, which results in a tricky release cycle: we have to release a new version supporting the new primitives and wait until a supermajority upgrades. Keeping them as Vec<ExecutorParam> mitigates that: the node software can decode it even if it isn't aware of a newly added parameter, provided that the new parameter isn't used on-chain yet. That helps keep the node software release cycle smooth, with fewer "critical upgrade" releases.
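A small demonstration of that decoding property with parity-scale-codec (derive feature); the enums here are illustrative stand-ins, not the real definitions:

```rust
use parity_scale_codec::{Decode, Encode};

// An "old" node's view of the parameters...
#[derive(Encode, Decode)]
enum ExecutorParamV1 {
    #[codec(index = 1)]
    MaxMemoryPages(u32),
}

// ...and a "new" node's view, with a variant appended.
#[derive(Encode, Decode)]
enum ExecutorParamV2 {
    #[codec(index = 1)]
    MaxMemoryPages(u32),
    #[codec(index = 2)]
    StackLogicalMax(u32),
}

fn main() {
    // While the new variant is unused on-chain, the old node decodes fine...
    let encoded = vec![ExecutorParamV2::MaxMemoryPages(2048)].encode();
    assert!(Vec::<ExecutorParamV1>::decode(&mut &encoded[..]).is_ok());

    // ...and only fails once the new variant actually appears.
    let encoded = vec![ExecutorParamV2::StackLogicalMax(65_536)].encode();
    assert!(Vec::<ExecutorParamV1>::decode(&mut &encoded[..]).is_err());
}
```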

And why PrecheckingMaxMemory(u64) instead of PrecheckingMaxMemory(usize)?

usize and isize are not encodable by definition, as their storage size varies between platforms.
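So the value stays u64 on the wire, and the node bounds-checks it at the point of use, roughly like this (an illustrative sketch, not the actual node code):

```rust
/// Convert an on-chain u64 memory limit into a usable allocation size.
/// Rust allocations above isize::MAX are invalid, hence the extra guard.
fn to_alloc_limit(raw: u64) -> Option<usize> {
    let v = usize::try_from(raw).ok()?;
    (v <= isize::MAX as usize).then_some(v)
}

fn main() {
    assert!(to_alloc_limit(256 * 1024 * 1024).is_some());
    assert!(to_alloc_limit(u64::MAX).is_none());
}
```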

Feel free to ask any questions and have a chat/call with me if needed!

@eagr
Contributor

eagr commented Oct 1, 2023

Please correct me if I’m wrong or something is not worth checking.

MaxMemoryPages(u32) (0, 65536]
Wasm memory objects currently max out at 4GB, that is, 65536 * 64KB, so the logical limit of this param should be 65536.

StackLogicalMax(u32) (0, 65536]
This is in place to make stack overflows deterministic across different architectures, right? I had to dig into the code to try to figure out what exactly this param is for. Maybe some clarification or a pointer would be nice.

StackNativeMax(u32) [128 * logical_max, 512 * logical_max]
For determinism, this should be big enough that the limit won’t be reached in practice.

PrecheckingMaxMemory(u64) (0, isize::MAX]
PvfPrepTimeout(PvfPrepTimeoutKind, u64) (0, blocktime_ms]
PvfExecTimeout(PvfExecTimeoutKind, u64) (0, blocktime_ms]

@s0me0ne-unkn0wn
Contributor Author

s0me0ne-unkn0wn commented Oct 1, 2023

MaxMemoryPages(u32) (0, 65536]
Wasm memory objects currently max out at 4GB, that is, 65536 * 64KB, so the logical limit of this param should be 65536.

Seems legit, although by default we also add 32 pages for the shadow stack and Wasm data segment (in Wasmtime, this value limits not only the Wasm linear memory itself but all the VM heap allocations), so I'd probably add those 32 to the max value.
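That is, roughly (a sketch with the numbers discussed here; final values subject to review):

```rust
const MAX_WASM_PAGES: u32 = 65_536; // 4 GB of 64 KB pages
const EXTRA_PAGES: u32 = 32;        // default headroom for shadow stack + data segment

fn check_max_memory_pages(pages: u32) -> bool {
    pages > 0 && pages <= MAX_WASM_PAGES + EXTRA_PAGES
}

fn main() {
    assert!(check_max_memory_pages(MAX_WASM_PAGES + EXTRA_PAGES));
    assert!(!check_max_memory_pages(0));
}
```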

StackLogicalMax(u32) (0, 65536]
This is in place to make stack overflows deterministic across different architectures, right? I had to dig into the code to try to figure out what exactly this param is for. Maybe some clarification or a pointer would be nice.

Exactly, at least it was intended as such. Before a PVF is compiled, it is instrumented with value-stack-measuring code (it happens here): every Wasm call instruction is wrapped with injected code that adds the pre-calculated function call cost to a counter and throws unreachable if the limit is reached. That is deterministic, but not reliable, stack measurement (see below).
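Conceptually, the injected metering behaves like this host-side model (just the idea, not the actual Wasm rewriting):

```rust
use std::cell::Cell;

thread_local! {
    // Running logical stack cost, as the injected counter would track it.
    static STACK_COST: Cell<u32> = Cell::new(0);
}

/// Model of a metered Wasm `call`: add the callee's pre-computed cost,
/// trap (modeled as a panic) past the limit, restore the counter on return.
fn metered_call<R>(cost: u32, limit: u32, f: impl FnOnce() -> R) -> R {
    STACK_COST.with(|c| {
        let depth = c.get() + cost;
        assert!(depth <= limit, "trap: logical stack limit exceeded");
        c.set(depth);
    });
    let result = f();
    STACK_COST.with(|c| c.set(c.get() - cost));
    result
}

fn main() {
    // Two nested calls of cost 10 fit a limit of 25; a third would trap.
    metered_call(10, 25, || metered_call(10, 25, || ()));
}
```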

It's hard to say which values are reasonable. I'd definitely raise the minimum, as any limit close to 0 would render any PVF code non-executable, and we want to guard against values that definitely break things. Probably 1024 is a good starting point. The current default is 65536, so the upper limit could be raised to at least 2 * 65536 to leave some leverage.

StackNativeMax(u32) [128 * logical_max, 512 * logical_max]
For determinism, this should be big enough that the limit won’t be reached in practice.

Uh, that's an intricate one. This one, on the other hand, is reliable but not deterministic. Those two stack limits were paired in the belief that they are proportional. That seemed logical, and even the Wasmtime authors we consulted agreed they should be. But later, that assumption was proven wrong: in the general case, the Wasm value stack and the native stack are not proportional at all. So they are known to be independent values now, but we're still keeping them proportional just because we don't have any better idea. The logical limit prevents abusing the Wasm stack usage (e.g., through recursion), and the native limit prevents abusing the compiler logic, where malicious code using a small value stack depth forces the compiler to generate huge native stack frames.

So, the window from 128x to 512x looks good to me. We're still in search of ways to improve those limits and make them more deterministic.
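So the check is just a proportionality window (a sketch; the 128x/512x factors are the ones proposed above, not final):

```rust
fn check_stack_native_max(native_max: u32, logical_max: u32) -> bool {
    let lo = logical_max.saturating_mul(128);
    let hi = logical_max.saturating_mul(512);
    (lo..=hi).contains(&native_max)
}

fn main() {
    assert!(check_stack_native_max(256 * 65_536, 65_536));
    assert!(!check_stack_native_max(64 * 65_536, 65_536));
}
```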

PrecheckingMaxMemory(u64) (0, isize::MAX]

Here, I'd first rule out 0 and everything close to it for the aforementioned reasons. I benchmarked one of the most sophisticated PVFs currently existing, and it required ~116 MB to compile in single-threaded mode (currently used) and ~145 MB in multi-threaded mode (if we ever enable it). So I'd say the limit should never fall under 256 MB.

As for the upper bound, isize::MAX is technically reasonable. But in practice, it's like 9 exabytes! 🤔 Well, we could say it's just a "no limit" value, but as this limit is only enforced when it's present, we already have a mechanism to waive it. Putting some practically reasonable value as the upper bound seems like a good idea, as it could rule out obviously erroneous values, but I don't have a strong opinion on the exact value yet. 16 GB? Just off the top of my head; maybe someone will come up with a better idea.
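Putting the floor and the tentative ceiling together (a sketch; both values are the ones floated above, not final):

```rust
const PRECHECK_MEM_MIN: u64 = 256 * 1024 * 1024;       // 256 MB floor, per the benchmark
const PRECHECK_MEM_MAX: u64 = 16 * 1024 * 1024 * 1024; // 16 GB tentative ceiling

fn check_prechecking_max_memory(bytes: u64) -> bool {
    (PRECHECK_MEM_MIN..=PRECHECK_MEM_MAX).contains(&bytes)
}

fn main() {
    assert!(check_prechecking_max_memory(512 * 1024 * 1024));
    assert!(!check_prechecking_max_memory(0));
}
```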

PvfPrepTimeout(PvfPrepTimeoutKind, u64) (0, blocktime_ms]

Preparation timeouts are not connected to block times in any way (IIUC), so more freedom is allowed here. Let's make up some constants for now, like (0, 240 sec], and decide on the exact value during review.

Also, if both PvfPrepTimeoutKind::Precheck and PvfPrepTimeoutKind::Lenient are set, we should at least check that Lenient > Precheck; otherwise the values are obviously wrong (see the sketch below).

PvfExecTimeout(PvfExecTimeoutKind, u64) (0, blocktime_ms]

Also here, we should always check that Approval > Backing if both are present. For the reasonable value range, @eskimor and @burdges, can you please give us some ideas? The question is: what is the reasonable range of backing and approval timeouts, with respect to block time, that we would ever want to set? Like, backing timeout from 1/10 block time to 1/2 block time, or something like that.
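Both ordering checks might look like this (a sketch; the timeout kinds are flattened into Options here, and the 240 sec preparation cap is the placeholder from above):

```rust
/// `longer` must strictly exceed `shorter` when both are present
/// (Lenient > Precheck, Approval > Backing). Values in milliseconds.
fn check_order(shorter: Option<u64>, longer: Option<u64>) -> bool {
    match (shorter, longer) {
        (Some(s), Some(l)) => l > s,
        _ => true, // nothing to compare if either is absent
    }
}

fn check_timeouts(
    precheck_ms: Option<u64>,
    lenient_ms: Option<u64>,
    backing_ms: Option<u64>,
    approval_ms: Option<u64>,
) -> bool {
    const PREP_MAX_MS: u64 = 240_000; // tentative (0, 240 sec] cap from above
    let prep_ok = |t: u64| t > 0 && t <= PREP_MAX_MS;
    precheck_ms.map_or(true, prep_ok)
        && lenient_ms.map_or(true, prep_ok)
        && check_order(precheck_ms, lenient_ms)
        && check_order(backing_ms, approval_ms)
}

fn main() {
    assert!(check_timeouts(Some(60_000), Some(200_000), Some(2_000), Some(12_000)));
    assert!(!check_timeouts(Some(200_000), Some(60_000), None, None)); // Lenient <= Precheck
}
```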

@burdges

burdges commented Oct 1, 2023

We want short backing timeouts of course, like 1 second maybe?

We'll eventually charge backers for time overruns, which we determine from the median running time claimed by approval voters, which yes, requires tricky new billing parameters. At that point, approval checkers' timeouts should become quite long, like a minute maybe.

At present, approval checker timeouts wind up being 4x longer than the backing checker timeout, but I'm not sure exactly.

I'm unsure these numbers ever change without a code upgrade.

@the-right-joyce the-right-joyce added the D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. label Oct 12, 2023
@eagr
Contributor

eagr commented Oct 13, 2023

Atm, these params are left unchecked until there are good enough candidates for the boundaries:

  • preparation timeouts
  • execution timeouts
