
Stricter checks for ExecutorParams soundness #1704

Closed
s0me0ne-unkn0wn opened this issue Sep 25, 2023 · 7 comments · Fixed by #1774

@s0me0ne-unkn0wn
Contributor

As rightly noted by @mrcnski, people make mistakes, and governance consists of people. Although the ExecutorParams restrictions are properly documented, a situation may arise where they are not entirely sound. Some probable concerns are:

  • Inclusion of multiple occurrences of the same ExecutorParam variant, in which case only the first one will take effect;
  • Inclusion of a parameter value that exceeds limits imposed by the underlying code (e.g., PrecheckingMaxMemory is u64, but if its value is over isize::MAX, a panic will occur);
  • Inclusion of senseless values (e.g., a backing timeout longer than the block time).

ExecutorParams are currently set as part of HostConfiguration, and there's already a mechanism that allows refusing an unsound configuration: the consistency check. It should be complemented with appropriate checks for ExecutorParams soundness.
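As a rough illustration, the duplicate-variant check could be as simple as comparing enum discriminants. A minimal sketch, using an illustrative subset of the enum rather than the actual runtime code:

```rust
use std::mem::discriminant;

// Illustrative subset of the real ExecutorParam enum.
#[allow(dead_code)]
enum ExecutorParam {
    MaxMemoryPages(u32),
    PrecheckingMaxMemory(u64),
}

/// Reject parameter sets containing the same variant twice,
/// since only the first occurrence would take effect.
fn check_no_duplicates(params: &[ExecutorParam]) -> Result<(), &'static str> {
    for (i, p) in params.iter().enumerate() {
        if params[i + 1..].iter().any(|q| discriminant(q) == discriminant(p)) {
            return Err("duplicate ExecutorParam variant");
        }
    }
    Ok(())
}

fn main() {
    let params = [
        ExecutorParam::MaxMemoryPages(2048),
        ExecutorParam::MaxMemoryPages(4096), // would silently be ignored today
    ];
    assert!(check_no_duplicates(&params).is_err());
}
```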

@s0me0ne-unkn0wn s0me0ne-unkn0wn added the C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. label Sep 25, 2023
@mrcnski
Contributor

mrcnski commented Sep 26, 2023

Thanks for following up on this. Agreed with the mentor tag; this would be a good one for new contributors!

@eagr
Contributor

eagr commented Sep 30, 2023

I wanna help out if it’s ok. A couple of questions.

Is there a reason why the ExecutorParams should be based on Vec? I mean, by turning it into a struct of Options, we could automatically rule out the 1st issue mentioned above.

And why PrecheckingMaxMemory(u64) instead of PrecheckingMaxMemory(usize)? Likewise, the type system could catch the 2nd issue.

@s0me0ne-unkn0wn
Contributor Author

I wanna help out if it’s ok.

That's great!

Is there a reason why the ExecutorParams should be based on Vec?

Yes, there's a long story behind that architectural decision. We want the params to be easily extensible, as we cannot figure out beforehand which parameters we'll need, and new parameters will keep appearing in Wasmtime and in other executors as other implementations arrive. If something kept on-chain is a struct, then every change to that struct requires a storage migration, and an old node cannot decode the new struct, which results in a tricky release cycle: we have to release a new version supporting the new primitives and wait until a supermajority upgrades. Keeping them as Vec<ExecutorParam> mitigates that: the node software can decode it even if it isn't aware of a newly added parameter, provided that the new parameter isn't used on-chain yet. That helps keep the node software release cycle smooth, with fewer "critical upgrade" releases.
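A small demonstration of that decoding property with parity-scale-codec (derive feature); the enums here are illustrative stand-ins, not the real definitions:

```rust
use parity_scale_codec::{Decode, Encode};

// An "old" node's view of the parameters...
#[derive(Encode, Decode)]
enum ExecutorParamV1 {
    #[codec(index = 1)]
    MaxMemoryPages(u32),
}

// ...and a "new" node's view, with a variant appended.
#[derive(Encode, Decode)]
enum ExecutorParamV2 {
    #[codec(index = 1)]
    MaxMemoryPages(u32),
    #[codec(index = 2)]
    StackLogicalMax(u32),
}

fn main() {
    // While the new variant is unused on-chain, the old node decodes fine...
    let encoded = vec![ExecutorParamV2::MaxMemoryPages(2048)].encode();
    assert!(Vec::<ExecutorParamV1>::decode(&mut &encoded[..]).is_ok());

    // ...and only fails once the new variant actually appears.
    let encoded = vec![ExecutorParamV2::StackLogicalMax(65_536)].encode();
    assert!(Vec::<ExecutorParamV1>::decode(&mut &encoded[..]).is_err());
}
```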

And why PrecheckingMaxMemory(u64) instead of PrecheckingMaxMemory(usize)?

usize and isize are not encodable by definition, as their storage size varies between platforms.
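So the value stays u64 on the wire, and the node bounds-checks it at the point of use, roughly like this (an illustrative sketch, not the actual node code):

```rust
/// Convert an on-chain u64 memory limit into a usable allocation size.
/// Rust allocations above isize::MAX are invalid, hence the extra guard.
fn to_alloc_limit(raw: u64) -> Option<usize> {
    let v = usize::try_from(raw).ok()?;
    (v <= isize::MAX as usize).then_some(v)
}

fn main() {
    assert!(to_alloc_limit(256 * 1024 * 1024).is_some());
    assert!(to_alloc_limit(u64::MAX).is_none());
}
```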

Feel free to ask any questions and have a chat/call with me if needed!

@eagr
Contributor

eagr commented Oct 1, 2023

Please correct me if I’m wrong or something is not worth checking.

MaxMemoryPages(u32) (0, 65536]
Wasm memory objects currently max out at 4GB, that is, 65536 * 64KB, so the logical limit of this param should be 65536.

StackLogicalMax(u32) (0, 65536]
This is in place to make stack overflows deterministic across different architectures, right? I had to dig into the code to try to figure out what exactly this param is for. Maybe some clarification or a pointer would be nice.

StackNativeMax(u32) [128 * logical_max, 512 * logical_max]
For determinism, this should be big enough that the limit won’t be reached in practice.

PrecheckingMaxMemory(u64) (0, isize::MAX]
PvfPrepTimeout(PvfPrepTimeoutKind, u64) (0, blocktime_ms]
PvfExecTimeout(PvfExecTimeoutKind, u64) (0, blocktime_ms]

@s0me0ne-unkn0wn
Contributor Author

s0me0ne-unkn0wn commented Oct 1, 2023

MaxMemoryPages(u32) (0, 65536]
Wasm memory objects currently max out at 4GB, that is, 65536 * 64KB, so the logical limit of this param should be 65536.

Seems legit, although by default we also add 32 pages for the shadow stack and Wasm data segment (in Wasmtime, this value limits not only the Wasm linear memory itself but all the VM heap allocations), so I'd probably add those 32 to the max value.
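That is, roughly (a sketch with the numbers discussed here; final values subject to review):

```rust
const MAX_WASM_PAGES: u32 = 65_536; // 4 GB of 64 KB pages
const EXTRA_PAGES: u32 = 32;        // default headroom for shadow stack + data segment

fn check_max_memory_pages(pages: u32) -> bool {
    pages > 0 && pages <= MAX_WASM_PAGES + EXTRA_PAGES
}

fn main() {
    assert!(check_max_memory_pages(MAX_WASM_PAGES + EXTRA_PAGES));
    assert!(!check_max_memory_pages(0));
}
```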

StackLogicalMax(u32) (0, 65536]
This is in place to make stack overflows deterministic across different architectures, right? I had to dig into the code to try to figure out what exactly this param is for. Maybe some clarification or a pointer would be nice.

Exactly, at least it was intended as such. Before a PVF is compiled, it is instrumented with value-stack-measuring code (it happens here): every Wasm call instruction is wrapped with injected code that adds the pre-calculated function call cost to a counter and throws unreachable if the limit is reached. That is deterministic, but not reliable, stack measurement (see below).
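Conceptually, the injected metering behaves like this host-side model (just the idea, not the actual Wasm rewriting):

```rust
use std::cell::Cell;

thread_local! {
    // Running logical stack cost, as the injected counter would track it.
    static STACK_COST: Cell<u32> = Cell::new(0);
}

/// Model of a metered Wasm `call`: add the callee's pre-computed cost,
/// trap (modeled as a panic) past the limit, restore the counter on return.
fn metered_call<R>(cost: u32, limit: u32, f: impl FnOnce() -> R) -> R {
    STACK_COST.with(|c| {
        let depth = c.get() + cost;
        assert!(depth <= limit, "trap: logical stack limit exceeded");
        c.set(depth);
    });
    let result = f();
    STACK_COST.with(|c| c.set(c.get() - cost));
    result
}

fn main() {
    // Two nested calls of cost 10 fit a limit of 25; a third would trap.
    metered_call(10, 25, || metered_call(10, 25, || ()));
}
```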

It's hard to say which values are reasonable. I'd definitely raise the minimum, as any limit close to 0 would render any PVF code non-executable, and we want to guard against values that definitely break things. Probably 1024 is a good starting point. The current default is 65536, so the upper limit could be raised to at least 2 * 65536 to leave some leverage.

StackNativeMax(u32) [128 * logical_max, 512 * logical_max]
For determinism, this should be big enough that the limit won’t be reached in practice.

Uh, that's an intricate one. This one, on the other hand, is reliable but not deterministic. Those two stack limits were paired in the belief that they are proportional. That seemed logical, and even the Wasmtime authors we consulted agreed they should be. But later, that assumption was proven wrong: in the general case, the Wasm value stack and the native stack are not proportional at all. So they are known to be independent values now, but we're still keeping them proportional just because we don't have any better idea. The logical limit prevents abusing the Wasm stack usage (e.g., through recursion), and the native limit prevents abusing the compiler logic, where malicious code using a small value stack depth forces the compiler to generate huge native stack frames.

So, the window from 128x to 512x looks good to me. We're still in search of ways to improve those limits and make them more deterministic.
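So the check is just a proportionality window (a sketch; the 128x/512x factors are the ones proposed above, not final):

```rust
fn check_stack_native_max(native_max: u32, logical_max: u32) -> bool {
    let lo = logical_max.saturating_mul(128);
    let hi = logical_max.saturating_mul(512);
    (lo..=hi).contains(&native_max)
}

fn main() {
    assert!(check_stack_native_max(256 * 65_536, 65_536));
    assert!(!check_stack_native_max(64 * 65_536, 65_536));
}
```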

PrecheckingMaxMemory(u64) (0, isize::MAX]

Here, I'd first rule out 0 and everything close to it for the aforementioned reasons. I benchmarked one of the most sophisticated PVFs currently existing, and it required ~116 MB to compile in single-threaded mode (currently used) and ~145 MB in multi-threaded mode (if we ever enable it). So I'd say the limit should never fall under 256 MB.

As for the upper bound, isize::MAX is technically reasonable. But in practice, it's like 9 exabytes! 🤔 Well, we could say it's just a "no limit" value, but as this limit is only enforced when it's present, we already have a mechanism to waive it. Putting some practically reasonable value as the upper bound seems like a good idea, as it could rule out obviously erroneous values, but I don't have a strong opinion on the exact value yet. 16 GB? Just off the top of my head; maybe someone will come up with a better idea.
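Putting the floor and the tentative ceiling together (a sketch; both values are the ones floated above, not final):

```rust
const PRECHECK_MEM_MIN: u64 = 256 * 1024 * 1024;       // 256 MB floor, per the benchmark
const PRECHECK_MEM_MAX: u64 = 16 * 1024 * 1024 * 1024; // 16 GB tentative ceiling

fn check_prechecking_max_memory(bytes: u64) -> bool {
    (PRECHECK_MEM_MIN..=PRECHECK_MEM_MAX).contains(&bytes)
}

fn main() {
    assert!(check_prechecking_max_memory(512 * 1024 * 1024));
    assert!(!check_prechecking_max_memory(0));
}
```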

PvfPrepTimeout(PvfPrepTimeoutKind, u64) (0, blocktime_ms]

Preparation timeouts are not connected to block times in any way (IIUC), so more freedom is allowed here. Let's make up some constants for now, like (0, 240 sec], and decide on the exact value during review.

Also, if both PvfPrepTimeoutKind::Precheck and PvfPrepTimeoutKind::Lenient are set, we should at least check that Lenient > Precheck; otherwise the values are obviously wrong (see the sketch below).

PvfExecTimeout(PvfExecTimeoutKind, u64) (0, blocktime_ms]

Also here, we should always check that Approval > Backing if both are present. For the reasonable value range, @eskimor and @burdges, can you please give us some ideas? The question is: what is the reasonable range of backing and approval timeouts, with respect to block time, that we would ever want to set? Like, backing timeout from 1/10 block time to 1/2 block time, or something like that.
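Both ordering checks might look like this (a sketch; the timeout kinds are flattened into Options here, and the 240 sec preparation cap is the placeholder from above):

```rust
/// `longer` must strictly exceed `shorter` when both are present
/// (Lenient > Precheck, Approval > Backing). Values in milliseconds.
fn check_order(shorter: Option<u64>, longer: Option<u64>) -> bool {
    match (shorter, longer) {
        (Some(s), Some(l)) => l > s,
        _ => true, // nothing to compare if either is absent
    }
}

fn check_timeouts(
    precheck_ms: Option<u64>,
    lenient_ms: Option<u64>,
    backing_ms: Option<u64>,
    approval_ms: Option<u64>,
) -> bool {
    const PREP_MAX_MS: u64 = 240_000; // tentative (0, 240 sec] cap from above
    let prep_ok = |t: u64| t > 0 && t <= PREP_MAX_MS;
    precheck_ms.map_or(true, prep_ok)
        && lenient_ms.map_or(true, prep_ok)
        && check_order(precheck_ms, lenient_ms)
        && check_order(backing_ms, approval_ms)
}

fn main() {
    assert!(check_timeouts(Some(60_000), Some(200_000), Some(2_000), Some(12_000)));
    assert!(!check_timeouts(Some(200_000), Some(60_000), None, None)); // Lenient <= Precheck
}
```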

@burdges

burdges commented Oct 1, 2023

We want short backing timeouts of course, like 1 second maybe?

We'll eventually charge backers for time overruns, which we determine from the median running time claimed by approval voters, which yes, requires tricky new billing parameters. At that point, approval checkers' timeouts should become quite long, like a minute maybe.

At present, approval checker timeouts wind up being 4x longer than the backing checker timeout, but I'm not sure exactly.

I'm unsure these numbers ever change without a code upgrade.

@the-right-joyce the-right-joyce added the D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. label Oct 12, 2023
@eagr
Contributor

eagr commented Oct 13, 2023

Atm, these params are left unchecked until there are good enough candidates for the boundaries:

  • preparation timeouts
  • execution timeouts
