Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Refactor WASM module instantiation #10480

Merged
merged 22 commits into from
Mar 19, 2022

Conversation

koute
Copy link
Contributor

@koute koute commented Dec 14, 2021

This PR refactors WASM module instantiation slightly to register the host callbacks through the Linker instead of doing it directly on the Store, which is slightly cleaner and is necessary to switch to native wasmtime instance pooling without sacrificing (too much) performance.

I've also enabled the instance pooling here and added the relevant benchmarks; with this PR merged the performance should be inline with the numbers I've posted here.

I've enabled the uffd feature on wasmtime here, however according to @pepyakin this does impose a minimum requirement of Linux 4.11, so we should decide before merging whether we want to keep that enabled or not. (If we do keep it then this should be bumped from B0-silent to B5-clientnoteworthy.)

The switch to instance pooling was postponed. (See #10244 (comment) for details.)

@koute koute added A0-please_review Pull request needs code review. I7-refactor Code needs refactoring. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Dec 14, 2021
@koute koute requested review from pepyakin and bkchr December 14, 2021 11:25
@koute
Copy link
Contributor Author

koute commented Dec 15, 2021

...look like our CI machines are failing due to the uffd being enabled, so they are probably running on the old kernel. I guess this is as good of a sign as any that we probably shouldn't enable it (at least not yet), so I'll just remove it for now.

I do wonder though if it'd be possible to perhaps close the speed gap here and get the benefits of uffd with some alternative method (e.g. taking advantage of copy-on-write pages, or using 2MB hugepages, etc.); that'd most likely require changes to wasmtime, but might be worth investigating.

Copy link
Contributor

@pepyakin pepyakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brief glance

client/executor/common/src/runtime_blob/runtime_blob.rs Outdated Show resolved Hide resolved
client/executor/common/src/runtime_blob/runtime_blob.rs Outdated Show resolved Hide resolved
client/executor/common/src/runtime_blob/runtime_blob.rs Outdated Show resolved Hide resolved
@pepyakin
Copy link
Contributor

pepyakin commented Dec 15, 2021

Oh what a shame. Well, yeah, maybe it is a sign.

I was really under impression that COW should improve the situation. In fact, I actually did some experiments. What I did is that I created a file, set the right size and filled it with the initial data: i.e. data segments on top of zeroed data. Then for each instance I mmap that fd with MAP_PRIVATE. That was really long time ago though and I do not remember all details, but it was slow. Maybe that had something to do with copying the pages in CoW that was filled with zeroes instead of actually zero them. I imagine the latter may be faster. I did not really dig deep so I do not know. I am pretty sure that you may be able to dig deeper armed with perf and insight what happens in the kernel.

UPDATE: I did find the lead to those experiments #3011 (review)

Also, I was surprised that wasmtime folks went with uffd and not the COW, because in my mental model it COW should be quicker since it does not involve the userspace page fault handling. I am pretty sure they know something that I do not know.

Ideally, wasmtime provides an API similar (or maybe equivalent) to InstancePre that allows to instantaneously spawn copies of some pre-initialized instance, similarly to fork or Android's Zygote.

I am not sure about big pages though. They can help in case we have TLB pressure, but do we? If we mix UFFD or COW with 2MiB pages we may risk handling page faults that 2 MiB even though only a part of it was actually used.

@koute
Copy link
Contributor Author

koute commented Dec 16, 2021

Hmm... well, now that I've looked through wasmtime code I could probably just make it so that enabling uffd and running on an old kernel degrades gracefully and acts just like if uffd wasn't enabled in the first place; shouldn't be too hard.

Anyway, I did some experiments on wasmtime with COW memory and... it actually looks promising. On our benchmarks with the Kusama runtime the invocation time dropped from ~48us (which is what we get with either fast instance reuse or instance pooling + uffd) to ~20us, so it might be worth it to investigate this even further. (Of course this is just a YOLO proof of concept implementation; doing this properly would require more work to handle all of the corner cases.) From the profiling I did it might be possible to go even lower, since now after COW-ing the linear memory I see a bunch of normal memory allocations related to imports which dominate the runtime and which should also be cacheable across invocations.

[edit]After switching to preinitializing the module the invocation time with COW'd memory goes down to ~14us.[/edit]

@koute koute changed the title Refactor WASM module instantiation; enable WASM instance pooling Refactor WASM module instantiation Dec 16, 2021
Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking soo long 🙈

client/executor/common/src/runtime_blob/runtime_blob.rs Outdated Show resolved Hide resolved
client/executor/common/src/runtime_blob/runtime_blob.rs Outdated Show resolved Hide resolved
client/executor/wasmtime/src/imports.rs Outdated Show resolved Hide resolved
client/executor/benches/bench.rs Show resolved Hide resolved
client/executor/wasmtime/src/imports.rs Outdated Show resolved Hide resolved
client/executor/wasmtime/src/imports.rs Outdated Show resolved Hide resolved
client/executor/wasmtime/src/imports.rs Outdated Show resolved Hide resolved
client/executor/wasmtime/src/runtime.rs Outdated Show resolved Hide resolved
client/executor/wasmtime/src/runtime.rs Outdated Show resolved Hide resolved
@koute koute requested review from bkchr and pepyakin February 18, 2022 10:46
@koute
Copy link
Contributor Author

koute commented Mar 10, 2022

@bkchr @pepyakin Ping. (:

New wasmtime was released a few days ago with support for COW memory mappings so I can now continue with this work; we need this merged in first though. (If you prefer I could just push new changes here, but that'll just make this PR bigger.)

@bkchr
Copy link
Member

bkchr commented Mar 19, 2022

bot merge

@paritytech-processbot
Copy link

Bot will approve on the behalf of @bkchr, since they are a team lead, in an attempt to reach the minimum approval count

@paritytech-processbot paritytech-processbot bot merged commit b64647c into paritytech:master Mar 19, 2022
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Refactor WASM module instantiation; enable WASM instance pooling

* Disable the `uffd` feature on `wasmtime`

* Restore the original behavior regarding the initial WASM memory size

* Adjust error message

* Remove unnecessary import in the benchmarks

* Preinstantiate the WASM runtime for a slight speedup

* Delete the asserts in `convert_memory_import_into_export`

* `return` -> `break`

* Revert WASM instance pooling for now

* Have `convert_memory_import_into_export` return an error instead of panic

* Update the warning when an import is missing

* Rustfmt and clippy fix

* Fix executor benchmarks' compilation without `wasmtime` being enabled

* rustfmt again

* Align to review comments

* Extend tests so that both imported and exported memories are tested

* Increase the number of heap pages for exported memories too

* Fix `decommit_works` test
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Refactor WASM module instantiation; enable WASM instance pooling

* Disable the `uffd` feature on `wasmtime`

* Restore the original behavior regarding the initial WASM memory size

* Adjust error message

* Remove unnecessary import in the benchmarks

* Preinstantiate the WASM runtime for a slight speedup

* Delete the asserts in `convert_memory_import_into_export`

* `return` -> `break`

* Revert WASM instance pooling for now

* Have `convert_memory_import_into_export` return an error instead of panic

* Update the warning when an import is missing

* Rustfmt and clippy fix

* Fix executor benchmarks' compilation without `wasmtime` being enabled

* rustfmt again

* Align to review comments

* Extend tests so that both imported and exported memories are tested

* Increase the number of heap pages for exported memories too

* Fix `decommit_works` test
grishasobol pushed a commit to gear-tech/substrate that referenced this pull request Mar 28, 2022
* Refactor WASM module instantiation; enable WASM instance pooling

* Disable the `uffd` feature on `wasmtime`

* Restore the original behavior regarding the initial WASM memory size

* Adjust error message

* Remove unnecessary import in the benchmarks

* Preinstantiate the WASM runtime for a slight speedup

* Delete the asserts in `convert_memory_import_into_export`

* `return` -> `break`

* Revert WASM instance pooling for now

* Have `convert_memory_import_into_export` return an error instead of panic

* Update the warning when an import is missing

* Rustfmt and clippy fix

* Fix executor benchmarks' compilation without `wasmtime` being enabled

* rustfmt again

* Align to review comments

* Extend tests so that both imported and exported memories are tested

* Increase the number of heap pages for exported memories too

* Fix `decommit_works` test
DaviRain-Su pushed a commit to octopus-network/substrate that referenced this pull request Aug 23, 2022
* Refactor WASM module instantiation; enable WASM instance pooling

* Disable the `uffd` feature on `wasmtime`

* Restore the original behavior regarding the initial WASM memory size

* Adjust error message

* Remove unnecessary import in the benchmarks

* Preinstantiate the WASM runtime for a slight speedup

* Delete the asserts in `convert_memory_import_into_export`

* `return` -> `break`

* Revert WASM instance pooling for now

* Have `convert_memory_import_into_export` return an error instead of panic

* Update the warning when an import is missing

* Rustfmt and clippy fix

* Fix executor benchmarks' compilation without `wasmtime` being enabled

* rustfmt again

* Align to review comments

* Extend tests so that both imported and exported memories are tested

* Increase the number of heap pages for exported memories too

* Fix `decommit_works` test
ark0f pushed a commit to gear-tech/substrate that referenced this pull request Feb 27, 2023
* Refactor WASM module instantiation; enable WASM instance pooling

* Disable the `uffd` feature on `wasmtime`

* Restore the original behavior regarding the initial WASM memory size

* Adjust error message

* Remove unnecessary import in the benchmarks

* Preinstantiate the WASM runtime for a slight speedup

* Delete the asserts in `convert_memory_import_into_export`

* `return` -> `break`

* Revert WASM instance pooling for now

* Have `convert_memory_import_into_export` return an error instead of panic

* Update the warning when an import is missing

* Rustfmt and clippy fix

* Fix executor benchmarks' compilation without `wasmtime` being enabled

* rustfmt again

* Align to review comments

* Extend tests so that both imported and exported memories are tested

* Increase the number of heap pages for exported memories too

* Fix `decommit_works` test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit I7-refactor Code needs refactoring.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants