Merge with upstream #267

dhil · 2024-12-09T14:18:13Z

No description provided.

…lliance#9742) * Use a single section name for all subsections This reduces the size of the object file as identical section names get deduplicated. * Add a per_function_section equivalent for data objects On cg_clif's simple-raytracer benchmark this reduces the executable's size by 1MB from ~10MB to ~9MB. It does make lld slower though seemingly unlike per_function_sections, so keeping this as separate flag is beneficial for cg_clif. * Add link to PR

* cargo update -p futures-util updating because: `warning: package `futures-util v0.3.27` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked` * cargo update -p hermit-abi warning: package `hermit-abi v0.3.0` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked * cargo update -p iana-time-zone updated because: `warning: package `iana-time-zone v0.1.59` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked` * cargo update -p spin updated because `warning: package `spin v0.9.4` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked` * cargo vet trust --all alexcrichton --allow-multiple-publishers * perform vets. exempt hermit-abi, because we don't build for that platform.

* Shuffle fields in `VMRuntimeLimits` Right now this structure has a pointer-sized field, two 64-bit integers, and then three pointer-sized fields. This structure's layout is nontrivial to calculate on 32-bit platforms as it needs to take the alignment of 64-bit integers into account which differs between ARM and x86 for example. To make this easier shuffle all the 64-bit integers are now first, which means we don't have to worry about alignment. I'll note that this particular ordering is still a bit brittle because we might need to shuffle things again if more fields are added. That being said any misalignment is caught during testing of the `wasmtime` crate so there's not much danger in adding more things, it'll just require updating a few more locations. * Update test expectations

…ealliance#9736) This is a leftover from the `cranelift-wasm` days and it's not necessary any more. Remove it and have an inherent impl for the trait methods instead.

…nce#9735) * pulley: Track faulting opcode in stack overflow better This commit updates the pulley `Encode` trait to have a `WIDTH` associated with it to be able to calculate the size of an instruction by name rather than hard-coding instruction details in multiple locations. This constant is procedurally generated per-instruction given the definition of the instruction. This for now assumes there are no variable-width instructions. * Fix tests * Fix feature'd build

) * pulley: Implement sign-extension cranelift helper The `uextend` and `sextend` lowerings aren't done yet but this should help fill out ABI bits instead of panicking. Example tests are added showing what support is added here. * Review comments

This commit updates how the `signals_based_traps` setting is read for various instruction generation during the wasm-to-CLIF translation phase. Specifically this splits the `signals_based_traps` configuration into one for "general CLIF instructions can trap" and another for "memory traps are allowed". These settings are based on `signals_based_traps` but then differ for Pulley. For example Pulley allows general instructions to trap since that just halts the interpreter, but memory-related instructions can't trap since segfaults aren't caught during the execution of Pulley. These settings then take Pulley into account to assist in translating Pulley compatibly by default.

This commit is the equivalent of bytecodealliance#9675 for libcalls used in core wasm. All libcalls now communicate whether or not they trapped through their return value instead of implicitly calling `longjmp` to exit from the libcall. This is to make integration with Pulley easier to avoid the need to `longjmp` over Pulley execution. Libcall definitions have changed where appropriate and the `catch_unwind_and_record_trap` function introduced in bytecodealliance#9675 was refactored to better support multiple types of values being returned from libcalls (instead of just `Result<()>`). Note that changes have been made to both the Cranelift translation layer and the Winch translation layer for this as the ABI of various libcalls are all changing.

…lliance#9741) * Default to Pulley on cranelift/winch-unsupported platforms This commit updates the selection of the default target used by a `Config` to take into account the host architecture and possibly use Pulley instead of the host target itself. The goal here is that when a target isn't explicitly configured then Wasmtime is tasked with picking a reasonable default to execute code with. If neither Winch nor Cranelift has any support then the only way to possibly execute code would be with Pulley, so in these situations pulley becomes the default target. The goal of this change is to make testing easier on 32-bit platforms where there is no compiler support. This means we won't have to update all tests to explicitly configure pulley on unsupported platforms. Additionally this means that an eventual `wasmtime` executable for 32-bit platforms will use Pulley by default and won't need any extra configuration. * Fix miri tunables

…iance#9743) * Enumerate all host calls in `wasmtime_environ::HostCall` This commit is a continuation of the plan of implementing host calls in Pulley through bytecodealliance#9665, bytecodealliance#9675, and bytecodealliance#9693. Here the `Compiler::call_indirect_host` method is updated to take a new type, `HostCall`, which indicates what type of host call is being performed. This is then serialized to a 32-bit integer which will be present in the pulley instruction being generated. This 32-bit integer will then be used to perform a dispatch (the dispatch is left for a future PR with more Pulley integration). This new `HostCall` structure is defined with `BuiltinFunctionIndex` internally. Additionally a new `ComponentBuiltinFunctionIndex` is added to enumerate the same set of indexes for components as well. Along the way the split between component transcoders/builtins were removed and they're now all lumped together in one macro for builtins. (no need to have two separate macros). This new `HostCall` is used to implement the `call_indirect_host` instruction for Pulley to fill out an unimplemented piece of code. * Rename `max` to `len`

…ance#9746) Handling review comments from bytecodealliance#9665 and fully updating documentation to reflect the mid-pr design shift to the currently-landed state.

* format: fix typo * format: wrap line length * format: re-wrap comment * format: organize crate dependencies

…nce#9749) * pulley: Expand endianness documentation of `XRegUnion` Just had some discussion with Nick about this that felt best to immortalize in documentation rather than just having it in our heads. * Apply suggestions from code review Co-authored-by: Andrew Brown <andrew.brown@intel.com> --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>

* winch: Epoch-based interruption Closes bytecodealliance#8091 This commit introduces support for epoch interruption to Winch. The heuristics around epoch check emission are identical to Cranelift's except for the fact that the current implementation doesn't introduce a local-based cache for the current epoch deadline. This is an intentional decision given Winch's focus on compilation performance. However, if needed in the future, knobs could be introduced to optionally introduce a local cache at the cost of reduced compilation performance. * Add disas tests for x64/epoch * Refactor to `maybe_emit_epoch_check` * Generate expected output for disas

* Update to target-lexicon 0.13.0 Pulling in bytecodealliance/target-lexicon#115 to lay the foundation for some big-endian work in Pulley * Fix more compile errors

* Don't cancel workflows on Windows runners Our CI has configuration such that on failure of any job it cancels the entire workflow run. This means that if a failure is quickly encountered it doesn't wait for the entire workflow to finish just to report the failure, clearing out the queue of failing PRs faster. This doesn't work well on Windows runners though where often the cancellation and marking the job as failed race. The winner of the race is often the cancellation which means that there's no listed failure in 100+ jobs which can be frustrating. Instead centralize this cancellation logic in one helper and additionally add a condition where it doesn't cancel on Windows platforms. That means that failures on windows won't be fail-fast, but that's sort of the best that we can do for now. * Fix syntax

* Use `runner.os` in CI configuration more Instead of using `matrix.os` which can change over time as images change instead use `runner.os` to both name jobs and have conditions on jobs. This couples it more closely to what's intended, the choice of OS, rather than image version. * Not available for names

This commits ports the suggestions made in bytecodealliance#9737 to fuel checks. Namely * Prefer the usage of `*_reg` over `*_var` * Add a couple of disassembly tests

* winch: Track the callee calling convention This commit aims to improve the consistency of call emission in Winch. Prior to this commit, the calling convention at Winch's assembler layer was hardcoded. Even though the calling convention invariants are correctly handled early on in the code generation process and this has no effect on the generated code, it could lead to subtle bugs if Cranelift's emission infrastructure changes. It could also be confusing when trying to implement call instrutions for other backends. This change is motivated by some of the questions in bytecodealliance#9751 * Clippy fixes

* Support executing Pulley in Wasmtime This commit is the initial implementation of executing the Pulley interpreter from the `wasmtime` crate. This gives access to all of the `wasmtime` crate's runtime APIs backed by execution of bytecode in Pulley. This builds on the previous PRs I've been making for support in Pulley to culminate in testing on CI in this PR. This PR handles some final tidbits related to producing a runnable image that can be interpreted by the `wasmtime` crate such as: * Pulley compilation artifacts are no longer marked as natively executable, just read-only. * Pulley compilation artifacts include wasm-to-array trampolines like normal platforms (removes a pulley special-case). * Dispatch of host calls from Pulley to the Wasmtime runtime are implemented. * Pulley's list of panicking wasm features is slimmed down as most are covered by "this lowering isn't supported" errors. * Execution of wasm code now has an `if` to see whether Pulley is enabled within a `Store` or not. * Traps and basic "unwinding" of the pulley stack are implemented (e.g. a "pulley version" of `setjmp` and `longjmp`, sort of) * Halting the interpreter has been refactored to help shrink the size of `ControlFlow<Done>` and handle metadata with each done state. Some minor refactorings are also included here and there along with a few fixes here and there necessary to get tests passing. The next major part of this commit is updates to our `wast` test suite and executing all `*.wast` files. Pulley is now executed by default for all files as a new execution engine. This means that all platforms in CI will start executing Pulley tests. At this time almost all tests are flagged as "expected to fail" but there are a small handful of allow-listed tests which are expected to pass. This exact list will change over time as CLIF lowerings are implemented and the interpreter is extended. Follow-up PRs will extend the testing strategy further such as: * Extending `#[wasmtime_test]` to include Pulley in addition to Winch. * Getting testing set up on CI for 32-bit platforms. prtest:full * Fix pulley fuzz build * Fix clippy lints * Shuffle around some `#[cfg]`'d code * Remove unused imports * Update feature sets testing MIRI Enable pulley for wasmtime/wasmtime-cli and also enable all features for wasmtime-environ * Round up pulley's page size to 64k * Skip pulley tests on s390x for now * Add a safety rail for matching a pulley target to the host * Fix more pulley tests on s390x * Review comments * Fix fuzz build

This commit extends our CI for i686 and armv7 to test the Pulley backend, namely the full `*.wast` test suite as well as the `wasmtime` crate itself. Note that many `*.wast` tests are still expected to fail at this time. This involved fixing a number of 32-vs-64 bit issues throughout the test suite in various location in this commit.

* Implement custom stack unwinding for Pulley The pulley calling convention may not match the native calling convention, which is the case for s390x, so the unwinding that Wasmtime does needs to be parameterized over what's being unwound. This commit adds a new `Unwind` trait with various methods behind it that the backtrace implementation is then updated to use. * Fix alignment check for 32-bit

Fixing a mistake in bytecodealliance#9754 not caught til it was actually executed during a failed run on bytecodealliance#9758.

* pulley: Support big-endian targets This commit fixes the Pulley interpreter and Cranelift backend to properly support big-endian targets. The problem beforehand was that loads/stores in Pulley were defined as the native endianness which means that big and little-endian platforms differed. Additionally the previously set of understood pulley targets were implicitly always little-endian which was not appropriate for a big-endian host where JIT data structures are stored in big-endian format. This commit fixes all of these issues with a few changes: * Pulley loads/stores are always little-endian now. * Pulley now has bswap32/bswap64 instructions. * Wasmtime/Cranelift now understand big-endian pulley targets (e.g. `pulley32be`). * CLIF translation of loads/stores now properly handles the endianness flags on `MemFlags`. In the future if necessary we could natively add a macro-op for big-endian loads/stores to Pulley but for now it's more minimal to just have `bswap32` and `bswap64`. The result of this commit is that Pulley tests are now run and executed on s390x like all other platforms. * Fix fuzz build * Review comments * Fix fuzz build

* Simplify mmap interface slightly Return a single `SendSyncPtr` -- the platform-independent context converts to the various raw pointer types as desired. This simplifies upcoming work where I wanted to return a `SendSyncPtr`. * Move MemoryImageSource::map_at to mmap module This is part of the work to centralize memory management into the `mmap` module. This commit introduces a few structures which aid in that process, and starts converting one of the functions (`MemoryImageSource::map_at`) into this module. The structures introduced are: * `MemoryBase`: `RuntimeLinearMemory::base_ptr` is now `RuntimeLinearMemory::base`, which returns a `MemoryBase`. This is either a raw pointer or an `MmapOffset` as described below. * `MmapOffset`: A combination of a reference to an mmap and an offset into it. Logically represents a pointer into a mapped section of memory. In future work, we'll move more image-mapping code over to `Mmap` instances.

bjorn3 and others added 28 commits December 5, 2024 15:08

Remove the FuncEnvironment trait from wasmtime-cranelift (bytecod…

d69f0e3

…ealliance#9736) This is a leftover from the `cranelift-wasm` days and it's not necessary any more. Remove it and have an inherent impl for the trait methods instead.

pulley: Clarify documentation around CallIndirectHost (bytecodealli…

2f7b0f5

…ance#9746) Handling review comments from bytecodealliance#9665 and fully updating documentation to reflect the mid-pr design shift to the currently-landed state.

Fix minor formatting issues (bytecodealliance#9748)

66989d9

* format: fix typo * format: wrap line length * format: re-wrap comment * format: organize crate dependencies

Update to target-lexicon 0.13.0 (bytecodealliance#9752)

f247a75

* Update to target-lexicon 0.13.0 Pulling in bytecodealliance/target-lexicon#115 to lay the foundation for some big-endian work in Pulley * Fix more compile errors

winch: Minor improvements to fuel checks (bytecodealliance#9750)

7145ca5

This commits ports the suggestions made in bytecodealliance#9737 to fuel checks. Namely * Prefer the usage of `*_reg` over `*_var` * Add a couple of disassembly tests

cpp demangling check for demangling error (bytecodealliance#9756)

ad48f94

Add required shell: bash to new action (bytecodealliance#9761)

019078a

Fixing a mistake in bytecodealliance#9754 not caught til it was actually executed during a failed run on bytecodealliance#9758.

Merge with upstream

1027959

Merge with wasmfx/main

78f27b6

dhil merged commit 8f56e55 into wasmfx:main Dec 9, 2024
57 checks passed

dhil deleted the wasmfx-merge branch December 9, 2024 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge with upstream #267

Merge with upstream #267

dhil commented Dec 9, 2024

Merge with upstream #267

Merge with upstream #267

Conversation

dhil commented Dec 9, 2024