Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge with upstream #267

Merged
merged 28 commits into from
Dec 9, 2024
Merged

Merge with upstream #267

merged 28 commits into from
Dec 9, 2024

Conversation

dhil
Copy link
Member

@dhil dhil commented Dec 9, 2024

No description provided.

bjorn3 and others added 28 commits December 5, 2024 15:08
…lliance#9742)

* Use a single section name for all subsections

This reduces the size of the object file as identical section names get
deduplicated.

* Add a per_function_section equivalent for data objects

On cg_clif's simple-raytracer benchmark this reduces the executable's
size by 1MB from ~10MB to ~9MB. It does make lld slower though seemingly
unlike per_function_sections, so keeping this as separate flag is
beneficial for cg_clif.

* Add link to PR
* cargo update -p futures-util

updating because: `warning: package `futures-util v0.3.27` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked`

* cargo update -p hermit-abi

warning: package `hermit-abi v0.3.0` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked

* cargo update -p iana-time-zone

updated because: `warning: package `iana-time-zone v0.1.59` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked`

* cargo update -p spin

updated because `warning: package `spin v0.9.4` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked`

* cargo vet trust --all alexcrichton --allow-multiple-publishers

* perform vets. exempt hermit-abi, because we don't build for that platform.
* Shuffle fields in `VMRuntimeLimits`

Right now this structure has a pointer-sized field, two 64-bit integers,
and then three pointer-sized fields. This structure's layout is
nontrivial to calculate on 32-bit platforms as it needs to take the
alignment of 64-bit integers into account which differs between ARM and
x86 for example. To make this easier shuffle all the 64-bit integers are now
first, which means we don't have to worry about alignment.

I'll note that this particular ordering is still a bit brittle because
we might need to shuffle things again if more fields are added. That
being said any misalignment is caught during testing of the `wasmtime`
crate so there's not much danger in adding more things, it'll just
require updating a few more locations.

* Update test expectations
…ealliance#9736)

This is a leftover from the `cranelift-wasm` days and it's not necessary
any more. Remove it and have an inherent impl for the trait methods
instead.
…nce#9735)

* pulley: Track faulting opcode in stack overflow better

This commit updates the pulley `Encode` trait to have a `WIDTH`
associated with it to be able to calculate the size of an instruction by
name rather than hard-coding instruction details in multiple locations.
This constant is procedurally generated per-instruction given the
definition of the instruction. This for now assumes there are no
variable-width instructions.

* Fix tests

* Fix feature'd build
)

* pulley: Implement sign-extension cranelift helper

The `uextend` and `sextend` lowerings aren't done yet but this should
help fill out ABI bits instead of panicking. Example tests are added
showing what support is added here.

* Review comments
This commit updates how the `signals_based_traps` setting is read for
various instruction generation during the wasm-to-CLIF translation
phase. Specifically this splits the `signals_based_traps` configuration
into one for "general CLIF instructions can trap" and another for
"memory traps are allowed". These settings are based on
`signals_based_traps` but then differ for Pulley. For example Pulley
allows general instructions to trap since that just halts the
interpreter, but memory-related instructions can't trap since segfaults
aren't caught during the execution of Pulley. These settings then take
Pulley into account to assist in translating Pulley compatibly by
default.
This commit is the equivalent of bytecodealliance#9675 for libcalls used in core wasm.
All libcalls now communicate whether or not they trapped through their
return value instead of implicitly calling `longjmp` to exit from the
libcall. This is to make integration with Pulley easier to avoid the
need to `longjmp` over Pulley execution.

Libcall definitions have changed where appropriate and the
`catch_unwind_and_record_trap` function introduced in bytecodealliance#9675 was
refactored to better support multiple types of values being returned
from libcalls (instead of just `Result<()>`).

Note that changes have been made to both the Cranelift translation layer
and the Winch translation layer for this as the ABI of various libcalls
are all changing.
…lliance#9741)

* Default to Pulley on cranelift/winch-unsupported platforms

This commit updates the selection of the default target used by a
`Config` to take into account the host architecture and possibly use
Pulley instead of the host target itself. The goal here is that when a
target isn't explicitly configured then Wasmtime is tasked with picking
a reasonable default to execute code with. If neither Winch nor
Cranelift has any support then the only way to possibly execute code
would be with Pulley, so in these situations pulley becomes the default
target.

The goal of this change is to make testing easier on 32-bit platforms
where there is no compiler support. This means we won't have to update
all tests to explicitly configure pulley on unsupported platforms.
Additionally this means that an eventual `wasmtime` executable for
32-bit platforms will use Pulley by default and won't need any extra
configuration.

* Fix miri tunables
…iance#9743)

* Enumerate all host calls in `wasmtime_environ::HostCall`

This commit is a continuation of the plan of implementing host calls in
Pulley through bytecodealliance#9665, bytecodealliance#9675, and bytecodealliance#9693. Here the `Compiler::call_indirect_host`
method is updated to take a new type, `HostCall`, which indicates what
type of host call is being performed. This is then serialized to a
32-bit integer which will be present in the pulley instruction being
generated. This 32-bit integer will then be used to perform a dispatch
(the dispatch is left for a future PR with more Pulley integration).

This new `HostCall` structure is defined with `BuiltinFunctionIndex`
internally. Additionally a new `ComponentBuiltinFunctionIndex` is added
to enumerate the same set of indexes for components as well. Along the
way the split between component transcoders/builtins were removed and
they're now all lumped together in one macro for builtins. (no need to
have two separate macros).

This new `HostCall` is used to implement the `call_indirect_host`
instruction for Pulley to fill out an unimplemented piece of code.

* Rename `max` to `len`
…ance#9746)

Handling review comments from bytecodealliance#9665 and fully updating documentation to
reflect the mid-pr design shift to the currently-landed state.
* format: fix typo

* format: wrap line length

* format: re-wrap comment

* format: organize crate dependencies
…nce#9749)

* pulley: Expand endianness documentation of `XRegUnion`

Just had some discussion with Nick about this that felt best to
immortalize in documentation rather than just having it in our heads.

* Apply suggestions from code review

Co-authored-by: Andrew Brown <andrew.brown@intel.com>

---------

Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* winch: Epoch-based interruption

Closes bytecodealliance#8091

This commit introduces support for epoch interruption to Winch. The
heuristics around epoch check emission are identical to Cranelift's
except for the fact that the current implementation doesn't introduce
a local-based cache for the current epoch deadline. This is an
intentional decision given Winch's focus on compilation performance.
However, if needed in the future, knobs could be introduced to
optionally introduce a local cache at the cost of reduced compilation
performance.

* Add disas tests for x64/epoch

* Refactor to `maybe_emit_epoch_check`

* Generate expected output for disas
* Update to target-lexicon 0.13.0

Pulling in bytecodealliance/target-lexicon#115 to lay the foundation for
some big-endian work in Pulley

* Fix more compile errors
* Don't cancel workflows on Windows runners

Our CI has configuration such that on failure of any job it cancels the
entire workflow run. This means that if a failure is quickly encountered
it doesn't wait for the entire workflow to finish just to report the
failure, clearing out the queue of failing PRs faster. This doesn't work
well on Windows runners though where often the cancellation and marking
the job as failed race. The winner of the race is often the cancellation
which means that there's no listed failure in 100+ jobs which can be
frustrating.

Instead centralize this cancellation logic in one helper and
additionally add a condition where it doesn't cancel on Windows
platforms. That means that failures on windows won't be fail-fast, but
that's sort of the best that we can do for now.

* Fix syntax
* Use `runner.os` in CI configuration more

Instead of using `matrix.os` which can change over time as images change
instead use `runner.os` to both name jobs and have conditions on jobs.
This couples it more closely to what's intended, the choice of OS,
rather than image version.

* Not available for names
This commits ports the suggestions made in bytecodealliance#9737
to fuel checks.

Namely

* Prefer the usage of `*_reg` over `*_var`
* Add a couple of disassembly tests
* winch: Track the callee calling convention

This commit aims to improve the consistency of call emission in Winch.

Prior to this commit, the calling convention at Winch's assembler layer
was hardcoded. Even though the calling convention invariants are
correctly handled early on in the code generation process and this has
no effect on the generated code, it could lead to subtle bugs if
Cranelift's emission infrastructure changes. It could also be confusing
when trying to implement call instrutions for other backends.

This change is motivated by some of the questions in bytecodealliance#9751

* Clippy fixes
* Support executing Pulley in Wasmtime

This commit is the initial implementation of executing the Pulley
interpreter from the `wasmtime` crate. This gives access to all of the
`wasmtime` crate's runtime APIs backed by execution of bytecode in
Pulley. This builds on the previous PRs I've been making for support in
Pulley to culminate in testing on CI in this PR. This PR handles some
final tidbits related to producing a runnable image that can be
interpreted by the `wasmtime` crate such as:

* Pulley compilation artifacts are no longer marked as natively
  executable, just read-only.
* Pulley compilation artifacts include wasm-to-array trampolines like
  normal platforms (removes a pulley special-case).
* Dispatch of host calls from Pulley to the Wasmtime runtime are
  implemented.
* Pulley's list of panicking wasm features is slimmed down as most are
  covered by "this lowering isn't supported" errors.
* Execution of wasm code now has an `if` to see whether Pulley is
  enabled within a `Store` or not.
* Traps and basic "unwinding" of the pulley stack are implemented (e.g.
  a "pulley version" of `setjmp` and `longjmp`, sort of)
* Halting the interpreter has been refactored to help shrink the size of
  `ControlFlow<Done>` and handle metadata with each done state.

Some minor refactorings are also included here and there along with a
few fixes here and there necessary to get tests passing.

The next major part of this commit is updates to our `wast` test suite
and executing all `*.wast` files. Pulley is now executed by default for
all files as a new execution engine. This means that all platforms in CI
will start executing Pulley tests. At this time almost all tests are
flagged as "expected to fail" but there are a small handful of
allow-listed tests which are expected to pass. This exact list will
change over time as CLIF lowerings are implemented and the interpreter
is extended.

Follow-up PRs will extend the testing strategy further such as:

* Extending `#[wasmtime_test]` to include Pulley in addition to Winch.
* Getting testing set up on CI for 32-bit platforms.

prtest:full

* Fix pulley fuzz build

* Fix clippy lints

* Shuffle around some `#[cfg]`'d code

* Remove unused imports

* Update feature sets testing MIRI

Enable pulley for wasmtime/wasmtime-cli and also enable all features for
wasmtime-environ

* Round up pulley's page size to 64k

* Skip pulley tests on s390x for now

* Add a safety rail for matching a pulley target to the host

* Fix more pulley tests on s390x

* Review comments

* Fix fuzz build
This commit extends our CI for i686 and armv7 to test the Pulley
backend, namely the full `*.wast` test suite as well as the `wasmtime`
crate itself. Note that many `*.wast` tests are still expected to fail
at this time.

This involved fixing a number of 32-vs-64 bit issues throughout the test
suite in various location in this commit.
* Implement custom stack unwinding for Pulley

The pulley calling convention may not match the native calling
convention, which is the case for s390x, so the unwinding that Wasmtime
does needs to be parameterized over what's being unwound. This commit
adds a new `Unwind` trait with various methods behind it that the
backtrace implementation is then updated to use.

* Fix alignment check for 32-bit
Fixing a mistake in bytecodealliance#9754 not caught til it was actually executed during
a failed run on bytecodealliance#9758.
* pulley: Support big-endian targets

This commit fixes the Pulley interpreter and Cranelift backend to
properly support big-endian targets. The problem beforehand was that
loads/stores in Pulley were defined as the native endianness which means
that big and little-endian platforms differed. Additionally the
previously set of understood pulley targets were implicitly always
little-endian which was not appropriate for a big-endian host where JIT
data structures are stored in big-endian format.

This commit fixes all of these issues with a few changes:

* Pulley loads/stores are always little-endian now.
* Pulley now has bswap32/bswap64 instructions.
* Wasmtime/Cranelift now understand big-endian pulley targets (e.g.
  `pulley32be`).
* CLIF translation of loads/stores now properly handles the endianness
  flags on `MemFlags`.

In the future if necessary we could natively add a macro-op for
big-endian loads/stores to Pulley but for now it's more minimal to just
have `bswap32` and `bswap64`.

The result of this commit is that Pulley tests are now run and executed
on s390x like all other platforms.

* Fix fuzz build

* Review comments

* Fix fuzz build
* Simplify mmap interface slightly

Return a single `SendSyncPtr` -- the platform-independent context converts to
the various raw pointer types as desired. This simplifies upcoming work where I
wanted to return a `SendSyncPtr`.

* Move MemoryImageSource::map_at to mmap module

This is part of the work to centralize memory management into the `mmap`
module. This commit introduces a few structures which aid in that process, and
starts converting one of the functions (`MemoryImageSource::map_at`) into this
module.

The structures introduced are:

* `MemoryBase`: `RuntimeLinearMemory::base_ptr` is now
  `RuntimeLinearMemory::base`, which returns a `MemoryBase`. This is either a
  raw pointer or an `MmapOffset` as described below.

* `MmapOffset`: A combination of a reference to an mmap and an offset into it.
  Logically represents a pointer into a mapped section of memory.

In future work, we'll move more image-mapping code over to `Mmap` instances.
@dhil dhil merged commit 8f56e55 into wasmfx:main Dec 9, 2024
57 checks passed
@dhil dhil deleted the wasmfx-merge branch December 9, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants