forked from bytecodealliance/wasmtime
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge with upstream #271
Merged
Merged
Merge with upstream #271
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ance#9794) This commit first fixed an issue with table access codegen to disable spectre mitigations on Pulley targets like how spectre is disabled for memory accesses as well. This unblocked many tests related to tables which then led to a different error about a `trapnz` with an 8-bit value not being supported. In fixing `trapnz` with 8-bit values this PR went ahead and did a general-purpose refactoring for how conditional branches are managed. Previously conditional traps and conditional branches had some duplicated logic and the goal was to unify everything. There is now a single `Cond` which represents the condition of a conditional jump which is used uniformly for all locations such as `select`, `brif`, and `trap[n]z`. This new type represents all the sorts of conditional branches that can be done in Pulley, for example integer comparisons and whether or not a register is zero. This `Cond` type has various helpers for printing it, inverting it, collecting operands, emission, etc. The end result is that it's a bit wordy to work with `Cond` right now due to the size of the variants but all locations working with conditional traps are deduplicated and now it's just repetitive logic rather than duplicated logic. Putting all of this together gets a large batch of spec tests working. I'll note that this does remove a feature where `trapnz` was turned into nothing or an unconditional trap if the argument was a constant, but that feels like an optimization perhaps best left for the middle-end rather than doing it in the backend. cc bytecodealliance#9783
Implement float-to-int bitcasts as well as the xor operation for integers. cc bytecodealliance#9783
) * cranelift: Round inline stack probes down, not up When we have `enable_probestack` turned on and set `probestack_strategy` to "inline", we have to compute how many pages of the stack we'll probe. The current implementation rounds our stack frame size up to the nearest multiple of the page size, then probes each page once. However, if our stack frame is not a multiple of the page size, that means there's a partial page at the end. It's not necessary to probe that partial page, just like it's unnecessary to probe at all if the frame is smaller than one page. Either way, any signal handler needs to be prepared for stack accesses on that last page to fault at any time during the function's execution. * Add comments explaining why we round down. * Port the round-down code to s390x too. * Update s390x expected outputs. --------- Co-authored-by: Dan Gohman <dev@sunfishcode.online>
* Use the C API as the example instead of the `wasmtime` CLI now that Cranelift can be disabled. * Update custom platform docs to talk about `signals-based-traps`.
* pulley: Get `strings.wast` test passing Needed the `imul` CLIF instruction to get implemented so 32/64-bit multiplication have now been added. cc bytecodealliance#9783 * Flag test as now passing
…ance#9803) * pulley: Fill out remaining 32/64-bit integer operations This required some extra plumbing to shepherd the precise reason why signed division trapped to Wasmtime which is done through an extra `TrapKind` side channel now added. This then additionally fixes the signed remainder interpreter function to return 0 on `MIN % -1` which is different from what Rust specifies (which is to return `None` or panic). cc bytecodealliance#9783 * Reduce some code duplication * Fix pulley tests * Fix MSRV compat
* pulley: Implement float<->int conversions Gets the `conversions.wast` test running along with a few other misc ones. cc bytecodealliance#9783 * Fix pulley's no_std build * One more conversion to a workspace dep
* Update to WASI 0.2.3 WIT files No major changes here; this just updates WIT files from WASI 0.2.2 to 0.2.3. * Update the vendor-wit.sh script to 0.2.3.
* pulley: Implement fcopysign for issue4890.wast * Fix #[no_std] issue + feedback
* pulley: Move `fp`/`lr` out of `XReg` set This commit moves the `fp` and `lr` registers out of the `XReg` register set and into the `MachineState` per-VM. These are automatically modified and read with `push_frame` and `pop_frame`. Dedicated `xmov_{fp,lr}` instructions were added for use in Wasmtime's trampolines which directly read these registers. * Fix pulley tests * Free up `spilltmp1` register Also unused in CLIF
* pulley: Implement float arithmetic operations Fill out enough to get `f32.wast` and `f64.wast` spec tests working. A minor ABI issue was discovered along the way which is also required to get a new test working on both 32 and 64-bit platforms. cc bytecodealliance#9783 * Centralize handling of float operations Add a new `wasmtime-math` crate which is tasked with providing the float math functions needed by Wasm with with, in theory, most optimal platform implementation available. * Fix riscv64 tests
* pulley: Get `call_indirect.wast` spec test working Fix a few typos here and there related to floats to get some tests passing. cc bytecodealliance#9783 * Update CLIF test expectations
This hasn't actually turned up anything in quite some time and MIRI testing is relatively slow so by default don't test MIRI on Pulley PRs (it's of course still tested on the merge queue though)
) Fill out some more misc float ops. cc bytecodealliance#9783
Fill out a lowering for CLIF's `ineg` instruction. cc bytecodealliance#9783
Gets some `embenchen_*.wast` tests passing. cc bytecodealliance#9783
Gets another `*.wast` test passing cc bytecodealliance#9783
…9818) Forcibly disable the `threads` proposal for Pulley. This is done because I don't believe there's any way that we can, in Rust, implement non-atomic loads and stores in a manner that isn't undefined behavior. Until this is available leave this proposal as flagged as unsupported.
* pulley: Initial scaffold of SIMD support This commit fills out some of the initial infrastructure necessary for supporting the SIMD proposal to WebAssembly in the Pulley interpreter, namely 128-bit simd. The `VRegVal` union has been filled out with various types, endianness questions are settled, and initial implementations of a suite of opcodes are added to get a basic set of tests working throughout the backend. cc bytecodealliance#9783 * Avoid dealing with big-endian vectors * Change wasm `global`s to store `v128` in little-endian format. * Change pulley stack loads/stores to work with vectors in little-endian format.
This commit adds the pulley targets to many of the preexisting `*.clif` runtests throughout the tree. This covers most of the MVP functionality of wasm for example and additionally exercises 8 and 16-bit lowerings for many instructions. A few minor pulley instructions were added and otherwise new 8/16-bit lowerings use existing instructions. It's expected that the 8/16-bit lowerings won't be used all that often so they're not particularly optimal at this time. Some CLIF tests were omitted such as: * Most SIMD-using CLIF tests * Float/int conversion tests using 8 and 16-bit integers * Tests with `call` instructions as relocations don't work with the JIT crate on Pulley * Tests using 128-bit integers Support for some of these tests may be enabled in the future, but for example 8/16-bit integers may not get used much.
…liance#9829) * Add an inheritance test * Only allow GCing tags we (mostly) know are safe to GC Make the conservatively correct choice the default one.
…liance#9824) It seems that this was removed in bytecodealliance#9381 The new method is really nice for controlling this at runtime, however the docs still mentioned the old behavior.
This is not directly reachable from wasm but can be created through optimizations.
* pulley: Implement `return_call` instructions This commit fleshes out the Cranelift lowerings of tail calls which gets the wasm tail call proposal itself working on Pulley. Most of the bits and pieces here were copied over from the riscv64 backend and then edited to suit Pulley. * Fix table64 addressing on 32-bit
…liance#9821) * allow customizing log prefixes for wasmtime serve command Signed-off-by: Joseph Zhang <jzhang@absolute.com> * pr feedback - use simple boolean flag instead --------- Signed-off-by: Joseph Zhang <jzhang@absolute.com>
* pulley: Implement SIMD `splat` instruction Gets a few spec tests and CLIF tests passing cc bytecodealliance#9783 * Fix typo
Fill out vector load-and-extend instructions.
Fill out some bit-related operations for vector registers.
Fill out some bitmask/test instructions for vectors.
* emit 32bit udiv * winch: aarch64 udiv/urem without extension * remove stray dbg! * fmt * remove println * fix formatting in ISLE * Sized TrapIf * move operand size into CondBrKind variant * show_reg_sized fallback
* pulley: Get `simd_conversions.wast` test working Lots of narrowing/extending/conversion-related opcodes implemented. Note that these opcodes are all in the "extended" namespace as the 1-byte namespace has started to overflow. * Fill out some TODO
…9843) Gets a number of `*.wast` tests passing.
Gets some misc wast tests passing.
* Move memories first in `VMContext` This commit shuffles the fields of `VMContext` to move memories first above all other items. This is done to ensure that the offset to memory-related fields is one of the smaller offsets and ideally fits within an 8-byte offset which helps fit into some special cases on various ISAs. For example x64 has a smaller instruction encoding for 8-bit offsets than 32-bit offsets, and I'm planning on doing a similar optimization for Pulley. * Review comments
* pulley: Add basic i128 comparison to backend Don't add special opcodes yet as this probably isn't performance critical in an interpreted context, but it should always be possible to add such opcodes in the future. * Review comments
…e#9845) * Optimize constant offset loads/stores in translation This commit implements an optimization when lowering Wasm bytecode to CLIF to skip a bounds check when the offset in memory is statically known. This comes up in C/C++/Rust when `static` memory is addressed for example where LLVM emits an `i32.const 0` as the base address and puts the address of the variable in the `offset` instruction field. This isn't necessary for 64-bit platforms but when explicit bounds-checks are required this can help to eliminate a constant-true bounds-check. * Review comments
…9852) No new pulley opcodes here, just reusing preexisting opcodes.
Pushes more into the 32-bit offset on load/store instructions.
More wast tests passing.
…#9854) * pulley: Shuffle opcodes to free some 1-byte opcodes In the near future the set of opcodes here are going to be expanded along a number of axes such as: * More modes of addressing loads/stores other than just `reg + offset32`. * Opcodes with immediate operands rather than unconditional register operands. The 1-byte opcode namespace is already filling up and there's a bit of a mishmash of what's 1-byte and what's "extended" for now. To help bridge this gap in the interim shuffle all float/vector-related opcodes into the "extended" opcode namespace. This frees up a large chunk of the 1-byte opcode namespace for future expansion with extensions like above. I'll note that I'm not 100% sure that the opcodes will all stay here after this reshuffling. I haven't done any profiling/performance analysis to gauge the impact of this change. The immediate goal is to start experimenting with non-float/vector programs and get them profiling well. This will require more "macro opcodes" such as new addressing modes and opcodes-with-immediates. These are expected to be relatively hot opcodes and thus probably want to be in the "fast" 1-byte namespace, hence the shuffling here. My plan is to in the future do a bit of an evaluation with a float/vector program and see whether it make sense to shuffle some of them into this 1-bytecode space as well. More radically it might make sense to remove the split between ops/extended ops and instead just have a 2-byte opcode space entirely. That's all left for future evaluations though. * Fix test offsets * Update test expectations
This commit extends the pulley opcode space with integer addition/subtraction where `src2` is an immediate. The goal here is to be a "sort of macro instruction" despite it not being too too macro here. This cuts down on `xconstN` instructions which both saves space in the final binary and should be slightly more optimal perf-wise due to only one dispatch being needed. In this commit the `xadd32` instruction is previously 3 bytes: one for an opcode and 2 bytes for the dst/src1/src2 binary operands. Adding a small constant to a register previously took 5 bytes where 2 bytes were needed for `xconst8 N` then 3 for the addition. Here the encoding size of the new instruction is 4 bytes: 1 for the opcode, 2 for dst/src1, and one for the immediate. This is currently chosen to mostly optimize dispatch in the interpreter loop as opposed to code size (as only a single byte is saved). In the future thought it would be possible to extend `BinaryOperands` to one operand being a 6-bit immediate to preserve the same code size. This also notably adds, for addition/subtraction, only unsigned immediates. With addition/subtraction being inverses of one another supporting signed immediates isn't necessary and helps free up another bit for packing numbers into these opcodes. This change reduces the size of `spidermonkey.cwasm` from 31M to 29M locally.
* add Default and Debug impls to SparseMap * more meaningful debug impl
This commit extends the set of opcodes to load/stores from memory with integer registers. Previously the only addressing mode supported was a base register plus a 32-bit signed immediate. This immediate frequently doesn't need 32-bits though and can often fit in a much smaller range. Looking at `spidermonkey.cwasm` a large number of loads/stores can fit within an unsigned 8-bit integer instead so this commit adds an `offset8` mode in addition to the preexisting `offset32` mode. Empirically this commit shrinks `spidermonkey.cwasm` for pulley64 from 33M to 31M. This notably, at this time, does not extend general addressing modes in Pulley nor does it extend all loads/stores. For example float/vector/big-endian loads and stores all continue to only support a 32-bit signed offset from the base pointer. This is done under the assumption that integer loads/stores dominate both performance/code-size, but this is not empirically proven just yet. Additionally at this time the choice is being made to add an opcode-per-addressing-mode rather than having a single load opcode take a general addressing mode. The assumption here is that decoding a fully general addressing mode and processing it is probably slower at runtime than specializing opcodes per addressing mode. This is currently an unproven assumption however and the cost of this is increased complexity in the Cranelift backend as it has to have many branches for all loads/stores supported.
…ce#9862) * pulley: Remove unwind metadata from Cranelift backend This is a copy/paste from the riscv64 backend originally but there's no need to integrate with native unwinders on Pulley so there's no need to track this information. This removes the `Unwind` pseudo-inst entirely. * Fix CI and review comments
* pulley: Add immediate payloads to more opcodes This commit adds immediate payloads to the following instructions: * `xmul32` - `xmul32_s8` / `xmul32_s32` * `xmul64` - `xmul64_s8` / `xmul64_s32` * `xband32` - `xband32_s8` / `xband32_s32` * `xband64` - `xband64_s8` / `xband64_s32` * `xbor32` - `xbor32_s8` / `xbor32_s32` * `xbor64` - `xbor64_s8` / `xbor64_s32` * `xbxor32` - `xbxor32_s8` / `xbxor32_s32` * `xbxor64` - `xbxor64_s8` / `xbxor64_s32` * `xshl32` - `xshl32_u6` * `xshl64` - `xshl64_u6` * `xshr32_u` - `xshl32_u_u6` * `xshr64_u` - `xshl64_u_u6` * `xshr32_s` - `xshl32_s_u6` * `xshr64_s` - `xshl64_s_u6` For shifts there's no need to have 32-bit immediates (or even 8-bit) since 6 bits is enough to encode all the immediates. This means that the 6-bit immediate is packed within `BinaryOperands` as a new `U6` type. This commit unfortunately does not shrink `spidermonkey.cwasm` significantly beyond the prior 29M. This is nevertheless expected to be relatively important for performance. * Fix test expectations
) This commit adds a large number of new `br_if_x*` instructions which compare with an immediate instead of comparing two registers. This is pretty common in wasm/compiled code where, for example, loop upper bounds are often constants. This helps compress code slightly while fusing more instructions together. The main cost of this is that the number of opcodes added here is quite large. Like with previous immediate-taking opcodes both 8 and 32-bit variants of immediates are added for all comparisons. Additionally unlike the previous set of branch-and-compare instructions it's required to add instructions for `>` and `>=` because the operands cannot be swapped to invert the condition, further increasing the number of opcodes added. This is a mild size reduction on `spidermonkey.cwasm` from 29M to 28M but it's mostly expected to be a performance win for interpreted loops.
…dealliance#9864) * pulley: Add macro instructions for function prologue/epilogue This commit adds two new instructions to Pulley to combine the operations of setting up a frame, allocating stack, and saving clobbered registers. This is all combined into a single instruction which is relatively large but is much smaller than each of these individual operations exploded out. This is a size win on `spidermonkey.cwasm` by about 1M and locally in a small `fib.wat` test this is also a good speedup by reducing the number of instructions executed. * Review comments and update test expectations
Filling out some more miscellaneous `*.wast` tests
* Test DWARF with --release This would have caught the issue in CI. It is also closer to what the end user will use. * [prtest:debug] Use volatile for keep-alive * Drop __attribute__((retain)) * Run tests in debug as well * Remove debug testing; add comment
* rename put_non_zero_in_reg to put_nonzero_in_reg_maybe_zext * handle 32bit sdiv * i32 rem_u/s * fix tests * fix rem * fmt * improve load_constant_full * fix tests, and review edits
This commit fixes a few mistakes that were introduced in bytecodealliance#9863. Specifically when lowering `Cond.If...` and the arguments needed swapping the condition was also inverted by accident. More `*.clif` runtests were added to catch this case and expose it. Additionally Pulley now has lowering for all the `FloatCC` orderings to be able to run the `select.clif` runtest which primarily exposed the issue.
* pulley: Implement some float simd ops Gets a few more wast tests passing * Enable some cranelift runtests
Refactor some existing usage of `pulley_xconst*` instructions to instead use instructions-taking-immediates instead now that they've been added to Pulley.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.