Support WebAssembly memory instructions on big-endian platforms #2124
@cfallin @julian-seward1 - opened this issue as discussed in our call today.
See the PR I just posted. This is not necessarily ready to go in as-is; it is intended to showcase a possible implementation. The PR implements a variant of method (C) above, where I'm creating just two new Cranelift IR opcodes, LoadRev and StoreRev - load-reverse and store-reverse. (These map directly to instructions IBM Z has, and a number of other platforms have as well.) Note that there are no extending-load / truncating-store versions of those (I'm not aware of any ISA that has them directly, and in any case any new back-end could just match the combination if necessary). Likewise, there are no _complex versions; those are no longer really used with the new back-end anyway.
Thanks for exploring the design space and laying out our choices @uweigand. I would personally be in favor of alternative A. I think having clif's semantics defined irrespective of the current target is valuable and will make hacking on the compiler easier. (I think B is also a strong contender, but A will be easier from an engineering perspective, since we can leverage compiler errors to automatically tell us exactly which code needs updates, rather than manually hunting down places where mem flags are silently dropped and/or created with defaults.) Although I don't strongly hold this opinion, I'm leaning towards only discriminating between big and little endian, not including a native endian option.
We can expose the target's endianness relatively easily, since target-lexicon already models it: https://docs.rs/target-lexicon/0.11.1/target_lexicon/enum.Endianness.html
+1 to the combination of two choices:
In other words, we want existing users of the builder API to continue to see the same semantics, and new builder methods can take an explicit endianness.
I've made an initial attempt at implementing something along those lines here: https://github.com/uweigand/wasmtime/pull/1

Note that this is currently not ready to even attempt merging upstream, in particular because the addition of the extra endianness member to various clif instructions makes the InstructionData structure exceed its target size of 16 bytes. So either we find some way to encode InstructionData more efficiently (but that would need advice from somebody much more familiar with Rust than I am ...), we accept the (significantly increased) memory consumption, or else we go back and try something else. However, adding the explicit endianness member also makes the code more verbose; if you look at the patch above you'll see a large number of mostly "boilerplate" changes.

As an aside, I have not been able to implement this suggestion:
as is (or maybe I misunderstood it). The builders that are used by most of the code are the ones automatically generated by the "meta" machinery, and not only did I not find an easy way to provide a default value for any member, but the builders don't seem to even have the required information (they don't have access to the TargetIsa or triple or anything else to determine the default target endianness, unless I'm missing something). That said, changing the callers to provide an explicit endianness (usually retrieved from the TargetIsa) was mostly straightforward.

As a result of those issues with adding an explicit endianness member, I've been thinking of maybe going back to option B and encoding endianness within the MemFlags. This solves the InstructionData size problem, and also removes most of the boilerplate code changes. I was originally skeptical about option B due to the potential issue of introducing errors by accidentally dropping MemFlags during optimizations. However, I think this problem can be fixed: all builders of load/store instructions already make it mandatory to provide a MemFlags value. This can be either copied from another instruction, or else created afresh by starting from MemFlags::new() or MemFlags::trusted(). Now, if we simply added a mandatory endianness argument to all such MemFlags constructors, then we could easily enforce that every copy of MemFlags always contains target endianness information, and thus it cannot be silently dropped. I'll see if I can come up with another implementation along those lines.

Finally, I'd appreciate any further comments on the patch above. In particular, there are some user-visible changes to the textual representation of clif instructions - please let me know if this approach looks good:
This sounds very reasonable to me.
This sounds good to me.
Here's an updated patch set along those lines: https://github.com/uweigand/wasmtime/pull/2 I'm still not completely happy, in particular because of the following two issues:
Comments on these questions, as well as general comments on the approach, are welcome - thanks!
Hmm, yeah, I think we may want to reconsider this. I had refactored things away from attaching a

In particular, the endianness flag at the CLIF level should translate to a different instruction sequence at the VCode level on most machines (excepting only a hypothetical machine that has a "swap bytes" bit on every load instruction). So it would be suboptimal to be able to represent meaningless flags at the VCode level.

An alternative we had considered was to instead make the trap-ness part of a new "instruction metadata", along with the safepoint bit, that lives alongside the

So I might suggest the following: (i) create a

That's a large enough refactor that, for now, I think your PR's approach is fine (keep MemFlags on the machine instructions for now).
Yes, these should also carry MemFlags.
Thanks @cfallin, I've now updated the patch to carry a MemFlags in GlobalValueData::Load and left the MemFlags on machine instructions for now. The updated patch is now posted as PR #2488 above.
Fixed with #2492. |
WebAssembly memory operations are by definition little-endian even on big-endian target platforms. However, other memory accesses will require native target endianness (e.g. to access parts of the VMContext that are also accessed by VM native code). This means that on big-endian targets, the code generator will have to handle both little- and big-endian memory accesses. However, there is currently no way to encode that distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an (optional) explicit endianness marker to an instance of MemFlags. Since each Cranelift IR instruction that describes memory accesses already has an instance of MemFlags attached, this can now be used to provide endianness information.

Note that by default, memory accesses will continue to use the native target ISA endianness. To override this and specify an explicit endianness, a MemFlags value built using the set_endianness routine must be used. This patch does so for accesses that implement WebAssembly memory operations.

This patch addresses issue bytecodealliance#2124.
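The optional marker described above can be sketched as a stand-alone model. This is purely illustrative (the enum and struct below are hypothetical stand-ins, not the actual Cranelift MemFlags type): absence of a marker means "use the target's native byte order", and set_endianness overrides that.

```rust
// Illustrative model of the patch's design, not the real Cranelift type:
// the endianness on MemFlags is optional, and an absent marker means
// "use the native byte order of the target ISA".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Endianness {
    Little,
    Big,
}

#[derive(Clone, Copy, Default, Debug)]
struct MemFlags {
    endianness: Option<Endianness>,
}

impl MemFlags {
    fn new() -> Self {
        MemFlags::default()
    }

    // Override the default: this access uses the given byte order
    // regardless of the target.
    fn set_endianness(&mut self, endianness: Endianness) {
        self.endianness = Some(endianness);
    }

    // Resolve the effective byte order against the target's default.
    fn endianness(self, target_default: Endianness) -> Endianness {
        self.endianness.unwrap_or(target_default)
    }
}

fn main() {
    // Wasm heap access: explicitly little-endian everywhere.
    let mut wasm = MemFlags::new();
    wasm.set_endianness(Endianness::Little);
    // VMContext access: no marker, so it follows the target.
    let vmctx = MemFlags::new();

    // Even on a big-endian target (IBM Z), the wasm access stays LE.
    assert_eq!(wasm.endianness(Endianness::Big), Endianness::Little);
    assert_eq!(vmctx.endianness(Endianness::Big), Endianness::Big);
    println!("ok");
}
```

The key property is that copying a MemFlags from one instruction to another copies the byte-order decision along with it, so optimizations cannot silently lose it.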
Feature
Currently, wasmtime maps the WebAssembly memory instructions (t.load, t.store, etc.) directly to Cranelift IR memory instructions (load, store, uloadN, etc.).
This causes problems on big-endian platforms, because the Cranelift IR instructions are implemented as native load and store instructions using the machine byte order, while the WebAssembly memory instructions are specified to always use little-endian byte order.
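The mismatch can be illustrated in plain Rust using the standard byte-conversion helpers (this models the semantics only; it is not wasmtime code):

```rust
// WebAssembly `i32.load` must interpret the 4 bytes at the address as
// little-endian, regardless of the host's byte order.
fn wasm_i32_load(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

// A plain native load interprets them in the machine's byte order,
// which only happens to match on little-endian hosts.
fn native_i32_load(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

fn main() {
    let bytes = [0x01, 0x02, 0x03, 0x04];
    // Little-endian: least significant byte comes first in memory.
    assert_eq!(wasm_i32_load(bytes), 0x0403_0201);
    if cfg!(target_endian = "little") {
        // On x86-64, aarch64, etc. the two agree ...
        assert_eq!(native_i32_load(bytes), wasm_i32_load(bytes));
    } else {
        // ... but on IBM Z the native load yields a different value.
        assert_eq!(native_i32_load(bytes), 0x0102_0304);
    }
    println!("ok");
}
```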
Now, I initially thought that one way to solve this problem could be to treat Cranelift IR memory instructions also as always little-endian by specification. However, that does not work, because there are many other uses of these instructions that do require native byte order.
Some examples of those include:
Memory accesses added by platform ABI code (implicit pointers for argument or return values), in particular if this needs to be compatible with native code.
Memory accesses to values prepared by trampoline code at the boundaries of VM native code and JITted code.
Memory accesses to parts of the VMContext that are also accessed by VM native code.
In addition, there are cases where (while not strictly necessary for correctness) it is preferable for performance reasons to use native byte order, e.g. for spill code, for accessing variables on the stack, or when implementing code such as inlined copies for small memcpy operations.
So, I believe we need some way of representing both always-little-endian memory operations (used to translate the WebAssembly instructions) and native-order memory operations (used for everything else).
Benefit
Enabling support for Wasmtime on big-endian platforms like IBM Z.
Implementation
My current implementation of this approach simply duplicates all Cranelift IR memory instructions to create always-LE versions. So in addition to "load" there is "load_le" etc. The full list is:
load_le
load_le_complex
store_le
store_le_complex
uload16_le
uload16_le_complex
sload16_le
sload16_le_complex
istore16_le
istore16_le_complex
uload32_le
uload32_le_complex
sload32_le
sload32_le_complex
istore32_le
istore32_le_complex
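The lowering difference between the two opcode families can be sketched in plain Rust (a model of the backend's choice, not actual Cranelift lowering code): `load` stays a plain native load everywhere, while `load_le` needs an extra byte reversal on big-endian targets, or a load-reverse instruction where the ISA has one.

```rust
// Model of how a backend would treat the duplicated opcodes: `load`
// uses the machine byte order as-is, while `load_le` must always
// produce the little-endian interpretation of the bytes.
fn lower_i64_load(bytes: [u8; 8], opcode_is_le: bool) -> u64 {
    let native = u64::from_ne_bytes(bytes);
    if opcode_is_le && cfg!(target_endian = "big") {
        // On big-endian machines, `load_le` needs a byte swap after
        // (or fused into) the load.
        native.swap_bytes()
    } else {
        // Plain `load`, or a little-endian host where both coincide.
        native
    }
}

fn main() {
    let bytes = [1u8, 2, 3, 4, 5, 6, 7, 8];
    // Regardless of host byte order, the `load_le` path yields the
    // little-endian value ...
    assert_eq!(lower_i64_load(bytes, true), u64::from_le_bytes(bytes));
    // ... while the `load` path yields whatever the host order dictates.
    assert_eq!(lower_i64_load(bytes, false), u64::from_ne_bytes(bytes));
    println!("ok");
}
```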
Advantages of this approach include:
But there are disadvantages:
Alternatives
There are various alternative ways this could be implemented:
A) Add an additional flag argument to load/store instructions that specifies the requested byte order. A detail question is whether the flag is:
little-endian vs. native
little-endian vs. big-endian
little-endian vs. big-endian vs. native
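The three-way variant of the flag can be sketched as follows (names are illustrative, not Cranelift's): a `Native` value would have to be resolved against the target triple at some point before lowering, whereas the two-way variants resolve it at instruction-building time.

```rust
// Hypothetical flag from option A, including the debated `Native`
// variant, which is resolved against the target before lowering.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ByteOrder {
    Little,
    Big,
    Native,
}

fn resolve(order: ByteOrder, target_is_big_endian: bool) -> ByteOrder {
    match order {
        ByteOrder::Native if target_is_big_endian => ByteOrder::Big,
        ByteOrder::Native => ByteOrder::Little,
        // Explicit byte orders pass through unchanged.
        other => other,
    }
}

fn main() {
    // On IBM Z (big-endian), `Native` resolves to `Big` ...
    assert_eq!(resolve(ByteOrder::Native, true), ByteOrder::Big);
    // ... while explicit wasm accesses stay `Little` on every target.
    assert_eq!(resolve(ByteOrder::Little, true), ByteOrder::Little);
    println!("ok");
}
```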
Advantages:
Disadvantages:
B) Add an additional flag bit to the existing MemFlags
Advantages:
Disadvantages:
C) Open-code the conversion in the WebAssembly translator
Only emit a plain "load" if the target is little-endian. Otherwise emit a load followed by a byte-swap (possibly followed by an extension). Vice-versa for stores. This would probably require addition of a new "bswap" Cranelift IR instruction, unless we want to open-code bswap itself as well (possible, but a bit tedious).
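Open-coding the swap itself would mean emitting roughly the following shift-and-mask sequence for every 32-bit access, which is the tedium mentioned above (a sketch in plain Rust; a dedicated bswap IR instruction would replace this with a single operation):

```rust
// The shift/mask sequence a translator would have to emit to byte-swap
// a 32-bit value without a dedicated `bswap` instruction.
fn bswap32(x: u32) -> u32 {
    ((x & 0x0000_00ff) << 24)
        | ((x & 0x0000_ff00) << 8)
        | ((x & 0x00ff_0000) >> 8)
        | ((x & 0xff00_0000) >> 24)
}

fn main() {
    assert_eq!(bswap32(0x0102_0304), 0x0403_0201);
    // Agrees with the single-instruction byte swap that a dedicated
    // IR instruction would lower to.
    assert_eq!(bswap32(0xdead_beef), 0xdead_beef_u32.swap_bytes());
    println!("ok");
}
```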
Advantages:
Disadvantages: