cranelift: Add Bswap instruction (#1092) #5147

11evan · 2022-10-27T18:52:30Z

Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now:

builder.ins().bswap(value)

to get a native byteswap instruction.
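For reference, the operation being exposed is a plain reversal of the integer's bytes. The sketch below is a reference model written against the Rust standard library only. It is not Cranelift's API, and the `bswap_ref_*` names are hypothetical. It shows what `bswap` is expected to compute for each multi-byte width, using the standard `swap_bytes` methods:

```rust
// Hypothetical reference model of `bswap` semantics (std only, not Cranelift code).
// Each width reverses the order of its bytes; an 8-bit value has nothing to swap.
fn bswap_ref_u16(x: u16) -> u16 {
    x.swap_bytes()
}

fn bswap_ref_u32(x: u32) -> u32 {
    x.swap_bytes()
}

fn bswap_ref_u64(x: u64) -> u64 {
    x.swap_bytes()
}
```

Byte-swapping twice is the identity, which makes round-trip values a cheap sanity check in runtests.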

x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instructions (Neg and Not): it operates only on a dst register, but is parameterized with both a src and a dst, which are expected to be the same register.

As the x64 bswap instruction only supports 32- and 64-bit registers, the 16-bit swap is implemented as a rotate left by 8.
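Why a rotate works: a 16-bit value has exactly two bytes, so rotating by 8 bits exchanges them. A small plain-Rust check makes this concrete. It models only the arithmetic, not the emitted x64 instruction, and `bswap16_via_rotate` is a hypothetical name. It compares the rotate against the standard byte swap for every `u16`:

```rust
// Model of the 16-bit lowering: swapping the two bytes of a u16 is the
// same as rotating it left (or right) by 8 bits.
fn bswap16_via_rotate(x: u16) -> u16 {
    x.rotate_left(8)
}

// Exhaustively compare the rotate against the standard byte swap.
fn rotate_matches_swap_for_all_u16() -> bool {
    (0..=u16::MAX).all(|x| bswap16_via_rotate(x) == x.swap_bytes())
}
```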

aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate.
s390x: Bswap not implemented
For completeness, added bswap to the interpreter as well.
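The aarch64 mapping works because each REV variant reverses bytes within a fixed-size lane, so the low 16 (or 32) bits of the result equal the narrow byte swap. The sketch below is a plain-Rust model of the three instructions applied to a 64-bit register. It illustrates the architectural behavior and is not Cranelift code:

```rust
// REV16: reverse the bytes within each 16-bit halfword.
fn rev16(x: u64) -> u64 {
    ((x & 0x00ff_00ff_00ff_00ff) << 8) | ((x >> 8) & 0x00ff_00ff_00ff_00ff)
}

// REV32 (64-bit form): reverse the bytes within each 32-bit word.
fn rev32(x: u64) -> u64 {
    let lo = (x as u32).swap_bytes() as u64;
    let hi = ((x >> 32) as u32).swap_bytes() as u64;
    (hi << 32) | lo
}

// REV (REV64): reverse all eight bytes.
fn rev64(x: u64) -> u64 {
    x.swap_bytes()
}
```

For an i16 in the low halfword, the upper bits of the register do not affect the low 16 bits of the REV16 result.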

cfallin

Thanks -- this is really nice to have!

I have a few style comments below but nothing crucial.

At the top level, a little more testing might be nice too. Can we have at least a runtest (cranelift/filetests/filetests/runtests/bswap.clif maybe) with some interesting value cases, and the appropriate targets (test run with x86-64, aarch64, and test interpret)?

cfallin · 2022-10-27T19:09:27Z

cranelift/codegen/src/isa/aarch64/inst/emit_tests.rs

+        // expect:
+        // 1101_1010_110 | 0_0000_0000_01 | 01_011 | 0_0010
+        // which is little endian:
+        // 0110_0010_0000_0101_1100_0000_1101_1010


The little-endian bit literal here is kind of confusing (it's little-endian but MSB first); if we need an explanatory comment can we convert the above to a u32 (0xdac00562) or maybe just remove this comment?

These comments were just to help me out when writing the test cases; I've removed them now

cfallin · 2022-10-27T19:10:56Z

cranelift/codegen/src/isa/aarch64/inst/emit_tests.rs

+        // expect:
+        // 1101_1010_110 | 0_0000_0000_11 | 01_010 | 0_0001
+        // which is little endian:
+        // 0100_0001_0000_1101_1100_0000_1101_1010


Likewise here as above
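The suggestion to express these as a u32 can be checked with a small helper. The sketch below is hypothetical (it is not part of the emit tests) and assumes the 64-bit "data-processing (1 source)" layout `sf 1 0 11010110 00000 0000 opc Rn Rd`, where opc is 01 for REV16, 10 for REV32, and 11 for REV (64-bit). Instructions are stored little-endian, so the emitted bytes are the word's `to_le_bytes()`:

```rust
// Hypothetical encoder for the 64-bit (sf = 1) AArch64 REV family.
// Fixed bits 31..16 are 0xdac0; opc sits at bits 11..10, Rn at 9..5, Rd at 4..0.
fn encode_rev_x(opc: u32, rn: u32, rd: u32) -> u32 {
    assert!((1..=3).contains(&opc) && rn < 32 && rd < 32);
    0xdac0_0000 | (opc << 10) | (rn << 5) | rd
}
```

With this, the two test cases above become `encode_rev_x(0b01, 11, 2)` (REV16, 0xdac00562) and `encode_rev_x(0b11, 10, 1)` (REV, 0xdac00d41).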

cfallin · 2022-10-27T19:12:37Z

cranelift/codegen/src/isa/x64/inst/emit.rs

+
+            // BSWAP reg32 is (REX.W==0) 0F C8
+            // BSWAP reg64 is (REX.W==1) 0F C8
+            let mut rex = match size {


Can we use RexFlags here?

Yeah, good idea. I changed it to use RexFlags in the update, and gave RexFlags a new emit_one_op fn to go along with the existing emit_two_op and emit_three_op.
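For context on what a single-operand REX helper has to produce, the standalone encoder below shows the BSWAP byte layout. It is a sketch and does not use Cranelift's `RexFlags`/`emit_one_op` API; `encode_bswap` is a hypothetical name. BSWAP has no ModRM byte. The register's low three bits are added to the `C8` opcode byte, bit 3 (r8 to r15) goes into REX.B, and REX.W selects the 64-bit form:

```rust
// Hypothetical standalone encoder for x64 BSWAP (0F C8+rd).
// `reg` is the hardware register number: 0 = rax/eax ... 15 = r15/r15d.
fn encode_bswap(reg: u8, is_64: bool) -> Vec<u8> {
    assert!(reg < 16, "x64 has 16 general-purpose registers");
    let mut bytes = Vec::new();
    let rex_w: u8 = if is_64 { 0x08 } else { 0 };
    // No ModRM byte, so the register extension bit belongs in REX.B (bit 0).
    let rex_b: u8 = (reg >> 3) & 1;
    // A REX prefix (0100WRXB) is only needed for the 64-bit form or r8-r15.
    if rex_w != 0 || rex_b != 0 {
        bytes.push(0x40 | rex_w | rex_b);
    }
    bytes.push(0x0f);
    bytes.push(0xc8 + (reg & 7));
    bytes
}
```

For example, `bswap r11` encodes as `49 0F CB` and `bswap eax` as `0F C8`.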

cfallin · 2022-10-27T19:13:42Z

cranelift/interpreter/src/value.rs

@@ -86,6 +86,9 @@ pub trait Value: Clone + From<DataValue> {
    fn leading_zeros(self) -> ValueResult<Self>;
    fn trailing_zeros(self) -> ValueResult<Self>;
    fn reverse_bits(self) -> ValueResult<Self>;
+
+    // Byteswap


No need for the blank line and comment here, I think...

jameysharp

Nice work! This is impressively thorough.

I notice there aren't any lowerings for 128-bit bswap. They'd be relatively easy to write (bswap each 64-bit half, and also swap the two registers), but it's also okay to not support that yet.
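The 128-bit decomposition described here can be sanity-checked in plain Rust before writing a lowering. The sketch below is hypothetical and uses std only: byte-swap each 64-bit half, then exchange the halves:

```rust
// Sketch of a 128-bit byte swap built from two 64-bit swaps.
fn bswap128_via_halves(x: u128) -> u128 {
    let lo = x as u64;
    let hi = (x >> 64) as u64;
    // The swapped low half becomes the new high half, and vice versa.
    ((lo.swap_bytes() as u128) << 64) | (hi.swap_bytes() as u128)
}
```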

It would be best to also add tests in cranelift/filetests/filetests/isa/ and cranelift/filetests/filetests/runtests/ that cover this new instruction. But again, I think we could merge this without those.

jameysharp · 2022-10-27T19:22:29Z

cranelift/codegen/meta/src/shared/instructions.rs

+        "#,
+            &formats.unary,
+        )
+        .operands_in(vec![x])


I'm wondering if we should exclude 8-bit integers for this instruction. There's no sensible implementation of byte-swapping when there's only one byte.

Good idea, excluded 8-bit and also 128-bit since the latter is unimplemented for now

uweigand · 2022-10-27T19:52:39Z

s390x: Bswap not implemented

We do have bswap instructions, and they're already implemented in the backend: for $I32 and $I64, bswap_reg can be used; for $I16, a 32-bit bswap followed by a 16-bit right shift is generally the most efficient solution.

See e.g. the use as part of the bitrev implementation:

(rule (bitrev_bytes $I16 x) (lshr_imm $I32 (bswap_reg $I32 x) 16))
(rule (bitrev_bytes $I32 x) (bswap_reg $I32 x))
(rule (bitrev_bytes $I64 x) (bswap_reg $I64 x))
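The I16 strategy works even when the upper 16 bits of the 32-bit register hold unrelated data. The 32-bit swap moves the low halfword, byte-reversed, into the top half, and the logical right shift by 16 discards whatever landed in the bottom half. A plain-Rust model (hypothetical name, not backend code):

```rust
// Model of the s390x I16 lowering: 32-bit byte swap, then logical shift right by 16.
// `reg` is the full 32-bit register; only its low 16 bits are the i16 operand.
fn bswap16_via_bswap32(reg: u32) -> u32 {
    reg.swap_bytes() >> 16
}
```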

afonso360 · 2022-10-27T20:36:12Z

This is awesome! Thanks! 🎉

I added this to our fuzzer (feel free to add that to this PR if you want!) and it's been running for over an hour on aarch64 without any issues; however, on x86 it reported an interesting failure.

I've tried to minimize this as much as I could but it sometimes throws weird results.

test interpret
test run
target x86_64

function %a() -> i8, i32, i64 {
block0:
    v5 = iconst.i64 0x9903_5204_d05f_abab
    v6 = bswap v5
    v7 = iconst.i8 0
    v8 = iconst.i32 0
    return v7, v8, v6
}

; run: %a() == [0, 0, -6076657925176032359]
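The expected value in the `; run:` line is the byte-swapped constant printed as a signed i64. A quick plain-Rust cross-check (hypothetical helper name):

```rust
// Recompute the expected i64 result of the test case above.
fn expected_bswap_result() -> i64 {
    let v5: u64 = 0x9903_5204_d05f_abab;
    // Bytes 99 03 52 04 d0 5f ab ab become ab ab 5f d0 04 52 03 99.
    v5.swap_bytes() as i64
}
```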

You can run this testcase from the cranelift directory with cargo run -- test ./the-above.clif

Removing the v7 or v8 makes the test pass, which is interesting... but otherwise the test segfaults on my machine, most of the time but not always.

Here is the disassembly, which you can get with cargo run -- compile -D --target x86_64 ./the-above.clif:

Disassembly of 29 bytes:
   0:   55                      push    rbp
   1:   48 89 e5                mov     rbp, rsp
   4:   49 bb ab ab 5f d0 04 52 03 99
                                movabs  r11, 0x99035204d05fabab
   e:   4c 0f cb                bswap   rbx
  11:   31 c0                   xor     eax, eax
  13:   31 d2                   xor     edx, edx
  15:   4c 89 1f                mov     qword ptr [rdi], r11
  18:   48 89 ec                mov     rsp, rbp
  1b:   5d                      pop     rbp
  1c:   c3                      ret

Here's a more reliable example that *always* fails with a wrong result, but never segfaults.

test interpret
test run
target x86_64

function %a(f32, f64, i32, i32, f64) -> i8, i32, i64 {
block0(v0: f32, v1: f64, v2: i32, v3: i32, v4: f64):
    v5 = iconst.i64 0x9903_5204_d05f_abab
    v6 = bswap v5
    v7 = iconst.i8 0
    v8 = iconst.i32 0
    return v7, v8, v6
}

; run: %a(0.0, 0.0, 0, 0, 0.0) == [0, 0, -6076657925176032359]

Here's the disassembly of the above function:

Disassembly of 32 bytes:
   0:   55                      push    rbp
   1:   48 89 e5                mov     rbp, rsp
   4:   49 89 d1                mov     r9, rdx
   7:   49 b8 ab ab 5f d0 04 52 03 99
                                movabs  r8, 0x99035204d05fabab
  11:   4c 0f c8                bswap   rax
  14:   31 c0                   xor     eax, eax
  16:   31 d2                   xor     edx, edx
  18:   4d 89 01                mov     qword ptr [r9], r8
  1b:   48 89 ec                mov     rsp, rbp
  1e:   5d                      pop     rbp
  1f:   c3                      ret

11evan · 2022-10-27T20:50:00Z

Thanks for the feedback everyone! I'll work on addressing everything, including adding runtests and investigating the fuzzer failure

11evan · 2022-10-28T02:20:48Z

The x86_64 issue the fuzzer found was an incorrect REX encoding - I misunderstood the manual and was using REX.R instead of REX.B when encoding bswap access to r8-r15. Fixed locally and added those cases to the runtests, they pass now.
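The failing disassembly above is consistent with this explanation. `4c` is `0100_1100`, which sets REX.W and REX.R. Because BSWAP has no ModRM byte, REX.R is ignored, so `4c 0f cb` swaps rbx instead of the intended r11, and `4c 0f c8` swaps rax instead of r8. The correct prefix `49` sets REX.W and REX.B. The hypothetical decoder below, a sketch and not part of Cranelift, makes that rule explicit:

```rust
// Returns the register number (0-15) that a BSWAP byte sequence actually targets.
// Only REX.B (bit 0) extends the register; REX.R (bit 2) has no effect here.
fn bswap_target_reg(bytes: &[u8]) -> Option<u8> {
    // An optional REX prefix is any byte of the form 0100WRXB.
    let (rex, rest) = match bytes.first() {
        Some(&b) if b & 0xf0 == 0x40 => (b, &bytes[1..]),
        _ => (0u8, bytes),
    };
    match rest {
        [0x0f, op] if (0xc8..=0xcf).contains(op) => {
            let rex_b = rex & 1;
            Some((rex_b << 3) | (op - 0xc8))
        }
        _ => None,
    }
}
```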

I'll have an update out sometime in the next couple days once I address the other feedback and add s390x support

11evan · 2022-10-28T23:36:15Z

PR updated:

fixed REX encoding bug found by fuzzer
added runtests
added s390x support
other cleanups from feedback (remove extraneous comments, exclude 8-bit integers, use x64 RexFlags type, add bswap to fuzzgen)

jameysharp · 2022-10-28T23:48:02Z

cranelift/codegen/meta/src/shared/instructions.rs

+    let iSwappable = &TypeVar::new(
+        "iSwappable",
+        "A multi byte scalar integer type",
+        TypeSetBuilder::new().ints(16..64).build(),


Thanks for removing the 8-bit case! We've generally kept support for types which make sense even if there's no backend support, though, so I would put the 128-bit case back in. If nothing else, that lets us have a runtests/i128-bswap.clif that validates the interpreter (test interpret) even though we can't yet do test run on any target.

There are quite a few instructions that don't have 128-bit support on some backends yet, but it's good to have things in place to show what those instructions are expected to do for whenever somebody gets around to implementing them.

Ok sounds good! Added back 128-bit and created an interpreter test. I didn't actually implement any 128-bit swaps in the backends, left that for the future.

afonso360

LGTM with the 128 version added!

I ran this through the fuzzer this morning and no further issues popped up! 🥳

Edit: to clarify, I don't think we need the i128 version now. We can add it in a later PR, but we should legalize the instruction.

uweigand · 2022-10-29T14:30:14Z

s390x parts LGTM, thanks for adding this. (Seeing the discussion above, if consensus is to add I128 support, you can likewise copy the 128-bit byteswap implementation via vector permute from bitrev.)

Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now:

```
builder.ins().bswap(value)
```

to get a native byteswap instruction.

* x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instructions (Neg and Not): it operates only on a dst register, but is parameterized with both a src and a dst, which are expected to be the same register. As the x64 bswap instruction only supports 32- and 64-bit registers, the 16-bit swap is implemented as a rotate left by 8. Updated the x64 RexFlags type to support emitting single-operand instructions like bswap.
* aarch64: Bswap gets emitted as an aarch64 rev16, rev32, or rev64 instruction as appropriate.
* s390x: Bswap was already supported in the backend; it just needed a bit of plumbing.
* For completeness, added bswap to the interpreter as well.
* Added filetests and runtests for each ISA.
* Added bswap to fuzzgen, thanks to afonso360 for the code there.
* 128-bit swaps are not yet implemented; that can be done later.

11evan · 2022-10-31T16:37:06Z

PR Updated - legalized 128-bit bswap, added runtest for interpreter. Didn't actually implement 128-bit swap in any of the backends yet.

jameysharp

Perfect! Thanks so much for this.

11evan · 2022-10-31T18:24:13Z

Looks like the CI run failed at verify-publish with:

Run cd /opt/hostedtoolcache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
[...]
100   212  100   212    0     0     40      0  0:00:05  0:00:05 --:--:--    54

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error: Process completed with exit code 2.

It looks like an issue with the tooling rather than an issue with my commit - what should I do here, does it just need a retry?

jameysharp · 2022-10-31T19:30:49Z

Apparently it did just need a retry. Merged now! 🎉

11evan mentioned this pull request Oct 27, 2022

Add bswap instruction #1092

Closed

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. labels Oct 27, 2022

cfallin reviewed Oct 27, 2022

jameysharp reviewed Oct 27, 2022

11evan force-pushed the bswap branch 2 times, most recently from 66833f7 to 7f2a823 Compare October 28, 2022 23:32

jameysharp reviewed Oct 28, 2022

afonso360 approved these changes Oct 29, 2022

11evan force-pushed the bswap branch from 7f2a823 to a284a24 Compare October 31, 2022 16:20

jameysharp approved these changes Oct 31, 2022

jameysharp enabled auto-merge (squash) October 31, 2022 17:28

jameysharp merged commit 4ca9e82 into bytecodealliance:main Oct 31, 2022

11evan deleted the bswap branch November 4, 2022 00:38