-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cranelift: Add Bswap instruction (#1092) #5147
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- this is really nice to have!
I have a few style comments below but nothing crucial.
At the top level, a little more testing might be nice too. Can we have at least a runtest (cranelift/filetests/filetests/runtests/bswap.clif
maybe) with some interesting value cases, and the appropriate targets (test run
with x86-64, aarch64, and test interpret
)?
// expect: | ||
// 1101_1010_110 | 0_0000_0000_01 | 01_011 | 0_0010 | ||
// which is little endian: | ||
// 0110_0010_0000_0101_1100_0000_1101_1010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The little-endian bit literal here is kind of confusing (it's little-endian but MSB first); if we need an explanatory comment can we convert the above to a u32 (0xdac00562
) or maybe just remove this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments were just to help me out when writing the test cases; I've removed them now
// expect: | ||
// 1101_1010_110 | 0_0000_0000_11 | 01_010 | 0_0001 | ||
// which is little endian: | ||
// 0100_0001_0000_1101_1100_0000_1101_1010 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here as above
|
||
// BSWAP reg32 is (REX.W==0) 0F C8 | ||
// BSWAP reg64 is (REX.W==1) 0F C8 | ||
let mut rex = match size { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use RexFlags
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good idea, changed to use RexFlags
in update, and gave it a new emit_one_op
fn to go along with the existing emit_two_op
and emit_three_op
cranelift/interpreter/src/value.rs
Outdated
@@ -86,6 +86,9 @@ pub trait Value: Clone + From<DataValue> { | |||
fn leading_zeros(self) -> ValueResult<Self>; | |||
fn trailing_zeros(self) -> ValueResult<Self>; | |||
fn reverse_bits(self) -> ValueResult<Self>; | |||
|
|||
// Byteswap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the blank line and comment here, I think...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! This is impressively thorough.
I notice there aren't any lowerings for 128-bit bswap. They'd be relatively easy to write (bswap each 64-bit half, and also swap the two registers), but it's also okay to not support that yet.
It would be best to also add tests in cranelift/filetests/filetests/isa/
and cranelift/filetests/filetests/runtests/
that cover this new instruction. But again, I think we could merge this without those.
"#, | ||
&formats.unary, | ||
) | ||
.operands_in(vec![x]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should exclude 8-bit integers for this instruction. There's no sensible implementation of byte-swapping when there's only one byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, excluded 8-bit and also 128-bit since the latter is unimplemented for now
We do have bswap instructions, and they're already implemented in the backend: for See e.g. the use as part of the
|
This is awesome! Thanks! 🎉 I added this to our fuzzer (Feel free to add that to this PR if you want!) and It's been running for over an hour on aarch64 without any issues, however on x86 it reported an interesting failure. I've tried to minimize this as much as I could but it sometimes throws weird results.
You can run this testcase from the cranelift directory with Removing the v7 or v8 makes the test pass, which is interesting... but otherwise the test segfaults on my machine, most of the time but not always. Here is the disassembly, which you can get with
Here's a more reliable example that *always* fails with a wrong result, but never segfaults.
Here's the disassembly of the above function
|
Thanks for the feedback everyone! I'll work on addressing everything, including adding runtests and investigating the fuzzer failure |
The x86_64 issue the fuzzer found was an incorrect REX encoding - I misunderstood the manual and was using REX.R instead of REX.B when encoding bswap access to r8-r15. Fixed locally and added those cases to the runtests, they pass now. I'll have an update out sometime in the next couple days once I address the other feedback and add s390x support |
66833f7
to
7f2a823
Compare
PR updated:
|
let iSwappable = &TypeVar::new( | ||
"iSwappable", | ||
"A multi byte scalar integer type", | ||
TypeSetBuilder::new().ints(16..64).build(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing the 8-bit case! We've generally kept support for types which make sense even if there's no backend support, though, so I would put the 128-bit case back in. If nothing else, that lets us have a runtests/i128-bswap.clif
that validates the interpreter (test interpret
) even though we can't yet do test run
on any target.
There are quite a few instructions that don't have 128-bit support on some backends yet, but it's good to have things in place to show what those instructions are expected to do for whenever somebody gets around to implementing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good! Added back 128-bit and created an interpreter test. I didn't actually implement any 128-bit swaps in the backends, left that for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the 128 version added!
I ran this through the fuzzer this morning and no further issues popped up! 🥳
Edit: To clarify, I don't think we need the i128 version now, we can add it in a later PR but we should legalize the instruction.
s390x parts LGTM, thanks for adding this. (Seeing the discussion above, if consensus is to add I128 support, you can likewise copy the 128-bit byteswap implementation via vector permute from bitrev.) |
Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now: ``` builder.ins().bswap(value) ``` to get a native byteswap instruction. * x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instrutions (Neg and Not) - it only operates on a dst register, but is parameterized with both a src and dst which are expected to be the same register. As x64 bswap instruction is only for 32- or 64-bit registers, the 16-bit swap is implemented as a rotate left by 8. Updated x64 RexFlags type to support emitting for single-operand instructions like bswap * aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate. * s390x: Bswap was already supported in backend, just had to add a bit of plumbing * For completeness, added bswap to the interpreter as well. * added filetests and runtests for each ISA * added bswap to fuzzgen, thanks to afonso360 for the code there * 128-bit swaps are not yet implemented, that can be done later
PR Updated - legalized 128-bit bswap, added runtest for interpreter. Didn't actually implement 128-bit swap in any of the backends yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! Thanks so much for this.
Looks like the CI run failed at
It looks like an issue with the tooling rather than an issue with my commit - what should I do here, does it just need a retry? |
Apparently it did just need a retry. Merged now! 🎉 |
Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now:
to get a native byteswap instruction.
As x64 bswap instruction is only for 32- or 64-bit registers, the 16-bit swap is implemented as a rotate left by 8.
aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate.
s390x: Bswap not implemented
For completeness, added bswap to the interpreter as well.