-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64: Don't emit mov
for zero case in jump tables for shift intrinsics
#107322
JIT ARM64: Don't emit mov
for zero case in jump tables for shift intrinsics
#107322
Conversation
mov
for zero case in jump tables for shift intrinsicsmov
for zero case in jump tables for left-shift intrinsics
I initially neglected that we can do something like cc @dotnet/jit-contrib, @tannergooding PTAL. Thanks! |
ping @tannergooding @dotnet/jit-contrib |
if ((shiftAmount == 0) && (intrin.category == HW_Category_ShiftRightByImmediate)) | ||
{ | ||
// TODO: Use emitIns_Mov instead. | ||
// We do not use it currently because it will still elide the 'mov' | ||
// even if 'canSkip' is false. We cannot elide the 'mov' here. | ||
GetEmitter()->emitIns_R_R_R(INS_mov, emitTypeSize(node), targetReg, reg, reg); | ||
// For operations like `Vector128<byte> >> 8`, over-shifting behavior will mask | ||
// the immediate so that it's zero, and the shift will be imported as a GT_HWINTRINSIC. | ||
// If this isn't const-folded away, it is possible to try to emit a right-shift by zero, | ||
// but only if the immediate is known (we won't emit a zero case for the jump table). | ||
// Since we cannot encode a right-shift by 0 on ARM64, emit a mov instead. | ||
assert(op->isContainedIntOrIImmed()); | ||
GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, reg, /* canSkip */ true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't we emit a case for the jump table?
If the value is unknown and it doesn't throw, then won't we hit the case in the jump table and need handling emitted for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct that for unknown valid immediates, we will emit the jump table. In the table generation logic, we get the table's bounds from HWIntrinsicInfo::lookupImmBounds
, which sets the lower bound at 1 for right-shift intrinsics here. So if shiftAmount == 0
for a right-shift intrinsic, we should not be generating a jump table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what are we doing if the value is unknown and 0
? Because we're not throwing for the constant case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I tried the following program:
public class Test
{
public byte m;
}
public class Program
{
static Test t = new Test();
public static void Main()
{
var v = Vector128<Byte>.Zero;
var result = v >> t.m;
Console.WriteLine(result);
}
}
The JIT imports the shift as ShiftLogical
, hence why we don't throw. I can update the comment to cover this case, if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I wonder if this is the "right" fix.
We basically have:
- Import the intrinsic
- If the shift amount is non-constant, import as
ShiftLogical
- If the shift amount is constant, then mask, which could become zero
So I think we're maybe missing a check that ensures that if its zero, its still treated as "out of range" and gets imported as ShiftLogical
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your proposed fix is better. I initially wanted the fallback implementations we've been adding to kick in for cases like this, but the logic for checking immediates and adding range checks/etc. in Compiler::impHWIntrinsic
runs before we check if the intrinsic is table-driven or needs special handling. I guess in GenTreeHWIntrinsic::GetHWIntrinsicIdForBinOp
, instead of just checking if the immediate is const, we need to also check if it's out-of-range and use ShiftLogical
accordingly -- I'll give that a try.
mov
for zero case in jump tables for left-shift intrinsicsmov
for zero case in jump tables for shift intrinsics
Diffs are from the
I'm guessing these were previously getting folded away. |
// and return the correct fallback intrinsic. | ||
if ((op != GT_LSH) && (op2->AsIntCon()->IconValue() == 0)) | ||
{ | ||
op2 = gtNewSimdCreateBroadcastNode(type, op2, simdBaseJitType, simdSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, we prefer the direct zero constant helper in cases like this:
op2 = gtNewSimdCreateBroadcastNode(type, op2, simdBaseJitType, simdSize); | |
op2 = gtNewZeroConNode(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing these were previously getting folded away
We can get this handled in .NET 10 and ensure that ShiftLogical(x, zero)
is elided. It shouldn't be important to preserve for .NET 9
/backport to release/9.0 |
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10745195972 |
Fixes #107173. ARM64 right-shift intrinsics don't support encoding shift amounts of zero -- I believe this was the initial motivation for emitting a
mov
in lieu of the intrinsic when the shift amount is zero. However,HWIntrinsicImmOpHelper
should never try to emit a shift amount of zero for right-shift intrinsics, so we shouldn't need themov
case. Furthermore, emitting amov
instead of the intrinsic drops any side effects other than the shift, as demonstrated by the usage ofShiftLeftLogicalWideningLower
in the test without this fix.By removing the
mov
case, left-shift intrinsics should work correctly with all immediates now, and for cases where we try to right-shift by zero, we either switch over to a fallback implementation during importation if one is available, or we throw anArgumentOutOfRangeException
at runtime.