-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscompilation corrupts stack-allocated vectors #63475
Comments
I don't think your reduction is correct; it looks like it involves accessing zero-byte allocations. Generally, the first tool I reach for to reduce miscompiles is opt-bisect-limit (https://llvm.org/docs/OptBisect.html). |
I removed all the zero-byte |
Looks like there is an ABI mismatch. The arguments are pushed via pushq at 8 byte offsets and then read via movl at 4 byte offsets. |
Here's a reduction: define void @caller() nounwind {
call void @callee(ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, <7 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>)
ret void
}
define void @callee(ptr %p0, ptr %p1, ptr %p2, ptr %p3, ptr %p4, ptr %p5, <7 x i32> %arg) nounwind {
start:
%alloca = alloca [7 x i32], align 4
store <7 x i32> %arg, ptr %alloca, align 4
%extract0 = extractelement <7 x i32> %arg, i64 0
call void @use(i32 %extract0)
%extract1 = extractelement <7 x i32> %arg, i64 1
call void @use(i32 %extract1)
%extract2 = extractelement <7 x i32> %arg, i64 2
call void @use(i32 %extract2)
%extract3 = extractelement <7 x i32> %arg, i64 3
call void @use(i32 %extract3)
%extract4 = extractelement <7 x i32> %arg, i64 4
call void @use(i32 %extract4)
%extract5 = extractelement <7 x i32> %arg, i64 5
call void @use(i32 %extract5)
%extract6 = extractelement <7 x i32> %arg, i64 6
call void @use(i32 %extract6)
%extract7 = extractelement <7 x i32> %arg, i64 7
call void @use(i32 %extract7)
ret void
}
declare void @use(i32) The caller does: pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
callq callee@PLT The callee does: movl 112(%rsp), %ebx
movl 104(%rsp), %ebp
movl 96(%rsp), %r14d
movl 76(%rsp), %r15d
movl 72(%rsp), %r12d
movl 64(%rsp), %edi
movl 68(%rsp), %r13d If we drop the store, then the offsets are correct (don't mind the different base): movl 144(%rsp), %ebx
movl 136(%rsp), %ebp
movl 128(%rsp), %r14d
movl 120(%rsp), %r15d
movl 112(%rsp), %r12d
movl 104(%rsp), %r13d
movl 96(%rsp), %edi So this is again in some way related to the arg copy elision optimization. |
This seems to be related to the code in X86ISelLowering::LowerMemArgument() handling isCopyElisionCandidate(). It checks for ScalarizedAndExtendedVector, but does so by inspecting the size of the LocVT. However, if I'm understanding this right, in this case the LocVT is i32 matching the vector size, but this doesn't match the size of the stack slot, which is 8. I'm not sure if there's any easy way to access that stack slot size though... CCAssignVal only stores the start offset. |
Candidate patch: https://reviews.llvm.org/D154078 |
@llvm/issue-subscribers-backend-x86 |
@cbeuw Do you have the original Rust code that lead to this issue? I find it suspicious that we end up with illegal vector types in optimized IR -- unless you did something with |
@nikic I have the unreduced code in custom MIR: https://godbolt.org/z/7q6q8eK96. But I don't have the reduced one around any more... I'm happy to run the minimisation script again though if needed. This isn't reproducible from surface Rust, which is why I opened a bug report with LLVM directly. The reproduction required a |
By illegal vector types do you mean the zero-byte |
"Illegal vector type" here refers to the non-power-of-two vectors, which are not natively supported by the target. They are already part of the input IR, and the most likely culprit for that is rust-lang/rust#111999. I wonder whether it would make sense to prevent argument promotion for such types, as the legalized argument passing for such vectors can be substantially worse than just passing them indirectly (and it makes it more likely to hit legalization bugs like #63608). |
When eliding argument copies, the memory layout between a plain store of the type and the layout of the argument lowering on the stack must match. For multi-part argument lowerings, this is not necessarily the case. The code already tried to prevent this optimization for "scalarized and extended" vectors, but the check for "extends" was incomplete. While a scalarized vector of i32s stores i32 values on the stack, these are stored in 8 byte stack slots (on x86_64), so effectively have padding. Rather than trying to add more special cases to handle this (which is not straightforward), I'm going in the other direction and exclude scalarized vectors from this optimization entirely. This seems like a rare case that is not worth the hassle -- the complete lack of test coverage is not reassuring either. Fixes llvm/llvm-project#63475. Differential Revision: https://reviews.llvm.org/D154078
This should print
42 42 42 42 42 42 42
, but prints42 0 42 0 42 42 42
withclang
oropt -O3
https://godbolt.org/z/8v3d7enK8The above was from
llvm-reduce
. I don't know if it broke something so I attached the original IR below. This is compiled from Rust but I've patched out the symbols from Rust std so has no dependency on Rust.original IR
The text was updated successfully, but these errors were encountered: