-
-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning matrix
causing unaligned / packed return on sysv amd64 ABI
#4553
Comments
Does the crash (probably due to incorrect alignment) occur on LLVM 18? |
Can confirm it happens with LLVM 18 and 19 |
I'm not super familiar with LLVM but it looks like the IR that Odin generates is not correct: define internal void @matseg.main(ptr noalias nocapture nonnull %__.context_ptr) {
decls:
%0 = alloca [4 x float], align 4 ; <==== Allocation with align 4
%1 = alloca [2 x float], align 8
br label %entry
entry: ; preds = %decls
%2 = call <{ <2 x float>, <2 x float> }> @matseg.foo(ptr %__.context_ptr)
store <{ <2 x float>, <2 x float> }> %2, ptr %0, align 1
%3 = load <4 x float>, ptr %0, align 16 ; <==== Load from that same allocation with align 16
%4 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 0, i32 1>
%5 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 2, i32 3>
%6 = fmul <2 x float> %4, <float 1.000000e+00, float 1.000000e+00>
%7 = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> %5, <2 x float> zeroinitializer, <2 x float> %6)
call void @llvm.memset.inline.p0.i64(ptr %1, i8 0, i64 8, i1 false)
store <2 x float> %7, ptr %1, align 8
ret void
} |
Seems like the general problem is here Odin/src/llvm_backend_general.cpp Lines 2603 to 2606 in cdb86d6
The backing type of the 2x2 matrix here is My temporary workaround is changing the 4 to 16 in the As a side note, I don't think the first store there after entry should be generated with align 1 |
matrix
matrix
as a temporary can cause an unaligned load
Not necessarily fixed, but give it a go on the latest commit, please. |
That change is character-for-character identical to my local workaround, so I'm pretty sure it should be fine. I did try this fix with different matrices that Odin assumes even higher alignment (4x4, 3x4, 2x4), but it seems like that turned out fine and their temporaries had the correct alignment (32+). This seems to imply that special-casing 2x2 matrices and potentially 1x4 f32 matrices might be the only thing needed (though there may be other edge cases with There is definitely something up with the store generated to just 2x2 and 1x4 f32 matrices that causes it to have |
This test is definitely imperfect (should have no erroneous failures, only erroneous passes), but should serve as a good smoke test if matrix alignment is ever broken again. Looking at the generated LLVM, there is a bunch of weird choices for alignment chosen that might be worth looking into. It's also worth noting that the failure mode of this test is a #GP exception, which I don't know how well the test runner handles in a larger test corpus.
This test is definitely imperfect (should have no erroneous failures, only erroneous passes), but should serve as a good smoke test if matrix alignment is ever broken again. Looking at the generated LLVM, there is a bunch of weird choices for alignment chosen that might be worth looking into. It's also worth noting that the failure mode of this test is a #GP exception, which I don't know how well the test runner handles in a larger test corpus.
I wonder why that line exists at all: Odin/src/llvm_backend_general.cpp Line 2604 in cdb86d6
Why do we constraint the transmute destination alignment, when we take the max between src and dst types |
matrix
as a temporary can cause an unaligned loadmatrix
as a temporary can cause an unaligned store
The unaligned issues are an issue with the sysv amd64 ABI |
matrix
as a temporary can cause an unaligned storematrix
causing unaligned / packed return on sysv amd64 ABI
Context
I'm not very familiar with Odin, but I found this code snippet to exhibit some very surprising behavior
Expected Behavior
Should have well-defined behavior, (not segfault). Specifically I would expect a
matrix[2, 2]f32
to be returned on the stack, since making a heap allocation or even using the temp allocator seems overkillCurrent Behavior
Segmentation fault
Failure Information (for bugs)
Steps to Reproduce
This program reliably segfaults with
-o:none
, but won't with-o:speed
, even if we introduce a use a of the result viafmt.println
and the relevant disassembly
The text was updated successfully, but these errors were encountered: