Returning `matrix` causing unaligned / packed return on sysv amd64 ABI #4553

gfaster · 2024-12-03T06:45:31Z

Context

I'm not very familiar with Odin, but I found this code snippet to exhibit some very surprising behavior

Operating System & Odin Version:

Odin:    dev-2024-11:e607cbe93
	OS:      Debian GNU/Linux 12 (bookworm), Linux 6.1.0-27-amd64
	CPU:     Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
	RAM:     15720 MiB
	Backend: LLVM 14.0.6

Expected Behavior

Should have well-defined behavior, (not segfault). Specifically I would expect a matrix[2, 2]f32 to be returned on the stack, since making a heap allocation or even using the temp allocator seems overkill

Current Behavior

Segmentation fault

Failure Information (for bugs)

Steps to Reproduce

This program reliably segfaults with -o:none, but won't with -o:speed, even if we introduce a use a of the result via fmt.println

package tfile

foo :: proc() -> matrix[2, 2]f32 {
    return 0
}

main :: proc() {
    _ = foo() * ([2]f32{1, 0})
}

and the relevant disassembly

tfile.main:
	subq	$40, %rsp
	movq	%rdi, 8(%rsp)
	movq	8(%rsp), %rdi
	callq	tfile.foo
	movlpd	%xmm1, 32(%rsp)
	movlpd	%xmm0, 24(%rsp)
	movaps	24(%rsp), %xmm1  ; CRASH
	movsd	32(%rsp), %xmm0
	xorps	%xmm2, %xmm2
	mulps	%xmm2, %xmm0
	addps	%xmm1, %xmm0
	movq	$0, 16(%rsp)
	movlpd	%xmm0, 16(%rsp)
	addq	$40, %rsp
	retq

tfile.foo:
	movl	$0, -4(%rsp)
	movl	$0, -8(%rsp)
	movl	$0, -12(%rsp)
	movl	$0, -16(%rsp)
	movsd	-16(%rsp), %xmm0
	movsd	-8(%rsp), %xmm1
	retq

The text was updated successfully, but these errors were encountered:

JesseRMeyer · 2024-12-03T15:32:30Z

Does the crash (probably due to incorrect alignment) occur on LLVM 18?

gfaster · 2024-12-04T23:49:19Z

Can confirm it happens with LLVM 18 and 19

gfaster · 2024-12-04T23:54:26Z

I'm not super familiar with LLVM but it looks like the IR that Odin generates is not correct:

define internal void @matseg.main(ptr noalias nocapture nonnull %__.context_ptr) {
decls:
  %0 = alloca [4 x float], align 4 ; <==== Allocation with align 4
  %1 = alloca [2 x float], align 8
  br label %entry

entry:                                            ; preds = %decls
  %2 = call <{ <2 x float>, <2 x float> }> @matseg.foo(ptr %__.context_ptr)
  store <{ <2 x float>, <2 x float> }> %2, ptr %0, align 1
  %3 = load <4 x float>, ptr %0, align 16  ; <==== Load from that same allocation with align 16
  %4 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 0, i32 1>
  %5 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 2, i32 3>
  %6 = fmul <2 x float> %4, <float 1.000000e+00, float 1.000000e+00>
  %7 = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> %5, <2 x float> zeroinitializer, <2 x float> %6)
  call void @llvm.memset.inline.p0.i64(ptr %1, i8 0, i64 8, i1 false)
  store <2 x float> %7, ptr %1, align 8
  ret void
}

gfaster · 2024-12-05T01:21:38Z

Seems like the general problem is here

Odin/src/llvm_backend_general.cpp

Lines 2603 to 2606 in cdb86d6

    
           i64 max_align = gb_max(lb_alignof(src_type), lb_alignof(dst_type)); 
        
           max_align = gb_max(max_align, 4); 
        
           LLVMValueRef ptr = llvm_alloca(p, dst_type, max_align);

The backing type of the 2x2 matrix here is [4 x float] which itself has an align of 4 according to lb_alignof but Odin assumes alignment of 16. The reason this isn't an issue if it's assigned to a local first is that the local is generated with the appropriate alignment for the type.

My temporary workaround is changing the 4 to 16 in the gb_max expression, but it seems like an actual solution will require exposing the Odin type to OdinLLVMBuildTransmute or maybe replacing the pointer casts with bitcast

As a side note, I don't think the first store there after entry should be generated with align 1

gingerBill · 2024-12-05T15:45:26Z

Not necessarily fixed, but give it a go on the latest commit, please.

gfaster · 2024-12-05T17:00:23Z

That change is character-for-character identical to my local workaround, so I'm pretty sure it should be fine.

I did try this fix with different matrices that Odin assumes even higher alignment (4x4, 3x4, 2x4), but it seems like that turned out fine and their temporaries had the correct alignment (32+).

This seems to imply that special-casing 2x2 matrices and potentially 1x4 f32 matrices might be the only thing needed (though there may be other edge cases with #align types/fields that I don't know about)

There is definitely something up with the store generated to just 2x2 and 1x4 f32 matrices that causes it to have align 1 (that can be seen in the llvm snippet above) that doesn't appear with other sizes (2x4, 3x4, 4x4) nor with other element types (f16, f64), which seems like it may be related.

This test is definitely imperfect (should have no erroneous failures, only erroneous passes), but should serve as a good smoke test if matrix alignment is ever broken again. Looking at the generated LLVM, there is a bunch of weird choices for alignment chosen that might be worth looking into. It's also worth noting that the failure mode of this test is a #GP exception, which I don't know how well the test runner handles in a larger test corpus.

Add regression test for #4553

laytan · 2024-12-06T21:24:47Z

I wonder why that line exists at all:

Odin/src/llvm_backend_general.cpp

Line 2604 in cdb86d6

max_align = gb_max(max_align, 4);

Why do we constraint the transmute destination alignment, when we take the max between src and dst types

laytan · 2024-12-06T21:32:25Z

The unaligned issues are an issue with the sysv amd64 ABI

gfaster changed the title ~~Surprising behavior when returning matrix~~ Returning matrix as a temporary can cause an unaligned load Dec 5, 2024

gingerBill closed this as completed in dd8e6e8 Dec 5, 2024

gingerBill reopened this Dec 5, 2024

gingerBill added a commit that referenced this issue Dec 6, 2024

Merge pull request #4564 from gfaster/4553-regression-test

62768ad

Add regression test for #4553

laytan changed the title ~~Returning matrix as a temporary can cause an unaligned load~~ Returning matrix as a temporary can cause an unaligned store Dec 6, 2024

laytan changed the title ~~Returning matrix as a temporary can cause an unaligned store~~ Returning matrix causing unaligned / packed return on sysv amd64 ABI Dec 6, 2024

laytan added the performance label Dec 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Returning `matrix` causing unaligned / packed return on sysv amd64 ABI #4553

Returning `matrix` causing unaligned / packed return on sysv amd64 ABI #4553

gfaster commented Dec 3, 2024

JesseRMeyer commented Dec 3, 2024

gfaster commented Dec 4, 2024

gfaster commented Dec 4, 2024

gfaster commented Dec 5, 2024

gingerBill commented Dec 5, 2024

gfaster commented Dec 5, 2024

laytan commented Dec 6, 2024

laytan commented Dec 6, 2024

Returning matrix causing unaligned / packed return on sysv amd64 ABI #4553

Returning matrix causing unaligned / packed return on sysv amd64 ABI #4553

Comments

gfaster commented Dec 3, 2024

Context

Expected Behavior

Current Behavior

Failure Information (for bugs)

Steps to Reproduce

JesseRMeyer commented Dec 3, 2024

gfaster commented Dec 4, 2024

gfaster commented Dec 4, 2024

gfaster commented Dec 5, 2024

gingerBill commented Dec 5, 2024

gfaster commented Dec 5, 2024

laytan commented Dec 6, 2024

laytan commented Dec 6, 2024

Returning `matrix` causing unaligned / packed return on sysv amd64 ABI #4553

Returning `matrix` causing unaligned / packed return on sysv amd64 ABI #4553