Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning matrix causing unaligned / packed return on sysv amd64 ABI #4553

Open
gfaster opened this issue Dec 3, 2024 · 8 comments
Open

Comments

@gfaster
Copy link
Contributor

gfaster commented Dec 3, 2024

Context

I'm not very familiar with Odin, but I found this code snippet to exhibit some very surprising behavior

  • Operating System & Odin Version:
Odin:    dev-2024-11:e607cbe93
	OS:      Debian GNU/Linux 12 (bookworm), Linux 6.1.0-27-amd64
	CPU:     Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
	RAM:     15720 MiB
	Backend: LLVM 14.0.6

Expected Behavior

Should have well-defined behavior, (not segfault). Specifically I would expect a matrix[2, 2]f32 to be returned on the stack, since making a heap allocation or even using the temp allocator seems overkill

Current Behavior

Segmentation fault

Failure Information (for bugs)

Steps to Reproduce

This program reliably segfaults with -o:none, but won't with -o:speed, even if we introduce a use a of the result via fmt.println

package tfile

foo :: proc() -> matrix[2, 2]f32 {
    return 0
}

main :: proc() {
    _ = foo() * ([2]f32{1, 0})
}

and the relevant disassembly

tfile.main:
	subq	$40, %rsp
	movq	%rdi, 8(%rsp)
	movq	8(%rsp), %rdi
	callq	tfile.foo
	movlpd	%xmm1, 32(%rsp)
	movlpd	%xmm0, 24(%rsp)
	movaps	24(%rsp), %xmm1  ; CRASH
	movsd	32(%rsp), %xmm0
	xorps	%xmm2, %xmm2
	mulps	%xmm2, %xmm0
	addps	%xmm1, %xmm0
	movq	$0, 16(%rsp)
	movlpd	%xmm0, 16(%rsp)
	addq	$40, %rsp
	retq

tfile.foo:
	movl	$0, -4(%rsp)
	movl	$0, -8(%rsp)
	movl	$0, -12(%rsp)
	movl	$0, -16(%rsp)
	movsd	-16(%rsp), %xmm0
	movsd	-8(%rsp), %xmm1
	retq
@JesseRMeyer
Copy link

Does the crash (probably due to incorrect alignment) occur on LLVM 18?

@gfaster
Copy link
Contributor Author

gfaster commented Dec 4, 2024

Can confirm it happens with LLVM 18 and 19

@gfaster
Copy link
Contributor Author

gfaster commented Dec 4, 2024

I'm not super familiar with LLVM but it looks like the IR that Odin generates is not correct:

define internal void @matseg.main(ptr noalias nocapture nonnull %__.context_ptr) {
decls:
  %0 = alloca [4 x float], align 4 ; <==== Allocation with align 4
  %1 = alloca [2 x float], align 8
  br label %entry

entry:                                            ; preds = %decls
  %2 = call <{ <2 x float>, <2 x float> }> @matseg.foo(ptr %__.context_ptr)
  store <{ <2 x float>, <2 x float> }> %2, ptr %0, align 1
  %3 = load <4 x float>, ptr %0, align 16  ; <==== Load from that same allocation with align 16
  %4 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 0, i32 1>
  %5 = shufflevector <4 x float> %3, <4 x float> undef, <2 x i32> <i32 2, i32 3>
  %6 = fmul <2 x float> %4, <float 1.000000e+00, float 1.000000e+00>
  %7 = call <2 x float> @llvm.fmuladd.v2f32(<2 x float> %5, <2 x float> zeroinitializer, <2 x float> %6)
  call void @llvm.memset.inline.p0.i64(ptr %1, i8 0, i64 8, i1 false)
  store <2 x float> %7, ptr %1, align 8
  ret void
}

@gfaster
Copy link
Contributor Author

gfaster commented Dec 5, 2024

Seems like the general problem is here

i64 max_align = gb_max(lb_alignof(src_type), lb_alignof(dst_type));
max_align = gb_max(max_align, 4);
LLVMValueRef ptr = llvm_alloca(p, dst_type, max_align);

The backing type of the 2x2 matrix here is [4 x float] which itself has an align of 4 according to lb_alignof but Odin assumes alignment of 16. The reason this isn't an issue if it's assigned to a local first is that the local is generated with the appropriate alignment for the type.

My temporary workaround is changing the 4 to 16 in the gb_max expression, but it seems like an actual solution will require exposing the Odin type to OdinLLVMBuildTransmute or maybe replacing the pointer casts with bitcast

As a side note, I don't think the first store there after entry should be generated with align 1

@gfaster gfaster changed the title Surprising behavior when returning matrix Returning matrix as a temporary can cause an unaligned load Dec 5, 2024
@gingerBill gingerBill reopened this Dec 5, 2024
@gingerBill
Copy link
Member

Not necessarily fixed, but give it a go on the latest commit, please.

@gfaster
Copy link
Contributor Author

gfaster commented Dec 5, 2024

That change is character-for-character identical to my local workaround, so I'm pretty sure it should be fine.

I did try this fix with different matrices that Odin assumes even higher alignment (4x4, 3x4, 2x4), but it seems like that turned out fine and their temporaries had the correct alignment (32+).

This seems to imply that special-casing 2x2 matrices and potentially 1x4 f32 matrices might be the only thing needed (though there may be other edge cases with #align types/fields that I don't know about)

There is definitely something up with the store generated to just 2x2 and 1x4 f32 matrices that causes it to have align 1 (that can be seen in the llvm snippet above) that doesn't appear with other sizes (2x4, 3x4, 4x4) nor with other element types (f16, f64), which seems like it may be related.

gfaster added a commit to gfaster/Odin that referenced this issue Dec 5, 2024
This test is definitely imperfect (should have no erroneous failures,
only erroneous passes), but should serve as a good smoke test
if matrix alignment is ever broken again. Looking at the generated LLVM,
there is a bunch of weird choices for alignment chosen that might be
worth looking into.

It's also worth noting that the failure mode of this test is a #GP
exception, which I don't know how well the test runner handles in a
larger test corpus.
gfaster added a commit to gfaster/Odin that referenced this issue Dec 5, 2024
This test is definitely imperfect (should have no erroneous failures,
only erroneous passes), but should serve as a good smoke test
if matrix alignment is ever broken again. Looking at the generated LLVM,
there is a bunch of weird choices for alignment chosen that might be
worth looking into.

It's also worth noting that the failure mode of this test is a #GP
exception, which I don't know how well the test runner handles in a
larger test corpus.
gingerBill added a commit that referenced this issue Dec 6, 2024
@laytan
Copy link
Collaborator

laytan commented Dec 6, 2024

I wonder why that line exists at all:

max_align = gb_max(max_align, 4);

Why do we constraint the transmute destination alignment, when we take the max between src and dst types

@laytan laytan changed the title Returning matrix as a temporary can cause an unaligned load Returning matrix as a temporary can cause an unaligned store Dec 6, 2024
@laytan
Copy link
Collaborator

laytan commented Dec 6, 2024

The unaligned issues are an issue with the sysv amd64 ABI

@laytan laytan changed the title Returning matrix as a temporary can cause an unaligned store Returning matrix causing unaligned / packed return on sysv amd64 ABI Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants