Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::intrinsics::simd::simd_reduce_add_unordered generates inefficient code for floating-point numbers #130028

Closed
Il-Capitano opened this issue Sep 6, 2024 · 3 comments · Fixed by #130325
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Il-Capitano
Copy link

Code generation for std::intrinsics::simd::simd_reduce_add_unordered generates an extra floating-point add that adds +0.0 to the result: https://godbolt.org/z/Y496nxv3E

use std::simd::*;

unsafe fn reduce_add_unordered(v: f32x4) -> f32 {
    std::intrinsics::simd::simd_reduce_add_unordered(v)
}

The problem seems to be because the compiler uses +0.0 as the starting value of @llvm.vector.reduce.fadd.* instead of -0.0. Comparing LLVM code generation for the two cases, we get the more efficient version when using -0.0: https://godbolt.org/z/fhaz7ced6

define float @reduce_fadd_positive_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  ret float %result
}

define float @reduce_fadd_negative_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
  ret float %result
}

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

This generates the following assembly for AArch64:

reduce_fadd_positive_zero:              // @reduce_fadd_positive_zero
        ldr     q1, [x0]
        movi    d0, #0000000000000000
        faddp   v1.4s, v1.4s, v1.4s
        faddp   s1, v1.2s
        fadd    s0, s1, s0
        ret
reduce_fadd_negative_zero:              // @reduce_fadd_negative_zero
        ldr     q0, [x0]
        faddp   v0.4s, v0.4s, v0.4s
        faddp   s0, v0.2s
        ret

To me, this behaviour seems to be caused by using +0.0 instead of -0.0 here in the compiler:

arith_red!(
simd_reduce_add_unordered: vector_reduce_add,
vector_reduce_fadd_reassoc,
false,
add,
0.0
);

@Il-Capitano Il-Capitano added the C-bug Category: This is a bug. label Sep 6, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 6, 2024
@workingjubilee workingjubilee added A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Sep 6, 2024
@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 7, 2024
@RalfJung
Copy link
Member

This sounds somewhat related to #129321.

@workingjubilee
Copy link
Member

Yeah, this is fixed by rust-lang/portable-simd#438

@workingjubilee
Copy link
Member

wait, not in the... sec, will PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants