`std::intrinsics::simd::simd_reduce_add_unordered` generates inefficient code for floating-point numbers #130028

Il-Capitano · 2024-09-06T11:11:00Z

Calling `std::intrinsics::simd::simd_reduce_add_unordered` on a floating-point vector compiles to an extra floating-point add that adds +0.0 to the result: https://godbolt.org/z/Y496nxv3E

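// Nightly-only: both `std::simd` and `std::intrinsics` are unstable.
#![feature(portable_simd, core_intrinsics)]
use std::simd::*;

unsafe fn reduce_add_unordered(v: f32x4) -> f32 {
    std::intrinsics::simd::simd_reduce_add_unordered(v)
}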

The problem seems to be that the compiler uses +0.0 as the starting value of @llvm.vector.reduce.fadd.* instead of -0.0. Comparing LLVM's code generation for the two starting values, the -0.0 version is the more efficient one: https://godbolt.org/z/fhaz7ced6

define float @reduce_fadd_positive_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
  ret float %result
}

define float @reduce_fadd_negative_zero(ptr %p) {
  %v = load <4 x float>, ptr %p, align 16
  %result = tail call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> %v)
  ret float %result
}

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

This generates the following assembly for AArch64:

reduce_fadd_positive_zero:              // @reduce_fadd_positive_zero
        ldr     q1, [x0]
        movi    d0, #0000000000000000
        faddp   v1.4s, v1.4s, v1.4s
        faddp   s1, v1.2s
        fadd    s0, s1, s0
        ret
reduce_fadd_negative_zero:              // @reduce_fadd_negative_zero
        ldr     q0, [x0]
        faddp   v0.4s, v0.4s, v0.4s
        faddp   s0, v0.2s
        ret
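For anyone wondering why the start value matters: under IEEE 754 round-to-nearest, -0.0 is the identity for addition, while +0.0 is not, because (+0.0) + (-0.0) is +0.0. That is presumably why LLVM has to keep the final `fadd` in the +0.0 version. A minimal sketch in stable Rust that checks this bit-for-bit (the helper name `add_is_noop` is made up for illustration, not part of any library):

```rust
/// Hypothetical helper: returns true when `start + x` is bit-for-bit
/// identical to `x`, i.e. when adding `start` could be optimized away.
pub fn add_is_noop(start: f32, x: f32) -> bool {
    (start + x).to_bits() == x.to_bits()
}

/// Checks `add_is_noop` for a handful of non-NaN sample values.
/// NaN is left out because its payload bits are not a useful comparison here.
pub fn is_identity_for_samples(start: f32) -> bool {
    let samples = [0.0_f32, -0.0, 1.5, -2.25, f32::INFINITY, f32::NEG_INFINITY];
    samples.iter().all(|&x| add_is_noop(start, x))
}
```

With these definitions, `is_identity_for_samples(-0.0)` holds, while `is_identity_for_samples(0.0)` fails on the `-0.0` sample, since the sum comes back as +0.0.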

To me, this behaviour seems to be caused by using +0.0 instead of -0.0 here in the compiler:

rust/compiler/rustc_codegen_llvm/src/intrinsic.rs

Lines 2095 to 2101 in a3af208

    arith_red!(
        simd_reduce_add_unordered: vector_reduce_add,
        vector_reduce_fadd_reassoc,
        false,
        add,
        0.0
    );
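To illustrate how the last macro argument (the start value) can show up in the result, here is a scalar model of the reduction. This is only a sketch under assumptions: `reduce_add_model` is a hypothetical name, it sums the lanes in order, and it is not how rustc or LLVM actually lower the intrinsic.

```rust
/// Hypothetical scalar model of a floating-point add reduction:
/// fold the lanes into an accumulator that begins at `start`.
pub fn reduce_add_model(start: f32, lanes: [f32; 4]) -> f32 {
    lanes.iter().fold(start, |acc, &x| acc + x)
}

/// Returns true when the value is negative zero (sign bit set, all else clear).
pub fn is_negative_zero(x: f32) -> bool {
    x.to_bits() == (-0.0_f32).to_bits()
}
```

For ordinary inputs both start values agree, e.g. `reduce_add_model(0.0, [1.0, 2.0, 3.0, 4.0])` and `reduce_add_model(-0.0, [1.0, 2.0, 3.0, 4.0])` are both `10.0`. They differ only when every lane is -0.0: starting from -0.0 keeps the result at -0.0, while starting from +0.0 turns it into +0.0, so the extra add is observable and cannot simply be dropped.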


RalfJung · 2024-09-13T20:00:33Z

This sounds somewhat related to #129321.

workingjubilee · 2024-09-13T20:32:42Z

Yeah, this is fixed by rust-lang/portable-simd#438

workingjubilee · 2024-09-13T20:34:34Z

wait, not in the... sec, will PR.

Il-Capitano added the C-bug Category: This is a bug. label Sep 6, 2024

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 6, 2024

workingjubilee added A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Sep 6, 2024

saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 7, 2024

workingjubilee added the O-AArch64 Armv8-A or later processors in AArch64 mode label Sep 13, 2024

workingjubilee mentioned this issue Sep 14, 2024

Use -0.0 in intrinsics::simd::reduce_add_unordered #130325

Merged

bors closed this as completed in #130325 Sep 16, 2024

Il-Capitano mentioned this issue Dec 25, 2024

[Clang] Add float type support to __builtin_reduce_add and __builtin_reduce_multipy llvm/llvm-project#120367

Draft
