Incorrect `fmaf` results on some inputs #263

afonso360 · 2022-07-26T08:13:18Z

Hey,

As part of investigating a unrelated issue I found an input for fmaf where libm returns a different result from the native fmadd instruction on aarch64. Note, the issue is arch independent, I'm using aarch64 to compare because it has a native fmadd instruction.

The result is only off by 1 ULP, does libm target this level of precision?

Here's the example program:

fn main() {
    let a = f32::from_bits(1266679807);
    let b = f32::from_bits(1300234242);
    let c = f32::from_bits(1115553792);
    let expected = f32::from_bits(1501560833);


    let native_mul_add = a.mul_add(b, c);
    let native_bits = native_mul_add.to_bits();

    let libm = libm::fmaf(a, b, c);
    let libm_bits = libm.to_bits();

    println!("expected: {}", expected.to_bits());
    println!("native  : {}", native_bits);
    println!("libm    : {}", libm_bits);
}

Outputs:

expected: 1501560833
native  : 1501560833
libm    : 1501560834

I don't think this is related to #200, I don't get a panic.

Another input that also fails is:

a = 1484783617
b = 1488977920
c = 2147483649

// Results:
native = 1908408321
libm   = 1908408322

The text was updated successfully, but these errors were encountered:

Amanieu · 2022-07-26T09:38:54Z

The implementation in this crate is based on the one in musl libc. Could you try comparing it against that? Also what does this operation return on different platforms? In theory FMA is specified by IEEE and should return the same return on all platforms.

afonso360 · 2022-07-26T10:48:12Z

The implementation in this crate is based on the one in musl libc. Could you try comparing it against that?

Sure, I ran the example code in a docker container with x86_64-unknown-linux-musl, this should (in theory) use musl's fmaf for the mul_add version:

results:

expected: 1501560833
native  : 1501560833
libm    : 1501560834
simd    : 1501560833

I also converted this over to C to test in an alpine container, which as far as I know will link with musl (not sure if I've done something wrong here).

code:

#include <math.h>
#include <stdio.h>

typedef union {
int i;
float f;
} mix_float;

int main() {
      mix_float a;
      a.i = 1266679807;

      mix_float b;
      b.i = 1300234242;

      mix_float c;
      c.i = 1115553792;

      float res = fmaf(a.f, b.f, c.f);


      mix_float mres;
      mres.f = res;

      printf("%d\n", mres.i);

      return 0;
}

results:

/tmp/test123 # cc test.c
/tmp/test123 # ./a.out
1501560833

Also what does this operation return on different platforms? In theory FMA is specified by IEEE and should return the same return on all platforms.

As far as I understand x86 does not have a native scalar fma instruction so it gets lowered to a lib call, both msvc and gnu agree on the result.

x86_64-pc-windows-msvc:

expected: 1501560833
native  : 1501560833
libm    : 1501560834

x86_64-unknown-linux-gnu:

expected: 1501560833
native  : 1501560833
libm    : 1501560834

We can also test the native simd version of x86 if we use vfmadd, but it gets the same result:

test code:

fn main() {
  let a = f32::from_bits(1266679807);
  let b = f32::from_bits(1300234242);
  let c = f32::from_bits(1115553792);
  let expected = f32::from_bits(1501560833);


  let native_mul_add = a.mul_add(b, c);
  let native_bits = native_mul_add.to_bits();

  let libm = libm::fmaf(a, b, c);
  let libm_bits = libm.to_bits();


  let simd_bits = unsafe {
      use std::arch::x86_64::*;

      let a = _mm_set1_ps(a);
      let b = _mm_set1_ps(b);
      let c = _mm_set1_ps(c);
      let res = _mm_fmadd_ps(a, b, c);
      _mm_extract_ps(res, 0)
  };



  println!("expected: {}", expected.to_bits());
  println!("native  : {}", native_bits);
  println!("libm    : {}", libm_bits);
  println!("simd    : {}", simd_bits);
}

result:

expected: 1501560833
native  : 1501560833
libm    : 1501560834
simd    : 1501560833

The other arch that we had on our testsuite was s390x and it did pass the tests (i.e. agreed with the expected results above). I don't have access to a s390x machine, but qemu agreed with the result.

Ported from upstream musl commit 4f3d346bffdf9ed2b1803653643dc31242490944 Fixes #263

afonso360 mentioned this issue Jul 26, 2022

cranelift: Add MinGW fma regression tests bytecodealliance/wasmtime#4517

Merged

afonso360 changed the title ~~Incorrect fmaf results on AArch64~~ Incorrect fmaf results on some inputs Jul 26, 2022

Amanieu added a commit that referenced this issue Aug 10, 2022

Fix fmaf not to depend on FE_TOWARDZERO

6fea5fb

Ported from upstream musl commit 4f3d346bffdf9ed2b1803653643dc31242490944 Fixes #263

Amanieu mentioned this issue Aug 10, 2022

Fix fmaf not to depend on FE_TOWARDZERO #267

Merged

Amanieu closed this as completed in #267 Aug 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect `fmaf` results on some inputs #263

Incorrect `fmaf` results on some inputs #263

afonso360 commented Jul 26, 2022 •

edited

Loading

Amanieu commented Jul 26, 2022

afonso360 commented Jul 26, 2022

Incorrect fmaf results on some inputs #263

Incorrect fmaf results on some inputs #263

Comments

afonso360 commented Jul 26, 2022 • edited Loading

Amanieu commented Jul 26, 2022

afonso360 commented Jul 26, 2022

Incorrect `fmaf` results on some inputs #263

Incorrect `fmaf` results on some inputs #263

afonso360 commented Jul 26, 2022 •

edited

Loading