Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect fmaf results on some inputs #263

Closed
afonso360 opened this issue Jul 26, 2022 · 2 comments · Fixed by #267
Closed

Incorrect fmaf results on some inputs #263

afonso360 opened this issue Jul 26, 2022 · 2 comments · Fixed by #267

Comments

@afonso360
Copy link

afonso360 commented Jul 26, 2022

Hey,

As part of investigating a unrelated issue I found an input for fmaf where libm returns a different result from the native fmadd instruction on aarch64. Note, the issue is arch independent, I'm using aarch64 to compare because it has a native fmadd instruction.

The result is only off by 1 ULP, does libm target this level of precision?

Here's the example program:

fn main() {
    let a = f32::from_bits(1266679807);
    let b = f32::from_bits(1300234242);
    let c = f32::from_bits(1115553792);
    let expected = f32::from_bits(1501560833);


    let native_mul_add = a.mul_add(b, c);
    let native_bits = native_mul_add.to_bits();

    let libm = libm::fmaf(a, b, c);
    let libm_bits = libm.to_bits();

    println!("expected: {}", expected.to_bits());
    println!("native  : {}", native_bits);
    println!("libm    : {}", libm_bits);
}

Outputs:

expected: 1501560833
native  : 1501560833
libm    : 1501560834

I don't think this is related to #200, I don't get a panic.


Another input that also fails is:

a = 1484783617
b = 1488977920
c = 2147483649

// Results:
native = 1908408321
libm   = 1908408322
@afonso360 afonso360 changed the title Incorrect fmaf results on AArch64 Incorrect fmaf results on some inputs Jul 26, 2022
@Amanieu
Copy link
Member

Amanieu commented Jul 26, 2022

The implementation in this crate is based on the one in musl libc. Could you try comparing it against that? Also what does this operation return on different platforms? In theory FMA is specified by IEEE and should return the same return on all platforms.

@afonso360
Copy link
Author

The implementation in this crate is based on the one in musl libc. Could you try comparing it against that?

Sure, I ran the example code in a docker container with x86_64-unknown-linux-musl, this should (in theory) use musl's fmaf for the mul_add version:

results:
expected: 1501560833
native  : 1501560833
libm    : 1501560834
simd    : 1501560833

I also converted this over to C to test in an alpine container, which as far as I know will link with musl (not sure if I've done something wrong here).

code:
#include <math.h>
#include <stdio.h>

typedef union {
int i;
float f;
} mix_float;

int main() {
      mix_float a;
      a.i = 1266679807;

      mix_float b;
      b.i = 1300234242;

      mix_float c;
      c.i = 1115553792;

      float res = fmaf(a.f, b.f, c.f);


      mix_float mres;
      mres.f = res;

      printf("%d\n", mres.i);

      return 0;
}
results:
/tmp/test123 # cc test.c
/tmp/test123 # ./a.out
1501560833

Also what does this operation return on different platforms? In theory FMA is specified by IEEE and should return the same return on all platforms.

As far as I understand x86 does not have a native scalar fma instruction so it gets lowered to a lib call, both msvc and gnu agree on the result.

x86_64-pc-windows-msvc:
expected: 1501560833
native  : 1501560833
libm    : 1501560834
x86_64-unknown-linux-gnu:
expected: 1501560833
native  : 1501560833
libm    : 1501560834

We can also test the native simd version of x86 if we use vfmadd, but it gets the same result:

test code:
fn main() {
  let a = f32::from_bits(1266679807);
  let b = f32::from_bits(1300234242);
  let c = f32::from_bits(1115553792);
  let expected = f32::from_bits(1501560833);


  let native_mul_add = a.mul_add(b, c);
  let native_bits = native_mul_add.to_bits();

  let libm = libm::fmaf(a, b, c);
  let libm_bits = libm.to_bits();


  let simd_bits = unsafe {
      use std::arch::x86_64::*;

      let a = _mm_set1_ps(a);
      let b = _mm_set1_ps(b);
      let c = _mm_set1_ps(c);
      let res = _mm_fmadd_ps(a, b, c);
      _mm_extract_ps(res, 0)
  };



  println!("expected: {}", expected.to_bits());
  println!("native  : {}", native_bits);
  println!("libm    : {}", libm_bits);
  println!("simd    : {}", simd_bits);
}
result:
expected: 1501560833
native  : 1501560833
libm    : 1501560834
simd    : 1501560833

The other arch that we had on our testsuite was s390x and it did pass the tests (i.e. agreed with the expected results above). I don't have access to a s390x machine, but qemu agreed with the result.

Amanieu added a commit that referenced this issue Aug 10, 2022
Ported from upstream musl commit 4f3d346bffdf9ed2b1803653643dc31242490944

Fixes #263
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants