
Add fmaf16 #419

Draft · tgross35 wants to merge 1 commit into master from f16-fma
Conversation

tgross35 (Contributor)

Split from #390 since I think the f128 version will be trickier.

tgross35 marked this pull request as draft on January 11, 2025 21:34
tgross35 force-pushed the f16-fma branch 5 times, most recently from 67f2865 to 8aca043, on January 22, 2025 23:15
tgross35 (Contributor, Author) commented Jan 22, 2025

@beetrees or @tspiteri would you be able to help me figure this out? The version of fmaf16 in this PR says fma(0x0001, 0x3bf7, 0x0000) = 0x0001 (6E-8 * 0.9956 + 0 = 6E-8, same result as x * y), which agrees with what LLVM does via fmaf. However, Rug is saying the result should be 0x0000. Is this usage incorrect?

#![feature(f16)]

use az::Az;
use rug::float::Round;

fn main() {
    let x = f16::from_bits(0x0001); // least positive subnormal, 2^-24
    let y = f16::from_bits(0x3bf7); // 2039/2048 ~= 0.9956
    let z = f16::from_bits(0x0000); // +0.0

    // f16 has an 11-bit significand (1 implicit bit + 10 stored).
    let mut xf = rug::Float::with_val(11, x);
    let yf = rug::Float::with_val(11, y);
    let zf = rug::Float::with_val(11, z);

    // Fused multiply-add at 11-bit precision, then clamp the result into
    // f16's exponent/subnormal range.
    let ordering = xf.mul_add_round(&yf, &zf, Round::Nearest);
    xf.subnormalize_ieee_round(ordering, Round::Nearest);
    let rug_res = (&xf).az::<f16>();

    println!("rug: {xf} {rug_res:?}");
    println!("std: {:?}", x.mul_add(y, z));
}

Prints:

rug: 5.9343e-8 0x0000
std: 0x0001

Also checked against Julia to get a third opinion, which agrees with 0x0001.

> x = fma(reinterpret(Float16, 0x0001), reinterpret(Float16, 0x3bf7), reinterpret(Float16, 0x0000))
Float16(6.0e-8)
> reinterpret(UInt16, x)
0x0001
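
For concreteness, the expected rounding can also be checked with exact integer arithmetic (a sketch of mine, not part of the PR): x is the least f16 subnormal 2^-24 and y = 2039/2048, so x * y = 2039 * 2^-35 exactly, while the boundary between rounding to 0x0000 and 0x0001 is 2^-25 = 1024 * 2^-35.

// Sketch: verify the round-to-nearest boundary in units of 2^-35.
fn main() {
    let x_times_y: u64 = 2039; // x * y = 2039 * 2^-35, exact
    let halfway: u64 = 1 << 10; // 2^-25, the 0x0000 / 0x0001 boundary
    let least_subnormal: u64 = 1 << 11; // 2^-24, i.e. f16 0x0001, as 2048 * 2^-35
    assert!(halfway < x_times_y && x_times_y < least_subnormal);
    // x * y lies strictly between the halfway point and 2^-24, so the
    // correctly rounded result is 0x0001, matching LLVM and Julia.
}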

quaternic (Contributor)

It looks like what is happening there is that subnormalize{_ieee}{_round} propagate values smaller than the smallest subnormal unmodified, which is very unintuitive, but is also more or less behaving as documented. The docs of the _ieee variants aren't explicit about what the subnormal range is; I would have expected it to cover any sufficiently small exponent.

I created an issue: https://gitlab.com/tspiteri/rug/-/issues/78
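
For reference, the quirk reproduces in isolation with just subnormalize_ieee_round (my own reduction, against rug as released before the fix):

#![feature(f16)]

use az::Az;
use rug::float::Round;

fn main() {
    // 2039 * 2^-35 is below f16's least subnormal 2^-24, but above the
    // round-to-nearest boundary 2^-25.
    let mut f = rug::Float::with_val(11, 2039u32);
    f >>= 35u32;
    f.subnormalize_ieee_round(std::cmp::Ordering::Equal, Round::Nearest);
    // Affected versions propagate f unmodified (~5.93e-8) instead of
    // snapping it to 2^-24, so the conversion to f16 can give 0x0000.
    println!("{} {:?}", f, (&f).az::<f16>());
}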

tgross35 (Contributor, Author) commented Jan 23, 2025

Great find, thanks for looking into it. This is a problem with subnormalize in general then, right, not specifically the IEEE versions?

From a quick check, it seems like this could be worked around by reducing and re-expanding the precision, which shouldn't reallocate. Any idea if there is a better way?

fn main() {
    // Start from the least positive subnormal f32 at full (24-bit) precision.
    let mut xf = rug::Float::with_val(24, f32::from_bits(1));
    xf *= 0.75;
    subnormalize(&mut xf);
    xf *= 16.0;
    subnormalize(&mut xf);
    assert_eq!(xf.to_f32(), f32::from_bits(1) * 0.75 * 16.0);
}

// Emulate subnormal rounding by shrinking the precision to what an f32
// subnormal with this exponent actually has, then restoring it; set_prec
// rounds to nearest by default and shouldn't reallocate.
fn subnormalize(xf: &mut rug::Float) {
    let e = xf.get_exp().unwrap();
    if e < -126 {
        xf.set_prec(24_u32.saturating_sub(-e as u32 - 126));
        xf.set_prec(24);
    }
}

quaternic (Contributor)

The subnormalize_round was needed in the first place because it avoids double rounding by taking as input the direction in which the Float was previously rounded. Any workaround will need to handle that as well. I'll add a specific test case to the rug issue.
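
To illustrate the hazard (my own example, not from the rug docs): take an exact value just below 3 * 2^-25, the midpoint between the f16 subnormals 0x0001 and 0x0002. Rounding to 11 bits first lands exactly on that midpoint, so the subsequent subnormalization can only break the tie correctly if it knows the first step rounded up.

use rug::float::Round;

fn main() {
    // v = 3 * 2^-25 - 2^-40, exactly representable in 53 bits.
    let mut v = rug::Float::with_val(53, 3u32) >> 25u32;
    v -= rug::Float::with_val(53, 1u32) >> 40u32;

    // First rounding, to f16's 11-bit significand: lands exactly on the
    // midpoint 3 * 2^-25 and reports Ordering::Greater (rounded up).
    let ord = v.set_prec_round(11, Round::Nearest);

    // Second rounding, into the subnormal range: a bare tie would break
    // to even (2^-23, i.e. 0x0002); passing `ord` tells rug the true value
    // was below the midpoint, giving the correct 2^-24, i.e. 0x0001.
    v.subnormalize_ieee_round(ord, Round::Nearest);
    assert_eq!(v, rug::Float::with_val(11, 1u32) >> 24u32);
}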

Yes, all the variants of subnormalize have the same quirk, but the general ones (without _ieee) document it more explicitly.

tspiteri

After the change mentioned in Rug issue 78, which is now in the master branch, the code above prints:

rug: 5.9605e-8 0x0001
std: 0x0001

tgross35 (Contributor, Author) commented Jan 23, 2025

Thank you for the quick fix, it seems to work great! The failure in the extensive tests looks like a real failure (interestingly, Julia returns the same result as this PR, though std's implementation returns the MPFR result).

quaternic (Contributor)

---- mp_extensive_fmaf16 ----

    input:    (0x8f10, 0x0488, 0x83e8)
    expected: 0x83e9
    actual:   0x83e8

Caused by:
    ulp 1 > 0

Indeed the expected result is correct. This is a case of a * b + c where:

  • c is subnormal
  • a * b is just slightly larger than half the least subnormal
  • a * b + c, when computed in f32, rounds to the halfway point between c = 0x83e8 and the next representable number 0x83e9

I'm not entirely sure where exactly the algorithm goes wrong, but I think it is because the test for halfway cases doesn't account for the sum being subnormal, so the f32 result (which is not subnormal) has even more excess precision.
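
For concreteness, decoding the bit patterns (my own arithmetic):

a = 0x8f10 = -1808 * 2^-22
b = 0x0488 = 1160 * 2^-24
c = 0x83e8 = -1000 * 2^-24 (subnormal)

a * b = -2097280 * 2^-46 ≈ -1.00006 * 2^-25, so |a * b| is indeed just over half the least subnormal 2^-24. The exact sum a * b + c = -4196401280 * 2^-46 lies just below the midpoint -1000.5 * 2^-24 between 0x83e8 and 0x83e9, so the correctly rounded result is 0x83e9. But that sum needs 25 significand bits, one more than f32 has; round-to-nearest-even in f32 lands exactly on the midpoint -1000.5 * 2^-24, and the final rounding to f16 then ties to the even significand 0x83e8, one ulp off.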
