
Add fmaf16 #419

Draft · tgross35 wants to merge 1 commit into master from f16-fma
Conversation

tgross35 (Contributor)

Split from #390 since I think the f128 version will be trickier.

tgross35 marked this pull request as draft on January 11, 2025 21:34
tgross35 force-pushed the f16-fma branch 5 times, most recently from 67f2865 to 8aca043, on January 22, 2025 23:15
tgross35 (Contributor, Author) commented Jan 22, 2025

@beetrees or @tspiteri would you be able to help me figure this out? The version of fmaf16 in this PR says fma(0x0001, 0x3bf7, 0x0000) = 0x0001 (6E-8 * 0.9956 + 0 = 6E-8, same result as x * y), which agrees with what LLVM does via fmaf. However, Rug is saying the result should be 0x0000. Is this usage incorrect?

#![feature(f16)]

use az::Az;
use rug::float::Round;

fn main() {
    let x = f16::from_bits(0x0001); // least positive subnormal, 2^-24
    let y = f16::from_bits(0x3bf7); // 2039/2048 ~= 0.9956
    let z = f16::from_bits(0x0000); // +0.0

    // f16 has an 11-bit significand (1 implicit bit + 10 stored).
    let mut xf = rug::Float::with_val(11, x);
    let yf = rug::Float::with_val(11, y);
    let zf = rug::Float::with_val(11, z);

    // Fused multiply-add at 11-bit precision, then clamp the result into
    // f16's exponent/subnormal range.
    let ordering = xf.mul_add_round(&yf, &zf, Round::Nearest);
    xf.subnormalize_ieee_round(ordering, Round::Nearest);
    let rug_res = (&xf).az::<f16>();

    println!("rug: {xf} {rug_res:?}");
    println!("std: {:?}", x.mul_add(y, z));
}

Prints:

rug: 5.9343e-8 0x0000
std: 0x0001

Also checked against Julia to get a third opinion, which agrees with 0x0001.

> x = fma(reinterpret(Float16, 0x0001), reinterpret(Float16, 0x3bf7), reinterpret(Float16, 0x0000))
Float16(6.0e-8)
> reinterpret(UInt16, x)
0x0001
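
For concreteness, the expected rounding can also be checked with exact integer arithmetic (a sketch of mine, not part of the PR): x is the least f16 subnormal 2^-24 and y = 2039/2048, so x * y = 2039 * 2^-35 exactly, while the boundary between rounding to 0x0000 and 0x0001 is 2^-25 = 1024 * 2^-35.

// Sketch: verify the round-to-nearest boundary in units of 2^-35.
fn main() {
    let x_times_y: u64 = 2039; // x * y = 2039 * 2^-35, exact
    let halfway: u64 = 1 << 10; // 2^-25, the 0x0000 / 0x0001 boundary
    let least_subnormal: u64 = 1 << 11; // 2^-24, i.e. f16 0x0001, as 2048 * 2^-35
    assert!(halfway < x_times_y && x_times_y < least_subnormal);
    // x * y lies strictly between the halfway point and 2^-24, so the
    // correctly rounded result is 0x0001, matching LLVM and Julia.
}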

quaternic (Contributor)

It looks like what is happening there is that subnormalize{_ieee}{_round} propagate values smaller than the smallest subnormal unmodified, which is very unintuitive, but is also more or less behaving as documented. The docs of the _ieee variants aren't explicit about what the subnormal range is; I would have expected it to cover any sufficiently small exponent.

I created an issue: https://gitlab.com/tspiteri/rug/-/issues/78
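
For reference, the quirk reproduces in isolation with just subnormalize_ieee_round (my own reduction, against rug as released before the fix):

#![feature(f16)]

use az::Az;
use rug::float::Round;

fn main() {
    // 2039 * 2^-35 is below f16's least subnormal 2^-24, but above the
    // round-to-nearest boundary 2^-25.
    let mut f = rug::Float::with_val(11, 2039u32);
    f >>= 35u32;
    f.subnormalize_ieee_round(std::cmp::Ordering::Equal, Round::Nearest);
    // Affected versions propagate f unmodified (~5.93e-8) instead of
    // snapping it to 2^-24, so the conversion to f16 can give 0x0000.
    println!("{} {:?}", f, (&f).az::<f16>());
}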

tgross35 (Contributor, Author) commented Jan 23, 2025

Great find, thanks for looking into it. This is a problem with subnormalize in general then, right, not specifically the IEEE versions?

From a quick check, it seems like this could be worked around by reducing and re-expanding the precision, which shouldn't reallocate. Any idea if there is a better way?

fn main() {
    // Start from the least positive subnormal f32 at full (24-bit) precision.
    let mut xf = rug::Float::with_val(24, f32::from_bits(1));
    xf *= 0.75;
    subnormalize(&mut xf);
    xf *= 16.0;
    subnormalize(&mut xf);
    assert_eq!(xf.to_f32(), f32::from_bits(1) * 0.75 * 16.0);
}

// Emulate subnormal rounding by shrinking the precision to what an f32
// subnormal with this exponent actually has, then restoring it; set_prec
// rounds to nearest by default and shouldn't reallocate.
fn subnormalize(xf: &mut rug::Float) {
    let e = xf.get_exp().unwrap();
    if e < -126 {
        xf.set_prec(24_u32.saturating_sub(-e as u32 - 126));
        xf.set_prec(24);
    }
}

quaternic (Contributor)

The subnormalize_round was needed in the first place because it avoids double rounding by taking as input the direction in which the Float was previously rounded. Any workaround will need to handle that as well. I'll add a specific test case to the rug issue.
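
To illustrate the hazard (my own example, not from the rug docs): take an exact value just below 3 * 2^-25, the midpoint between the f16 subnormals 0x0001 and 0x0002. Rounding to 11 bits first lands exactly on that midpoint, so the subsequent subnormalization can only break the tie correctly if it knows the first step rounded up.

use rug::float::Round;

fn main() {
    // v = 3 * 2^-25 - 2^-40, exactly representable in 53 bits.
    let mut v = rug::Float::with_val(53, 3u32) >> 25u32;
    v -= rug::Float::with_val(53, 1u32) >> 40u32;

    // First rounding, to f16's 11-bit significand: lands exactly on the
    // midpoint 3 * 2^-25 and reports Ordering::Greater (rounded up).
    let ord = v.set_prec_round(11, Round::Nearest);

    // Second rounding, into the subnormal range: a bare tie would break
    // to even (2^-23, i.e. 0x0002); passing `ord` tells rug the true value
    // was below the midpoint, giving the correct 2^-24, i.e. 0x0001.
    v.subnormalize_ieee_round(ord, Round::Nearest);
    assert_eq!(v, rug::Float::with_val(11, 1u32) >> 24u32);
}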

Yes, all the variants of subnormalize have the same quirk, but the general ones (without _ieee) document it more explicitly.

tspiteri

After the change mentioned in Rug issue 78, which is now in the master branch, the code above prints:

rug: 5.9605e-8 0x0001
std: 0x0001

tgross35 (Contributor, Author) commented Jan 23, 2025

Thank you for the quick fix, it seems to work great! The failure in the extensive tests looks like a real failure (interestingly, Julia returns the same result as this PR, though std's implementation returns the MPFR result).

quaternic (Contributor)

---- mp_extensive_fmaf16 ----

    input:    (0x8f10, 0x0488, 0x83e8)
    expected: 0x83e9
    actual:   0x83e8

Caused by:
    ulp 1 > 0

Indeed the expected result is correct. This is a case of a * b + c where:

  • c is subnormal
  • a * b is just slightly larger than half the least subnormal
  • a * b + c, when computed in f32, rounds to the halfway point between c = 0x83e8 and the next representable number 0x83e9

I'm not entirely sure where exactly the algorithm goes wrong, but I think it is because the test for halfway cases doesn't account for the sum being subnormal, so the f32 result (which is not subnormal) has even more excess precision.
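
For concreteness, decoding the bit patterns (my own arithmetic):

a = 0x8f10 = -1808 * 2^-22
b = 0x0488 = 1160 * 2^-24
c = 0x83e8 = -1000 * 2^-24 (subnormal)

a * b = -2097280 * 2^-46 ≈ -1.00006 * 2^-25, so |a * b| is indeed just over half the least subnormal 2^-24. The exact sum a * b + c = -4196401280 * 2^-46 lies just below the midpoint -1000.5 * 2^-24 between 0x83e8 and 0x83e9, so the correctly rounded result is 0x83e9. But that sum needs 25 significand bits, one more than f32 has; round-to-nearest-even in f32 lands exactly on the midpoint -1000.5 * 2^-24, and the final rounding to f16 then ties to the even significand 0x83e8, one ulp off.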
