__rust_u128_mulo aborts when rhs is zero #367

bjorn3 · 2020-07-13T19:22:14Z

Line 78 in 156fcf1

if self > Self::max_value().aborting_div(other) {

This is not a problem when using cg_llvm, as it is never called. However unlike cg_llvm, cg_clif does use it for the checked u128 multiplication implementation.

cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/1063

AaronKutch · 2020-08-16T20:17:33Z

I'm going to try fixing this and improving performance.

AaronKutch · 2020-08-18T00:48:39Z

edit: With some help, I found the answer to be adding some bounds to the associated types inside the associated types, such as type H: HInt<D = Self>; and type D: DInt<H = Self>;.

@nagisa I was having trouble using LargeInt to implement functions using generics when I had a revelation. u128 does not have any primitives larger than it, and u8 does not have any primitives smaller than it. Some functions need to split a larger integer into smaller ones, while others need to use widening operations. If there is only one trait such as LargeInt, then that trait will be fundamentally biased depending on if an impl such as impl LargeInt for u128 exists or if impl LargeInt for u8 exists. The single trait cannot do both at the same time, and that asymmetry will propagate up in subtle ways when implementing functions. What I am trying to do instead is replace LargeInt by two traits:

/// Trait for integers twice the bit width of another integer. This is implemented for all
/// primitives except for `u8`, because there is not a smaller primitive.
pub(crate) trait DInt: Int {
    /// Integer that is half the bit width of the integer this trait is implemented for
    type H: HInt;

    /// Returns the low half of `self`
    fn lo(self) -> Self::H;
    /// Returns the high half of `self`
    fn hi(self) -> Self::H;
    /// Returns the low and high halves of `self` as a tuple
    fn lo_hi(self) -> (Self::H, Self::H);
}

/// Trait for integers half the bit width of another integer. This is implemented for all
/// primitives except for `u128`, because it there is not a larger primitive.
pub(crate) trait HInt: Int {
    /// Integer that is double the bit width of the integer this trait is implemented for
    type D: DInt;

    /// Widens (zero extends or sign extends) the integer to have double bit width
    fn widen(self) -> Self::D;
    /// Widens the integer to have double bit width and shifts the integer into the higher bits
    fn widen_hi(self) -> Self::D;
    /// Constructs a double bit width integer using lower and higher parts
    fn from_lo_hi(lo: Self, hi: Self) -> Self::D;
    /// Widening multiplication. This cannot overflow.
    /// We use `widen_mul` to avoid colliding with `widening_mul` in case RFC PR #2417 is
    /// implemented.
    fn widen_mul(self, rhs: Self) -> Self::D;
}

In practice, these work wonders such as converting this incomprehensible mess:

trait Mul: LargeInt {
    fn mul(self, other: Self) -> Self {
        let half_bits = Self::BITS / 4;
        let lower_mask = !<<Self as LargeInt>::LowHalf>::ZERO >> half_bits;
        let mut low = (self.low() & lower_mask).wrapping_mul(other.low() & lower_mask);
        let mut t = low >> half_bits;
        low &= lower_mask;
        t += (self.low() >> half_bits).wrapping_mul(other.low() & lower_mask);
        low += (t & lower_mask) << half_bits;
        let mut high = Self::low_as_high(t >> half_bits);
        t = low >> half_bits;
        low &= lower_mask;
        t += (other.low() >> half_bits).wrapping_mul(self.low() & lower_mask);
        low += (t & lower_mask) << half_bits;
        high += Self::low_as_high(t >> half_bits);
        high += Self::low_as_high((self.low() >> half_bits).wrapping_mul(other.low() >> half_bits));
        high = high
            .wrapping_add(self.high().wrapping_mul(Self::low_as_high(other.low())))
            .wrapping_add(Self::low_as_high(self.low()).wrapping_mul(other.high()));
        Self::from_parts(low, high)
    }
}

into this (note: I haven't checked this one for translation bugs because of compilation problems, but I have verified the two traits in other simpler generic implementations)

// This generic implementation uses small multiplications to generate large multiplications, hence the `DInt` bound
trait Mul: Int + DInt {
    fn mul(self, other: Self) -> Self {
        let mut tmp0 = self.lo().widen_mul(other.lo());
        let mut sum = tmp0.lo();
        sum += self.hi().wrapping_mul(other.lo());
        tmp0 += sum.widen_hi();
        let mut tmp1 = sum.widen_hi();
        sum = tmp0.hi();
        tmp0 &= lower_mask;
        sum += other.hi().wrapping_mul(self.lo());
        tmp0 += sum.widen_hi();
        tmp1 += sum.hi();
        tmp1 += self.hi().wrapping_mul(other.hi()).widen_hi();
        tmp1 = tmp1
            .wrapping_add(self.hi().wrapping_mul(other.hi()))
            .wrapping_add(self.lo().wrapping_mul(other.hi()).widen_hi());
        tmp0 + tmp1
    }
}

There is one serious problem that I don't know how to get around, however. I run into this error when I do some computation on a different sized integer and transform it back for the return value:

fn f<I: Int + DInt>(x: I) -> I {
    let t = x.lo();
    // ...
    t.widen() // expected type parameter `I`
              found associated type `<<I as int::DInt>::H as int::HInt>::D`
}

fn g<I: Int + HInt>(x: I) -> I {
    let t = x.widen();
    // ...
    t.lo() // expected type parameter `I`
              found associated type `<<I as int::HInt>::D as int::DInt>::H`
}

The type checker doesn't understand that we went in a circle such as DInt -> HInt -> DInt or HInt -> DInt -> HInt. Does someone with deep knowledge of typechecking shenanigans know the best solution to this problem?

The only alternative I know is using a single bikeshedded version of LargeInt that cannot support impl LargetInt for u8 because the vast majority of cases in compiler-builtins would use DInt anyway, and I could manually implement the remaining cases.

amosonn · 2020-09-01T09:14:36Z

You need to add bounds on the associated types, i.e.

trait DInt: Int {
    type H: HInt<D=Self>;
    ...
}

trait HInt: Int {
    type D: DInt<H=Self>;
    ...
}

Toy example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7d22241eb541e4db9500b3e9bbc12e43

AaronKutch · 2020-09-13T23:38:17Z

I almost have the divisionless algorithm working, just that the signed versions are giving me extreme trouble with iX::MIN cases. I have run out of the time I can allocate to rust for this week however, so this will need to wait

nagisa · 2020-09-13T23:53:54Z

I suggest you to copy the algorithm LLVM uses to lower the operation as I implemented in https://reviews.llvm.org/rG73e8a784e62f945a51363c8b5ec4eaedcf9f87e8.

nagisa · 2020-09-13T23:55:05Z

In particular (iNh being half-size, iN being original sized unsigned integer):

%0 = %LHS.HI != 0 && %RHS.HI != 0
%1 = { iNh, i1 } @umul.with.overflow.iNh(iNh %LHS.HI, iNh %RHS.LO)
%2 = { iNh, i1 } @umul.with.overflow.iNh(iNh %RHS.HI, iNh %LHS.LO)
%3 = mul nuw iN (%LHS.LOW as iN), (%RHS.LOW as iN)
%4 = add iN (%1.0 as iN) << Nh, (%2.0 as iN) << Nh
%5 = { iN, i1 } @uadd.with.overflow.iN( %4, %3 )
%res = { %5.0, %0 || %1.1 || %2.1 || %5.1 }

(edit: ah, signed, yeah I had trouble with them too!)

AaronKutch · 2020-09-14T00:07:56Z

Is it doing a widening multiplication, and checking if the high part is not zero? That is the fastest way to do it, but only up to the largest widening multiplication supported on an architecture. I have a algorithm based on leading_zeros that I am developing for u128, that doesn't require full u128 by u128 widening multiplication, which I think should be faster.

nagisa · 2020-09-14T00:21:36Z

Is it doing a widening multiplication, and checking if the high part is not zero?

No, none of the operations in this are “widening” in a sense that there are no 256-bit operations involved for a 128-bit UMULO (but there are some simpler 128-bit operations). Here's the algorithm above with types substituted for u128 umulo, maybe it'll be clearer:

%0 = %LHS.HI != 0 && %RHS.HI != 0
%1 = { i64, i1 } @umul.with.overflow.i64(i64 %LHS.HI, i64 %RHS.LO)
%2 = { i64, i1 } @umul.with.overflow.i64(i64 %RHS.HI, i64 %LHS.LO)
%3 = mul nuw i128 (%LHS.LOW as i128), (%RHS.LOW as i128)          ; note: regular multiplication, can't wrap
%4 = add i128 (%1.0 as i128) << 64, (%2.0 as i128) << 64
%5 = { i128, i1 } @uadd.with.overflow.i128(i128 %4, i128 %3)
%res = { %5.0, %0 || %1.1 || %2.1 || %5.1 }                                             ; res is { i128, i1 }

Do note that this is the unsigned version, I couldn't figure out one for signed integers.

AaronKutch · 2020-09-14T00:37:04Z

Oh I see what is happening. I will use that instead.

AaronKutch · 2020-12-08T21:54:16Z

This is fixed in master now.

AaronKutch · 2020-12-08T21:57:29Z

@alexcrichton can a new compiler-builtins version be released? It would be nice to have all the changes I made make it into Rust 1.50.

alexcrichton · 2020-12-09T15:15:38Z

Thanks for the ping! Looks like Amanieu just took care of it a few hours ago, so I'll go ahead and close this.

AaronKutch mentioned this issue Aug 16, 2020

#[aapcs_on_arm] is applied to two functions with no floating point #373

Closed

AaronKutch mentioned this issue Aug 31, 2020

Greatly improve division performance for u128 and other cases #332

Merged

AaronKutch mentioned this issue Oct 13, 2020

Overhaul split integer handling and fix numerous issues #384

Merged

alexcrichton closed this as completed Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

__rust_u128_mulo aborts when rhs is zero #367

__rust_u128_mulo aborts when rhs is zero #367

bjorn3 commented Jul 13, 2020

AaronKutch commented Aug 16, 2020

AaronKutch commented Aug 18, 2020 •

edited

Loading

amosonn commented Sep 1, 2020 •

edited

Loading

AaronKutch commented Sep 13, 2020

nagisa commented Sep 13, 2020

nagisa commented Sep 13, 2020 •

edited

Loading

AaronKutch commented Sep 14, 2020

nagisa commented Sep 14, 2020 •

edited

Loading

AaronKutch commented Sep 14, 2020

AaronKutch commented Dec 8, 2020

AaronKutch commented Dec 8, 2020

alexcrichton commented Dec 9, 2020

__rust_u128_mulo aborts when rhs is zero #367

__rust_u128_mulo aborts when rhs is zero #367

Comments

bjorn3 commented Jul 13, 2020

AaronKutch commented Aug 16, 2020

AaronKutch commented Aug 18, 2020 • edited Loading

amosonn commented Sep 1, 2020 • edited Loading

AaronKutch commented Sep 13, 2020

nagisa commented Sep 13, 2020

nagisa commented Sep 13, 2020 • edited Loading

AaronKutch commented Sep 14, 2020

nagisa commented Sep 14, 2020 • edited Loading

AaronKutch commented Sep 14, 2020

AaronKutch commented Dec 8, 2020

AaronKutch commented Dec 8, 2020

alexcrichton commented Dec 9, 2020

AaronKutch commented Aug 18, 2020 •

edited

Loading

amosonn commented Sep 1, 2020 •

edited

Loading

nagisa commented Sep 13, 2020 •

edited

Loading

nagisa commented Sep 14, 2020 •

edited

Loading