Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

widening_mul #2417

Closed
wants to merge 5 commits into from
Closed

widening_mul #2417

wants to merge 5 commits into from

Conversation

strake
Copy link
Contributor

@strake strake commented Apr 24, 2018

@le-jzr
Copy link

le-jzr commented Apr 24, 2018

I think we could come up with a better name for the method, but otherwise this is something I'd very much like to see added. Given that on some architectures (x86 to name one), multiplication always produces result twice the width of the operands, it seems rather inefficient to not provide a way to use the higher half.

@strake
Copy link
Contributor Author

strake commented Apr 24, 2018

@le-jzr We can also choose an efficient-as-possible implementation on architectures where it doesn't and save users the bother.

Some name ideas:

  • carrying_mul
  • double_mul
  • double_wide_mul

@strake
Copy link
Contributor Author

strake commented Apr 24, 2018

For reference, here is the generic definition from my u2size crate:

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let n = 0usize.count_zeros() >> 1;
    let halves = |x| (x >> n, x & (!0 >> n));
    let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
    let (z2, z0) = (x1 * y1, x0 * y0);
    let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    let (w, c) = usize::overflowing_add(z0, z1 << n);
    (z2 + z1 >> n + c as u32, w)
}

The emitted machine code for this is quite heinous.

@Ixrec
Copy link
Contributor

Ixrec commented Apr 24, 2018

Having never heard of this operation before, but immediately understanding why it would be useful, I think the name needs to have some form of the word "wide" in it.

carrying_mul sounds as if regular multiplication doesn't "carry the 1", which would just be inaccurate. double_mul sounds like multiplying twice, or multiplying by two, which are totally different operations. double_wide_mul I'd be fine with, but is probably overkill unless there's a quadruple_wide_mul we need to contrast it with. Finally, std already has wrapping_mul and overflowing_mul methods on these types, so for naming consistency I would call it widening_mul.

@le-jzr
Copy link

le-jzr commented Apr 24, 2018

After a bit of though, I see the logic behind widening_mul. It could work.

@tspiteri
Copy link

tspiteri commented Apr 24, 2018

@strake I know it's not really the point, but couldn't you do something like this?

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let usize_bits = 0usize.count_zeros();
    if usize_bits <= 32 {
        let z = (x as u64) * (y as u64);
        ((z >> usize_bits) as usize, z as usize)
    } else if usize_bits <= 64 {
        let z = (x as u128) * (y as u128);
        ((z >> usize_bits) as usize, z as usize)
    } else {
        // fallback
        let n = usize_bits >> 1;
        let halves = |x| (x >> n, x & (!0 >> n));
        let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
        let (z2, z0) = (x1 * y1, x0 * y0);
        let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
        let (w, c) = usize::overflowing_add(z0, z1 << n);
        (z2 + z1 >> n + c as u32, w)
    }
}

@devonhollowood
Copy link

Maybe instead of returning a tuple, it would make sense to return a struct? So e.g. u32::carrying_mul() would return a carried_product<u32>. This struct would have high and low fields. This makes it hard to confuse the fields, which would otherwise be an error-prone and hard to lint against. (Feel free to bikeshed on all the names here.)

@eddyb
Copy link
Member

eddyb commented Apr 25, 2018

I have only seen this called "widening multiply", but mostly in low-level (compiler/ISA) contexts.

@joshtriplett
Copy link
Member

joshtriplett commented Apr 25, 2018

I do wonder why this is a Mul impl on u2size rather than a mul of two usize values that returns a u2size (or equivalent such as a 2-tuple of usize, which I'd prefer over a new type here).

@scottmcm
Copy link
Member

The broader rust project already has this in compiler-builtins (https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/int/mul.rs#L7), so I'd personally be happy to have it in core instead.

A quick playground experiment shows the ASM is simpler for returning (low, high) with u64. (For smaller types it didn't matter.) Conveniently that's consistent with overflowing_mul which also returns the low part of the result as the .0 part of the tuple.

But I wonder if this really wants to be ternary instead, like

fn mul_with_carry(self, multiplier: Self, carry: Self) -> (Self, Self)

@nagisa
Copy link
Member

nagisa commented Apr 25, 2018 via email

@tspiteri
Copy link

Regarding output tuple order, I think it could be made more obvious using the method name. For example fn mul_low_high(self, multiplier: Self) -> (Self, Self) makes it more obvious that ret.0 is the low word and ret.1 is the high word, and let (high, low) = a.mul_low_high(b); sets off alarm bells.

@hanna-kruppe
Copy link

To me this operation falls in the same category as leading/trailing zeros, rotates, bit reverse, saturation, etc. in that it

  • is relatively niche, but important for some crucial routines and sometimes provided by hardware
  • is not provided directly by C and friends, and thus generally implemented with intrinsics or in terms of other operators (e.g., rotate ~= bit shifts and bitwise or) that then have to be pattern matched against to recognize the operation
  • (on some targets, at least) requires special casing by instruction selection to achieve good codegen, so the aforementioned pattern matching is important

For these operations I'm in favor of providing them in the Rust standard library, both to offer a canonical way to write out these important primitives and to make it easier for the compiler to generate good code reliably (i.e., no need to pattern match a particular way to implement the operation).

@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Apr 25, 2018
@strake
Copy link
Contributor Author

strake commented Apr 26, 2018

@joshtriplett the Mul impl is merely an example of use; the actual method is fn(usize, usize) -> (usize, usize).

@nagisa yes, it does for me too, but i as a crate author worry whether it will remain so on other targets — will it generate such code on a 32-bit machine? a 128-bit machine? is Rust willing to preclude ever supporting 128-bit targets (e.g. RV128I)? I would prefer to have a method which means efficient double-wide multiplication rather than rely on the compiler to figure out what i mean and myself to not mess it up every time.

@rkruppe thanks for verbalizing these points — yes, the notion is to have a canonical form so neither the compiler nor a future reader needs to guess what is going on.

widening_mul LGTM, will modify RFC when we reach a consensus.

@clarfonthey
Copy link
Contributor

clarfonthey commented Apr 26, 2018

Personally I think that the return type should be the next size up of an integer type. I wouldn't mind creating u256 and i256 types that only allow bit operations and casting, but we can also just not define this for 128-bit integers.

@joshtriplett
Copy link
Member

@clarcharr I'd love to have u256 (and maybe even u512) available, but not with just a limited set of operations. But that aside, I think we can universally define these operations as a function from T, T to (T, T), even for the largest type we have. Having versions that turn into the next larger type, for every type other than the largest, seems useful as well, but I'd hope that the LLVM optimizer makes it unnecessary to have two different versions.

@JonSeverinsson
Copy link

As the order of the high and low half of the result tuple is not obvious suggests to me that a tuple is probably not the best return type. The obvious solution would be a struct with two fields named "high" and "low", but I'm wondering if simply returning a wider integer type is not simpler (though that means i128/u128 will unfortunately lack the method).

I.e. add the following functions (and the equivalent for unsigned integers) to the standard library.

impl i8 {
    pub fn widening_mul(self, rhs: Self) -> i16 { ... }
}
impl i16 {
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
}
impl i32 {
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
}
impl i64 {
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}
impl isize {
    #[cfg(target_pointer_width = "16")]
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
    #[cfg(target_pointer_width = "32")]
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
    #[cfg(target_pointer_width = "64")]
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}

@eddyb
Copy link
Member

eddyb commented Apr 26, 2018

@JonSeverinsson Not sure there is much of a point in doing that because you can simply cast to the wider type and perform the multiplication.

@le-jzr
Copy link

le-jzr commented Apr 26, 2018

Returning twice as wide type instead of a tuple has the issue that it wouldn't work on usize, which is the canonical use case for this kind of operation (at least until someone adds ireg/ureg).

@strake
Copy link
Contributor Author

strake commented Apr 26, 2018

I'd be fine with returning either a tuple or a struct. As @le-jzr noted, returning twice-as-wide type is impossible for usize (unless we define u2size in core, which i consider far out of scope of this RFC).

@kennytm
Copy link
Member

kennytm commented Apr 26, 2018

Does it make sense to define widening_mul on signed integers?

fn widening_mul(self, other: isize) -> (isize, usize);

@clarfonthey
Copy link
Contributor

@kennytm absolutely imho.

@scottmcm
Copy link
Member

scottmcm commented May 4, 2018

Double-wide multiplication is a prerequisite of arbitrary-precision multiplication

I took a stab at the simplest version of that using the proposed method, and I think it ends up being something like this:

fn scalar_mul_assign(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (low, high) = i.wide_mul(scalar);
        let (low, overflowed) = low.overflowing_add(carry);
        *i = low;
        carry = high + if overflowed { 1 } else { 0 };
    }
    if carry > 0 {
        big.push(carry);
    }
}

That's not great, and I doubt LLVM will do a great job with it.

So I think it'd be good to have the carry built-in, so it can be something like

pub fn scalar_mul_assign2(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (x, c) = i.mul_with_carry(scalar, carry);
        *i = x;
        carry = c;
    }
    if carry > 0 {
        big.push(carry);
    }
}

Since one can always pass 0 as the carry, which LLVM will fold away easily.

@Pratyush
Copy link

Pratyush commented May 6, 2018

+1, would make writing certain cryptographic code very simple

@clarfonthey
Copy link
Contributor

One interesting use case for this might be a more efficient version of the multiply_and_divide function that num-integer uses internally:

/// Calculate r * a / b, avoiding overflows and fractions.
///
/// Assumes that b divides r * a evenly.
fn multiply_and_divide<T: Integer + Clone>(r: T, a: T, b: T) -> T {
    // See http://blog.plover.com/math/choose-2.html for the idea.
    let g = gcd(r.clone(), b.clone());
    r/g.clone() * (a / (b/g))
}

I mean, num-integer specifically almost certainly won't use this as it's not backwards compatible, but it is a nice use case.

@strake
Copy link
Contributor Author

strake commented Jul 3, 2018

ping

have we reached consensus on the name widening_mul?

@Centril Centril added A-arithmetic Arithmetic related proposals & ideas A-inherent-impl Proposals related to inherent implementations A-primitive Primitive types related proposals & ideas labels Nov 22, 2018
@sm-Fifteen
Copy link

I'd like to point out that the apfloat component of the official librustc has had a widening_mul function for well over a year, so there's some precedent for this in official rust projects, just not in the language itself.

@TheIronBorn
Copy link

The rand library uses wmul

@eddyb
Copy link
Member

eddyb commented Dec 1, 2018

@sm-Fifteen It is no coincidence I wrote that function as a non-allocating bigint multiplication helper when I ported APFloat from C++ and #2417 (comment) - so don't count my "vote" for widening_mul twice, heh.

@strake strake changed the title carrying_mul widening_mul Dec 5, 2018
@strake
Copy link
Contributor Author

strake commented Dec 5, 2018

Consensus for name seems to be widening_mul; modified RFC.

@fstirlitz
Copy link

What's the rationale for having u2size at all? Given that usize is supposed to be used for pointers, offsets and memory sizes, if multiplication involving it overflows, it means the result represents an amount of memory you cannot possibly allocate or refer to. You shouldn't even care what the value is: all that really matters is that it's too large to handle.

@clarfonthey
Copy link
Contributor

@fstirlitz One use case would be intermediate results that may overflow, e.g. rationals. While the end result of x*y/z might fit in a usize, the product x*y may not.

@strake
Copy link
Contributor Author

strake commented Jan 21, 2019

@fstirlitz I had 2 motivating use cases for u2size:

  • Intermediate values in arbitrary-precision multiplication
  • The which-trees-present value in smoothsort, which is word_size/log₂((1+√5)/2) bits

@lu-zero
Copy link

lu-zero commented Jan 29, 2019

Would be good to have the variants that return the wider type directly, since it has direct uses for a number of common dsp operations.

@strake
Copy link
Contributor Author

strake commented Jan 29, 2019

@lu-zero u2size is not in core/std; i merely cite it as an example use of this operation. For fixed-width types, you can merely cast to the wider type before multiplying, and the emitted code will use the machine instruction for widening multiplication.

@hanna-kruppe
Copy link

hanna-kruppe commented Jun 21, 2019

I think it's sad that this petered out. This is a small, useful API addition which apparently isn't too controversial or prone to vigorous bikeshedding (except the name, but that's now resolved). Maybe we should just skip ahead to implementing it? Evidently the RFC process is not working well for keeping this RFC PR moving.

@Centril
Copy link
Contributor

Centril commented Jun 21, 2019

Maybe we should just skip ahead to implementing it?

That's usually a good idea for small libs features.

@clarfonthey
Copy link
Contributor

So I found myself needing this again today, and it looks like the overall decision was to just implement this? I would be willing to help do that, but I honestly have no idea how -- I'm not sure what LLVM intrinsics are used to do wide-multiplication, and I couldn't find the docs for it. If we at least add some intrinsics for this it would make implementing a working solution easier, and we can figure out the exact API later.

@scottmcm
Copy link
Member

scottmcm commented May 6, 2021

@clarfonthey there's no intrinsic for it; it represents it with n[su]w multiplication on the wider integer type.

Demo, with beautiful assembly: https://rust.godbolt.org/z/86of1z184

(I do think that people writing Rust shouldn't need to know that, though, and I could imagine the best way to implement it in cranelift could be different, which would be another reason to have an exposed specialized method for it.)

m-ou-se added a commit to m-ou-se/rust that referenced this pull request Aug 31, 2021
…ou-se

Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers

This comes in part from my own attempts to make (crude) big integer implementations, and also due to the stalled discussion in [RFC 2417](rust-lang/rfcs#2417). My understanding is that changes like these are best offered directly as code and then an RFC can be opened if there needs to be more discussion before stabilisation. Since all of these methods are unstable from the start, I figured I might as well offer them now.

I tried looking into intrinsics, messed around with a few different implementations, and ultimately concluded that these are "good enough" implementations for now to at least put up some code and maybe start bikeshedding on a proper API for these.

For the `carrying_add` and `borrowing_sub`, I tried looking into potential architecture-specific code and realised that even using the LLVM intrinsics for `addcarry` and `subborrow` on x86 specifically, I was getting exactly the same assembly as the naive implementation using `overflowing_add` and `overflowing_sub`, although the LLVM IR did differ because of the architecture-specific code. Longer-term I think that they would be best suited to specific intrinsics as that would make optimisations easier (instructions like add-carry tend to use implicit flags, and thus can only be optimised if they're done one-after-another, and thus it would make the most sense to have compact intrinsics that can be merged together easily).

For `widening_mul` and `carrying_mul`, for now at least, I simply cast to the larger type and perform arithmetic that way, since we currently have no intrinsic that would work better for 128-bit integers. In the future, I also think that some form of intrinsic would work best to cover that case, but for now at least, I think that they're "good enough" for now.

The main reasoning for offering these directly to the standard library even though they're relatively niche optimisations is to help ensure that the code generated for them is optimal. Plus, these operations alone aren't enough to create big integer implementations, although they could help simplify the code required to do so and make it a bit more accessible for the average implementor.

That said, I 100% understand if any or all of these methods are not desired simply because of how niche they are. Up to you. 🤷🏻
@scottmcm
Copy link
Member

This was accepted for nightly without the RFC, tracking issue rust-lang/rust#85532

https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.widening_mul
https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.carrying_mul

As such, I'm going to close this on the assumption that it's no longer needed. Feel free to reopen if so desired.

@scottmcm scottmcm closed this Sep 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-arithmetic Arithmetic related proposals & ideas A-inherent-impl Proposals related to inherent implementations A-primitive Primitive types related proposals & ideas Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.