Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RangeInclusive<Integer>::sum Optimization Suggestion #94922

Closed
ghost opened this issue Mar 14, 2022 · 2 comments
Closed

RangeInclusive<Integer>::sum Optimization Suggestion #94922

ghost opened this issue Mar 14, 2022 · 2 comments

Comments

@ghost
Copy link

ghost commented Mar 14, 2022

The idiomatic way to sum a range of integers from a up to b is simply (a..=b).sum(). Yet, this always emits suboptimal codegen, as LLVM cannot perfectly optimise this, yet slightly better mathematics can be employed to do the same, more efficiently.

I suggest that specialisation be applied to range inclusive for integer elements where T::BITS < 128, to exchange a default internal implementation of sum for a better optimised implementation.
(Note that when an intrinsic is created for u128::carrying_{op} to yield the carry value, as opposed to just a boolean indicator, then this could be implemented for all types using carrying arithmetic; tag me then?)

given a, b, of integer type T, and the immediately wider type Wider, one can compute the sum via

// a: T, b: T
let a = Wider::from(a);
let b = Wider::from(b);
let sum_result = ((a + b) * (1 + b).saturating_sub(a) / 2);
// the intermittent values may exceed the bounds of {T}, but the result could still fit
if overflow_checks_enabled && !(Wider::from(T::MIN)..=Wider::from(T::MAX)).contains(&sum_result) {
    panic!("integer underflow/overflow")
} else {
    // if overflow checks are disabled, casting will work as expected, otherwise it fits as-is
    return sum_result as T
}

If this wasn't the right place to have made the issue, please redirect me as appropriate.

@ghost
Copy link
Author

ghost commented Sep 21, 2022

Update, barely coming back around to this, implemented a special case for u128.
I'm using

sum: Z² -> Z
sum(x, y) = [x ≤ y] ⋅ ⌊(x + y) ⋅ (y - x + 1) ÷ 2⌋ (mod 2ⁿ)

This is incorrect because the intermittent operations are modulo 2ⁿ, and the final result after the division has (n - 1) bits.
We can divide the even factor between x + y and y - x + 1 (note that -x ≡ x (mod 2), so these two factors have strictly unequal parities), then multiply, and only the factor being divided will need to use carrying instructions.

sum(x, y) = [x ≤ y] ⋅ {
 ⌊(x + y) ÷ 2⌋ ⋅ (y - x + 1), if 2 | (x + y),
 (x + y) ⋅ ⌊(y - x + 1) ÷ 2⌋, else
}

I rearrange the expression so that only one side needs any carrying instructions:

sum(x, y) = [x ≤ y] ⋅ ⌊(a ⋅ x + y + b) ÷ 2⌋ ⋅ (c ⋅ x + y + d)
a = { +1, if (x + y) is even, -1, else }
b = [(x + y) is odd]
c = { +1, if (x + y) is odd, -1, else }
d = [(x + y) is even]

Concrete implementation looks like so:

type T = usize;

// f(x, y) = (x + y ⋅ 2ⁿ) ÷ 2 = x ÷ 2 + y ⋅ 2^(n - 1)
fn divide_by_2_with_carry(
 x: T,
 carry: bool
) -> T {
 (x >> 1) | (T::from(carry) << (T::BITS - 1))
}

// (cneg(x, y), (0 != x) & y)
fn overflowing_cneg(
 x: T,
 y: bool
) -> (T, bool) {
 if y {
  x.overflowing_neg()
 } else {
  (x, false)
 }
}

fn cneg(x: T, y: bool) -> T {
 if y {
  x.wrapping_neg()
 } else {
  x
 }
}

// sum(x, y) = [x ≤ y] ⋅ (x + y) ⋅ (y - x + 1) ÷ 2
pub fn sum(x: T, y: T) -> T {
 T::from(x <= y) * {
  let is_odd = 1 == 1 & (x ^ y);
  let a = {
   let (result, underflowed) = overflowing_cneg(x, is_odd);
   let (result, overflowed_1) = result.overflowing_add(y);
   let (result, overflowed_2) = result.overflowing_add(T::from(is_odd));
   let carry_bit = underflowed ^ overflowed_1 ^ overflowed_2;
   let result = divide_by_2_with_carry(
    result,
    carry_bit
   );
   result
  };
  let b = cneg(x, !is_odd) + y + T::from(!is_odd);
  a * b
 }
}

(bit ugly without comments, but if it gets in, I'll clean it up)
I'll try discussing this on the Zulip soon, and seeing whether libs is interested in the change.

@ghost
Copy link
Author

ghost commented Oct 6, 2022

Never got around to discussing on Zulip; closing in favour of letting this be fixed by LLVM auto-vectorisation of inclusive ranges instead, whenever that happens.

See #45222 (comment)

@ghost ghost closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

0 participants