RangeInclusive<Integer>::sum Optimization Suggestion #94922

ghost · 2022-03-14T02:09:41Z

The idiomatic way to sum a range of integers from a up to b is simply (a..=b).sum(). Yet, this always emits suboptimal codegen, as LLVM cannot perfectly optimise this, yet slightly better mathematics can be employed to do the same, more efficiently.

I suggest that specialisation be applied to range inclusive for integer elements where T::BITS < 128, to exchange a default internal implementation of sum for a better optimised implementation.
(Note that when an intrinsic is created for u128::carrying_{op} to yield the carry value, as opposed to just a boolean indicator, then this could be implemented for all types using carrying arithmetic; tag me then?)

given a, b, of integer type T, and the immediately wider type Wider, one can compute the sum via

// a: T, b: T
let a = Wider::from(a);
let b = Wider::from(b);
let sum_result = ((a + b) * (1 + b).saturating_sub(a) / 2);
// the intermittent values may exceed the bounds of {T}, but the result could still fit
if overflow_checks_enabled && !(Wider::from(T::MIN)..=Wider::from(T::MAX)).contains(&sum_result) {
    panic!("integer underflow/overflow")
} else {
    // if overflow checks are disabled, casting will work as expected, otherwise it fits as-is
    return sum_result as T
}

If this wasn't the right place to have made the issue, please redirect me as appropriate.

The text was updated successfully, but these errors were encountered:

ghost · 2022-09-21T07:43:49Z

Update, barely coming back around to this, implemented a special case for u128.
I'm using

sum: Z² -> Z
sum(x, y) = [x ≤ y] ⋅ ⌊(x + y) ⋅ (y - x + 1) ÷ 2⌋ (mod 2ⁿ)

This is incorrect because the intermittent operations are modulo 2ⁿ, and the final result after the division has (n - 1) bits.
We can divide the even factor between x + y and y - x + 1 (note that -x ≡ x (mod 2), so these two factors have strictly unequal parities), then multiply, and only the factor being divided will need to use carrying instructions.

sum(x, y) = [x ≤ y] ⋅ {
 ⌊(x + y) ÷ 2⌋ ⋅ (y - x + 1), if 2 | (x + y),
 (x + y) ⋅ ⌊(y - x + 1) ÷ 2⌋, else
}

I rearrange the expression so that only one side needs any carrying instructions:

sum(x, y) = [x ≤ y] ⋅ ⌊(a ⋅ x + y + b) ÷ 2⌋ ⋅ (c ⋅ x + y + d)
a = { +1, if (x + y) is even, -1, else }
b = [(x + y) is odd]
c = { +1, if (x + y) is odd, -1, else }
d = [(x + y) is even]

Concrete implementation looks like so:

type T = usize;

// f(x, y) = (x + y ⋅ 2ⁿ) ÷ 2 = x ÷ 2 + y ⋅ 2^(n - 1)
fn divide_by_2_with_carry(
 x: T,
 carry: bool
) -> T {
 (x >> 1) | (T::from(carry) << (T::BITS - 1))
}

// (cneg(x, y), (0 != x) & y)
fn overflowing_cneg(
 x: T,
 y: bool
) -> (T, bool) {
 if y {
  x.overflowing_neg()
 } else {
  (x, false)
 }
}

fn cneg(x: T, y: bool) -> T {
 if y {
  x.wrapping_neg()
 } else {
  x
 }
}

// sum(x, y) = [x ≤ y] ⋅ (x + y) ⋅ (y - x + 1) ÷ 2
pub fn sum(x: T, y: T) -> T {
 T::from(x <= y) * {
  let is_odd = 1 == 1 & (x ^ y);
  let a = {
   let (result, underflowed) = overflowing_cneg(x, is_odd);
   let (result, overflowed_1) = result.overflowing_add(y);
   let (result, overflowed_2) = result.overflowing_add(T::from(is_odd));
   let carry_bit = underflowed ^ overflowed_1 ^ overflowed_2;
   let result = divide_by_2_with_carry(
    result,
    carry_bit
   );
   result
  };
  let b = cneg(x, !is_odd) + y + T::from(!is_odd);
  a * b
 }
}

(bit ugly without comments, but if it gets in, I'll clean it up)
I'll try discussing this on the Zulip soon, and seeing whether libs is interested in the change.

ghost · 2022-10-06T20:46:59Z

Never got around to discussing on Zulip; closing in favour of letting this be fixed by LLVM auto-vectorisation of inclusive ranges instead, whenever that happens.

See #45222 (comment)

ghost closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2022

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RangeInclusive<Integer>::sum Optimization Suggestion #94922

RangeInclusive<Integer>::sum Optimization Suggestion #94922

ghost commented Mar 14, 2022 •

edited by ghost

Loading

ghost commented Sep 21, 2022 •

edited by ghost

Loading

ghost commented Oct 6, 2022 •

edited by ghost

Loading

RangeInclusive<Integer>::sum Optimization Suggestion #94922

RangeInclusive<Integer>::sum Optimization Suggestion #94922

Comments

ghost commented Mar 14, 2022 • edited by ghost Loading

ghost commented Sep 21, 2022 • edited by ghost Loading

ghost commented Oct 6, 2022 • edited by ghost Loading

ghost commented Mar 14, 2022 •

edited by ghost

Loading

ghost commented Sep 21, 2022 •

edited by ghost

Loading

ghost commented Oct 6, 2022 •

edited by ghost

Loading