-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RangeInclusive<Integer>::sum Optimization Suggestion #94922
Comments
Update, barely coming back around to this, implemented a special case for u128.
This is incorrect because the intermittent operations are modulo 2ⁿ, and the final result after the division has (n - 1) bits.
I rearrange the expression so that only one side needs any carrying instructions:
Concrete implementation looks like so: type T = usize;
// f(x, y) = (x + y ⋅ 2ⁿ) ÷ 2 = x ÷ 2 + y ⋅ 2^(n - 1)
fn divide_by_2_with_carry(
x: T,
carry: bool
) -> T {
(x >> 1) | (T::from(carry) << (T::BITS - 1))
}
// (cneg(x, y), (0 != x) & y)
fn overflowing_cneg(
x: T,
y: bool
) -> (T, bool) {
if y {
x.overflowing_neg()
} else {
(x, false)
}
}
fn cneg(x: T, y: bool) -> T {
if y {
x.wrapping_neg()
} else {
x
}
}
// sum(x, y) = [x ≤ y] ⋅ (x + y) ⋅ (y - x + 1) ÷ 2
pub fn sum(x: T, y: T) -> T {
T::from(x <= y) * {
let is_odd = 1 == 1 & (x ^ y);
let a = {
let (result, underflowed) = overflowing_cneg(x, is_odd);
let (result, overflowed_1) = result.overflowing_add(y);
let (result, overflowed_2) = result.overflowing_add(T::from(is_odd));
let carry_bit = underflowed ^ overflowed_1 ^ overflowed_2;
let result = divide_by_2_with_carry(
result,
carry_bit
);
result
};
let b = cneg(x, !is_odd) + y + T::from(!is_odd);
a * b
}
} (bit ugly without comments, but if it gets in, I'll clean it up) |
Never got around to discussing on Zulip; closing in favour of letting this be fixed by LLVM auto-vectorisation of inclusive ranges instead, whenever that happens. See #45222 (comment) |
The idiomatic way to sum a range of integers from
a
up tob
is simply(a..=b).sum()
. Yet, this always emits suboptimal codegen, as LLVM cannot perfectly optimise this, yet slightly better mathematics can be employed to do the same, more efficiently.I suggest that specialisation be applied to range inclusive for integer elements where
T::BITS < 128
, to exchange a default internal implementation ofsum
for a better optimised implementation.(Note that when an intrinsic is created for
u128::carrying_{op}
to yield the carry value, as opposed to just a boolean indicator, then this could be implemented for all types using carrying arithmetic; tag me then?)given
a
,b
, of integer typeT
, and the immediately wider typeWider
, one can compute the sum viaIf this wasn't the right place to have made the issue, please redirect me as appropriate.
The text was updated successfully, but these errors were encountered: