Iter with step_by(2) performs slowly #59281
Comments
Interesting enough, if you are using
For tiny loop bodies like this it's not unusual for internal iteration to be faster --
Godbolt: https://rust.godbolt.org/z/LRTDre
Also related: #57517
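For readers unfamiliar with the term, here is a minimal sketch of external versus internal iteration over a stepped range. The function names and the loop body (summing every second slice element) are hypothetical; the original snippet from this comment is not preserved in the thread, so this only illustrates the distinction and is not the code that was measured.

```rust
/// External iteration: the `for` loop pulls items by calling `next()`
/// on the `StepBy` adapter once per pass.
pub fn sum_external(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in (0..data.len()).step_by(2) {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Internal iteration: `for_each` hands the loop body to the iterator,
/// which drives the loop itself.
pub fn sum_internal(data: &[u32]) -> u32 {
    let mut total = 0u32;
    (0..data.len()).step_by(2).for_each(|i| {
        total = total.wrapping_add(data[i]);
    });
    total
}
```

Both functions visit indices 0, 2, 4, ... and return the same value; whether one compiles to tighter code than the other depends on the compiler version, so it is worth checking the generated assembly rather than assuming.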
Triage: no change
I ran into this while implementing the inner loop of a rasterizer. Internal iteration or not, I find it a bit unfortunate that there's no obvious equivalent for the "classic"
Edit: apparently there was a fix proposed but it had to be reverted :(
@jdahlstrom Make sure you compare equivalent semantics. Of course, the "classic" loop of
@scottmcm True, and I appreciate that Rust does the right thing there, but still it seems unfortunate that the penalty for doing the right thing is so great in the common case (where (On the other hand, widening the induction variable of the
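The inline code in the two comments above was lost, but the semantic difference they appear to discuss can be sketched as follows. This is an illustration under that assumption, with hypothetical function names and `u8` chosen only to keep the numbers small: a hand-written stride loop can step its counter past the type's maximum when the bound is close to it, while `step_by` on a range stops cleanly.

```rust
/// Hand-written stride loop. `checked_add` makes the overflow visible:
/// the function returns `None` if stepping would exceed `u8::MAX`.
pub fn count_classic(n: u8) -> Option<u32> {
    let mut count = 0u32;
    let mut i = 0u8;
    while i < n {
        count += 1;
        i = i.checked_add(2)?; // a plain `i += 2` would panic or wrap here
    }
    Some(count)
}

/// The same traversal with `step_by`, which never steps past the end.
pub fn count_step_by(n: u8) -> u32 {
    (0..n).step_by(2).count() as u32
}
```

With `n = 255` the hand-written loop reaches `i = 254` and cannot add 2, so `count_classic` returns `None`, whereas `count_step_by(255)` returns 128. Handling that edge case correctly is part of what the iterator version has to pay for.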
This is a really interesting one. It looks like https://rust.godbolt.org/z/15W97roTo it has trouble removing the
Might be worth experimenting with that a bit to see whether there's a reduced example that could be an A-llvm issue. The "obvious" version it does just fine (<https://rust.godbolt.org/z/4jhj6WTdG>) so there's something confounding going on here.
Note: this comment has been edited to fix an incorrect assumption. The incorrect assumption was that it was rustc, not LLVM, that was optimizing the
It seems that the "obvious" version

    /// Optimized as long as the compiler knows:
    /// - (a) the value of y;
    /// - and (b) that x <= u32::MAX - y.
    /// Note that the order of (a) and (b) also matters.
    pub fn optimized(x: u32, y: u32) -> u32 {
        assert!(y == 42); // cannot be moved down
        assert!(x <= u32::MAX - y);
        x.checked_add(y).unwrap()
    }

    /// The check for overflow is not yet optimized away, even though
    /// x <= u32::MAX - y.
    pub fn not_optimized_yet(x: u32, y: u32) -> u32 {
        assert!(x <= u32::MAX - y);
        x.checked_add(y).unwrap()
    }

https://rust.godbolt.org/z/rWvrqjME1 (I used
Is it? In https://rust.godbolt.org/z/Yov3GfTqc the MIR still contains the call to
My reasoning was based on my example that is optimized not calling
But isn't LLVM IR the boundary between rustc and LLVM?
rustc emits LLVM-IR, yes, but if optimizations are on then the
You can pass opt-level=0 to see the often-horrifying IR that rustc directly produces.
Oh, I didn't know that : \
A bit off topic (and please feel free to hide this comment), but wouldn't that also affect optimizations performed by rustc? For example, I notice that with
Would
You can pass
The zero-optimization-fuel approach is interesting. I don't know if all the LLVM optimizations actually hook into that properly. If they do it sounds like an interesting approach, though.
Comparing
#111850 fixed this. https://rust.godbolt.org/z/h1M73Tf34 The iter and loop assembly look a bit different but in both cases there are just 3 branches in the loop:
A big improvement compared to the rustc 1.33 version, which had tons of branches.
The benchmarks from #59281 (comment) look identical now:
So I think this can be closed.
Comparing the performance between these similar functions, the one that uses the iterator with the step_by(2) function performs significantly slower than the one using a traditional "while loop".
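The two functions from the original report are not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of the kind of pair being compared, with hypothetical names and a hypothetical loop body (summing the even numbers below `n`); it is not the reporter's benchmark code.

```rust
/// Traditional "while loop" with a manual stride of 2.
pub fn sum_while(n: u64) -> u64 {
    let mut total = 0u64;
    let mut i = 0u64;
    while i < n {
        total = total.wrapping_add(i);
        i += 2; // assumes n is nowhere near u64::MAX
    }
    total
}

/// The same computation using a range iterator with `step_by(2)`.
pub fn sum_step_by(n: u64) -> u64 {
    let mut total = 0u64;
    for i in (0..n).step_by(2) {
        total = total.wrapping_add(i);
    }
    total
}
```

Both return the same result for any `n` that does not bring the manual counter close to overflow, so the functions are suitable for a like-for-like timing or assembly comparison.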
Running the experimental
cargo bench --all-targets
produced the following results for me using the nightly-x86_64-unknown-gnu toolchain (1.35.0-nightly):