-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flat_map fails to autovectorize internal iterators #104914
Comments
#[no_mangle]
fn flat_map(query: &mut [&mut [f32]]) {
let iter = query.iter_mut().flat_map(|segment| segment.iter_mut());
for value in iter {
*value += 1.0;
}
} That's still external iteration. If you want internal iteration use fn flat_map(query: &mut [&mut [f32]]) {
query.iter_mut().flat_map(AsMut::as_mut).for_each(|value| *value += 1.0);
} which does vectorize. |
Is there a way for us to get it to vectorize without internal iteration? The aforementioned case in Bevy cannot be separated like |
Well, Other than that you could see if you can structure your data to return arrays or an iterator-of-iterators so users could write their own nested loops. |
The implementation we have looks very similar.
This unfortunately is a 100% deal breaker since that fundamentally breaks the UX we're aiming to expose, and potentially forces |
@rustbot label +A-codegen +I-slow |
This isn't an iterator method. You'll want to override the default |
The intent was largely for UX (
Is it possible to implement those functions in stable Rust right now? I'm getting an error saying that Again ideally we wouldn't need this and the external iteration option optimizes to the same result. This isn't what I would call a zero-cost abstraction if users need to make a conscious choice to choose one over the other purely for performance purposes. |
Well, at least
Yes, that's a long-known issue (see the blog post linked previously) but there aren't any good plans how to fix it. |
… Iterator combinators (#6773) # Objective After #6547, `Query::for_each` has been capable of automatic vectorization on certain queries, which is seeing a notable (>50% CPU time improvements) for iteration. However, `Query::for_each` isn't idiomatic Rust, and lacks the flexibility of iterator combinators. Ideally, `Query::iter` and friends should be able to achieve the same results. However, this does seem to blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations. ## Solution This is an intermediate solution and refactor. This moves the `Query::for_each` implementation onto the `Iterator::fold` implementation for `QueryIter` instead. This should result in the same automatic vectorization optimization on all `Iterator` functions that internally use fold, including `Iterator::for_each`, `Iterator::count`, etc. With this, it should close the gap between the two completely. Internally, this PR changes `Query::for_each` to use `query.iter().for_each(..)` instead of the duplicated implementation. Separately, the duplicate implementations of internal iteration (i.e. `Query::par_for_each`) now use portions of the current `Query::for_each` implementation factored out into their own functions. This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in `for_each` and `par_iter().for_each()`. --- ## Changelog Changed: `Query::for_each`, `Query::for_each_mut`, `Query::for_each`, and `Query::for_each_mut` have been moved to `QueryIter`'s `Iterator::for_each` implementation, and still retains their performance improvements over normal iteration. These APIs are deprecated in 0.12 and will be removed in 0.13. --------- Co-authored-by: JoJoJet <21144246+JoJoJet@users.noreply.github.com>
… Iterator combinators (#6773) # Objective After #6547, `Query::for_each` has been capable of automatic vectorization on certain queries, which is seeing a notable (>50% CPU time improvements) for iteration. However, `Query::for_each` isn't idiomatic Rust, and lacks the flexibility of iterator combinators. Ideally, `Query::iter` and friends should be able to achieve the same results. However, this does seem to blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations. ## Solution This is an intermediate solution and refactor. This moves the `Query::for_each` implementation onto the `Iterator::fold` implementation for `QueryIter` instead. This should result in the same automatic vectorization optimization on all `Iterator` functions that internally use fold, including `Iterator::for_each`, `Iterator::count`, etc. With this, it should close the gap between the two completely. Internally, this PR changes `Query::for_each` to use `query.iter().for_each(..)` instead of the duplicated implementation. Separately, the duplicate implementations of internal iteration (i.e. `Query::par_for_each`) now use portions of the current `Query::for_each` implementation factored out into their own functions. This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in `for_each` and `par_iter().for_each()`. --- ## Changelog Changed: `Query::for_each`, `Query::for_each_mut`, `Query::for_each`, and `Query::for_each_mut` have been moved to `QueryIter`'s `Iterator::for_each` implementation, and still retains their performance improvements over normal iteration. These APIs are deprecated in 0.13 and will be removed in 0.14. --------- Co-authored-by: JoJoJet <21144246+JoJoJet@users.noreply.github.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
… Iterator combinators (#6773) # Objective After #6547, `Query::for_each` has been capable of automatic vectorization on certain queries, which is seeing a notable (>50% CPU time improvements) for iteration. However, `Query::for_each` isn't idiomatic Rust, and lacks the flexibility of iterator combinators. Ideally, `Query::iter` and friends should be able to achieve the same results. However, this does seem to blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations. ## Solution This is an intermediate solution and refactor. This moves the `Query::for_each` implementation onto the `Iterator::fold` implementation for `QueryIter` instead. This should result in the same automatic vectorization optimization on all `Iterator` functions that internally use fold, including `Iterator::for_each`, `Iterator::count`, etc. With this, it should close the gap between the two completely. Internally, this PR changes `Query::for_each` to use `query.iter().for_each(..)` instead of the duplicated implementation. Separately, the duplicate implementations of internal iteration (i.e. `Query::par_for_each`) now use portions of the current `Query::for_each` implementation factored out into their own functions. This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in `for_each` and `par_iter().for_each()`. --- ## Changelog Changed: `Query::for_each`, `Query::for_each_mut`, `Query::for_each`, and `Query::for_each_mut` have been moved to `QueryIter`'s `Iterator::for_each` implementation, and still retains their performance improvements over normal iteration. These APIs are deprecated in 0.13 and will be removed in 0.14. --------- Co-authored-by: JoJoJet <21144246+JoJoJet@users.noreply.github.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
I expected to see this happen: Both versions of the function to autovectorize.
Instead, this happened: only the normal for-loop version autovectorizes. See https://godbolt.org/z/caTqoq3x3.
Meta
rustc --version --verbose
:Also happens on nightly and for other "flattened" iterators. For example, Bevy's QueryIter and Query::for_each show this exact API split due to this performance difference.
There seems to be #87411 that was targeting something similar.
The text was updated successfully, but these errors were encountered: