-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad codegen for array::map-based code #102202
Comments
A more straightforward implementation of |
We could special-case based on |
There is another issue here, the simple |
@the8472 @veber-alex |
The Guard type does just that. |
I wrote variation of the |
@rustbot label +A-codegen +C-bug |
What would it take to get @newpavlov's variation upstreamed? When dealing with vectorized code, it would be really nice to not have to unroll everything manually and use just use |
Testing with |
Another performance pitfall of This wouldn't be so bad if it only concerned I think either |
The code gets compiled into expected assembly on beta, so I think this issue can be closed? Or should we wait for 1.68 to be released? @HadrienG2 |
@newpavlov Monomorphization does not help you if two functions (which may be e.g. two instances of the same generic function) end up calling array::from_fn with the same callable and the same size. I'll roll out a little godbolt later, am on a phone right now. |
After struggling a while to reproduce this with godbolt, I realized that I couldn't because godbolt uses --emit-asm and --emit-asm forces codegen-units = 1, whereas this is a problem that requires multiple codegen units to reproduce. So here's something that works locally:
#[inline(always)]
fn round_trip(input: [usize; 4]) -> [usize; 4] {
let mut iter = input.into_iter();
std::array::from_fn(|_| iter.next().unwrap())
}
pub fn make_array(input: [usize; 4]) -> usize {
round_trip(input)[1]
}
pub fn make_array_again(input: [usize; 4]) -> usize {
round_trip(input)[0]
}
Although round_trip is correctly inlined, no inlining of the array::map inside array::from_fn is performed. |
I suspect the issue here might be that codegen-units splits code into compilation units based on how complex it is before optimization, whereas in the case of iterator-based code (or more generally from_fn or map with callables that use sufficiently complex optimizations), the code is pretty complex before optimization but will become very simple afterwards. Which might explain why codegen-units tends to break "zero-cost abstractions" in my code unless inline is used very liberally in their implementation: these abstractions only turn into efficient code after optimization passes. Resolving this true underlying issue that keeps biting me over and over again would be a largely separate issue that would require a mechanism to feed back information about compiled machine code to the compiler frontend so that in release mode + incremental builds, it can take more informed decisions about upfront splitting into codegen units. Think PGO, but without execution. That's likely very complex to implement though, so for now #[inline] on everything that should be a zero cost abstraction + transitively called functions is a Good Enough Hack. |
I will close this issue, since the particular problem referenced by the OP got resolved on Beta (I assume by a newer LLVM version). But in some situations codegen for @HadrienG2 |
array::map
results in a bad codegen compared to manually unrolled code. This issue can be demonstrated by the following code: https://rust.godbolt.org/z/T7bnTE398Relevant issue: #86912
The text was updated successfully, but these errors were encountered: