Bad codegen for array::map-based code #102202

newpavlov · 2022-09-23T17:38:39Z

array::map results in a bad codegen compared to manually unrolled code. This issue can be demonstrated by the following code: https://rust.godbolt.org/z/T7bnTE398

Relevant issue: #86912

The text was updated successfully, but these errors were encountered:

newpavlov · 2022-09-23T17:50:18Z

A more straightforward implementation of array::map makes codegen identical: https://rust.godbolt.org/z/a9csPcnob But it's unclear how to properly make it panic-safe in regards to f, i.e. we need a way to properly drop created Us after f has panicked.

the8472 · 2022-09-23T18:40:06Z

We could special-case based on needs_drop::<U>?

veber-alex · 2022-09-23T19:58:22Z

There is another issue here, the simple map function produces very bad assembly on nightly.

newpavlov · 2022-09-23T20:12:36Z

@the8472
No, I think we simply need something like ReadBuf to track number of initialized elements. With no-drop types the additional code should get eliminated.

@veber-alex
It looks like a clear nightly regression. I will open an issue for it.

the8472 · 2022-09-23T20:21:14Z

I think we simply need something like ReadBuf to track number of initialized elements

The Guard type does just that.
The thing is that the surrounding code is super-generic and relies heavily on the optimizer. That it runs on two loop conditions (0..N and iter.next()) probably doesn't help, LLVM is bad at optimizing that kind of thing.

newpavlov · 2022-09-23T20:34:09Z

I wrote variation of the Guard and it properly compiles into expected assembly. Dunno what exactly makes the difference.

Rageking8 · 2022-09-24T16:14:34Z

@rustbot label +A-codegen +C-bug

InBetweenNames · 2022-11-30T01:48:46Z

What would it take to get @newpavlov's variation upstreamed? When dealing with vectorized code, it would be really nice to not have to unroll everything manually and use just use map.

memoryruins · 2022-12-07T17:24:28Z

Testing with rustc 1.67.0-nightly (e1d819583 2022-12-05), the initial example using <[T; N]>::map in foo2 produces the same codegen as the manually unrolled code. Whichever change improved its codegen has not reached beta yet.

HadrienG2 · 2023-03-02T19:14:01Z

Another performance pitfall of array::map is that it isn't marked #[inline], and sometimes (think codegen-units != 1), the compiler will take this opportunity to fail to inline it, resulting in disaster when the mapping is very simple and at the bottom of a hot loop.

This wouldn't be so bad if it only concerned array::map, which performance-conscious people should use sparingly anyway (overly enthusiastic use tends to create lots of data movement to and from the stack, which LLVM can have a hard time optimizing). But the problem as far as I am concerned is that array::map is also currently a building block of array::from_fn, which is pretty much the standard way to safely initialize arrays of inhomogeneous values and unknown length in current Rust. And not being able to use that in performance-sensitive code is a much greater loss.

I think either array::map should be made #[inline], or the implementation of array::from_fn should be changed to refrain from relying on array::map.

newpavlov · 2023-03-03T07:57:41Z

The code gets compiled into expected assembly on beta, so I think this issue can be closed? Or should we wait for 1.68 to be released?

@HadrienG2
Can you provide a small example which would demonstrate this issue? I thought that because of monomorphization (array::map is generic over both N and F) #[inline] on generic functions is generally redundant.

HadrienG2 · 2023-03-03T11:16:10Z

@newpavlov Monomorphization does not help you if two functions (which may be e.g. two instances of the same generic function) end up calling array::from_fn with the same callable and the same size. I'll roll out a little godbolt later, am on a phone right now.

HadrienG2 · 2023-03-03T19:38:03Z

After struggling a while to reproduce this with godbolt, I realized that I couldn't because godbolt uses --emit-asm and --emit-asm forces codegen-units = 1, whereas this is a problem that requires multiple codegen units to reproduce.

So here's something that works locally:

Create a tiny lib with this code

#[inline(always)]
fn round_trip(input: [usize; 4]) -> [usize; 4] {
    let mut iter = input.into_iter();
    std::array::from_fn(|_| iter.next().unwrap())
}

pub fn make_array(input: [usize; 4]) -> usize {
    round_trip(input)[1]
}

pub fn make_array_again(input: [usize; 4]) -> usize {
    round_trip(input)[0]
}

Compile with multiple codegen units (I reproduced this with 16, but you probably need much less).

Although round_trip is correctly inlined, no inlining of the array::map inside array::from_fn is performed.

HadrienG2 · 2023-03-04T05:03:52Z

I suspect the issue here might be that codegen-units splits code into compilation units based on how complex it is before optimization, whereas in the case of iterator-based code (or more generally from_fn or map with callables that use sufficiently complex optimizations), the code is pretty complex before optimization but will become very simple afterwards.

Which might explain why codegen-units tends to break "zero-cost abstractions" in my code unless inline is used very liberally in their implementation: these abstractions only turn into efficient code after optimization passes.

Resolving this true underlying issue that keeps biting me over and over again would be a largely separate issue that would require a mechanism to feed back information about compiled machine code to the compiler frontend so that in release mode + incremental builds, it can take more informed decisions about upfront splitting into codegen units. Think PGO, but without execution. That's likely very complex to implement though, so for now #[inline] on everything that should be a zero cost abstraction + transitively called functions is a Good Enough Hack.

newpavlov · 2023-03-05T07:30:31Z

I will close this issue, since the particular problem referenced by the OP got resolved on Beta (I assume by a newer LLVM version). But in some situations codegen for array::map can be still quite suboptimal as discussion here and the stack usage issue demonstrate.

@HadrienG2
I think you should open a separate issue with request to add inline attributes on the methods.

newpavlov mentioned this issue Sep 23, 2022

Codegen regression from stable to nightly and beta #102208

Closed

rustbot added A-codegen Area: Code generation C-bug Category: This is a bug. labels Sep 24, 2022

InBetweenNames mentioned this issue Nov 30, 2022

Implement batched query support bevyengine/bevy#6161

Closed

nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 7, 2022

newpavlov closed this as completed Mar 5, 2023

HadrienG2 mentioned this issue Mar 5, 2023

array::from_fn should be fully inlined #108765

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad codegen for array::map-based code #102202

Bad codegen for array::map-based code #102202

newpavlov commented Sep 23, 2022

newpavlov commented Sep 23, 2022

the8472 commented Sep 23, 2022 •

edited

Loading

veber-alex commented Sep 23, 2022

newpavlov commented Sep 23, 2022

the8472 commented Sep 23, 2022

newpavlov commented Sep 23, 2022

Rageking8 commented Sep 24, 2022

InBetweenNames commented Nov 30, 2022

memoryruins commented Dec 7, 2022

HadrienG2 commented Mar 2, 2023 •

edited

Loading

newpavlov commented Mar 3, 2023

HadrienG2 commented Mar 3, 2023 •

edited

Loading

HadrienG2 commented Mar 3, 2023

HadrienG2 commented Mar 4, 2023

newpavlov commented Mar 5, 2023

Bad codegen for array::map-based code #102202

Bad codegen for array::map-based code #102202

Comments

newpavlov commented Sep 23, 2022

newpavlov commented Sep 23, 2022

the8472 commented Sep 23, 2022 • edited Loading

veber-alex commented Sep 23, 2022

newpavlov commented Sep 23, 2022

the8472 commented Sep 23, 2022

newpavlov commented Sep 23, 2022

Rageking8 commented Sep 24, 2022

InBetweenNames commented Nov 30, 2022

memoryruins commented Dec 7, 2022

HadrienG2 commented Mar 2, 2023 • edited Loading

newpavlov commented Mar 3, 2023

HadrienG2 commented Mar 3, 2023 • edited Loading

HadrienG2 commented Mar 3, 2023

HadrienG2 commented Mar 4, 2023

newpavlov commented Mar 5, 2023

the8472 commented Sep 23, 2022 •

edited

Loading

HadrienG2 commented Mar 2, 2023 •

edited

Loading

HadrienG2 commented Mar 3, 2023 •

edited

Loading