Bad codegen for non-copy-derived struct with all Copy derived fields #128081
Comments
There was some brief discussion on Zulip about changing |
For primitives we already change
StorageLive(_45);
_45 = ((*_1).43: u8);
StorageLive(_46);
_46 = ((*_1).44: u8);
StorageLive(_47);
_47 = ((*_1).45: u8);
StorageLive(_48);
_48 = ((*_1).46: u8);
StorageLive(_49);
_49 = ((*_1).47: u8);
StorageLive(_50);
_50 = ((*_1).48: u8);
StorageLive(_51);
_51 = ((*_1).49: u8);
StorageLive(_52);
_52 = ((*_1).50: u8);
StorageLive(_53);
_53 = ((*_1).51: u8);
StorageLive(_54);
_54 = ((*_1).52: [Dav1dSequenceHeaderOperatingParameterInfo; 32]);
_0 = Dav1dSequenceHeader { profile: move _2, max_width: move _3, max_height: move _4, layout: move _5, pri: move _6, trc: move _7, mtrx: move _8, chr: move _9, hbd: move _10, color_range: move _11, num_operating_points: move _12, operating_points: move _13, still_picture: move _14, reduced_still_picture_header: move _15, timing_info_present: move _16, num_units_in_tick: move _17, time_scale: move _18, equal_picture_interval: move _19, num_ticks_per_picture: move _20, decoder_model_info_present: move _21, encoder_decoder_buffer_delay_length: move _22, num_units_in_decoding_tick: move _23, buffer_removal_delay_length: move _24, frame_presentation_delay_length: move _25, display_model_info_present: move _26, width_n_bits: move _27, height_n_bits: move _28, frame_id_numbers_present: move _29, delta_frame_id_n_bits: move _30, frame_id_n_bits: move _31, sb128: move _32, filter_intra: move _33, intra_edge_filter: move _34, inter_intra: move _35, masked_compound: move _36, warped_motion: move _37, dual_filter: move _38, order_hint: move _39, jnt_comp: move _40, ref_frame_mvs: move _41, screen_content_tools: move _42, force_integer_mv: move _43, order_hint_n_bits: move _44, super_res: move _45, cdef: move _46, restoration: move _47, ss_hor: move _48, ss_ver: move _49, monochrome: move _50, color_description_present: move _51, separate_uv_delta_q: move _52, film_grain_present: move _53, operating_parameter_info: move _54 };
which is copying the fields, then moving them into the aggregate.
https://rust-lang.github.io/rfcs/1521-copy-clone-semantics.html lets the standard library call
My instinct is that this should be filed to LLVM, because it's much better positioned to look at all the loads and stores we give it (https://godbolt.org/z/4W9PP8nTW) and coalesce them into something smaller. And types should be marked |
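At the source level, the MIR above has roughly the following shape. This is only a minimal sketch: `Foo` is a hypothetical three-field stand-in (borrowed from the smaller examples later in this thread) for the real `Dav1dSequenceHeader`, and `clone_fieldwise` is an illustrative hand expansion, not the exact code that `#[derive(Clone)]` generates.

```rust
// Hypothetical stand-in for a struct whose fields are all `Copy`,
// but which deliberately does not derive `Copy` itself.
#[derive(Clone, Debug, PartialEq)]
struct Foo {
    a: i32,
    b: u64,
    c: [i8; 3],
}

impl Foo {
    // Illustrative hand expansion of the pattern seen in the MIR:
    // copy each field out of `*self` into a temporary, then move the
    // temporaries into a freshly built aggregate.
    fn clone_fieldwise(&self) -> Foo {
        let a = self.a;
        let b = self.b;
        let c = self.c;
        Foo { a, b, c }
    }
}

// Small usage check: the hand expansion and the derived `clone()`
// produce equal values.
fn demo_fieldwise() -> bool {
    let x = Foo { a: 1, b: 2, c: [3, 4, 5] };
    x.clone() == x.clone_fieldwise()
}
```

With many fields, each of those per-field copies becomes its own load and store unless something later coalesces them, which appears to be what this issue is about.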
Could we do something like run the There are some good reasons not to use (That being said, it does seem like minimizing and opening an LLVM issue would be good since there is something it's not seeing through) |
The reason we didn't is because the types are fairly large and so we want to avoid accidental copies. I was expecting a We may change this to It does seem like there should be a better way for
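To illustrate the "avoid accidental copies" motivation, here is a small sketch. `BigHeader` and `checksum` are hypothetical names, not taken from the project. Because the struct derives only `Clone`, every duplication has to be spelled out with `.clone()`, and reusing a value after passing it by value is rejected at compile time.

```rust
// Hypothetical large type: every field is `Copy`, but the struct itself
// only derives `Clone`, so duplicating it always needs a visible `.clone()`.
#[derive(Clone)]
struct BigHeader {
    params: [u64; 32],
    flags: u8,
}

// Takes ownership, so the caller must either move or explicitly clone.
fn checksum(h: BigHeader) -> u64 {
    h.params.iter().sum::<u64>() + u64::from(h.flags)
}

fn demo_explicit_clone() -> (u64, u64) {
    let h = BigHeader { params: [1; 32], flags: 2 };
    // Passing `h` directly here would move it, and the later use of `h`
    // would then fail to compile. The clone makes the copy explicit.
    let first = checksum(h.clone());
    // The final use can simply move the value.
    let second = checksum(h);
    (first, second)
}
```

If `BigHeader` also derived `Copy`, both calls could take `h` directly and the 264-byte copy would happen silently.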
@CrazyboyQCD, I think it'd be good to file this against LLVM, too. I would think LLVM should be able to optimize this without help from |
@scottmcm, would you mind doing this for LLVM? I'm not quite sure how to describe this clearly. |
Can you minimize the code example as much as possible? Remove fields, manually inline function calls, delete irrelevant code, etc., as long as the issue still shows up. If you do that, you can more or less just post the LLVM IR with
It's better yet if you can get something that reproduces with LLC. I don't have a great process for this, but usually I copy the LLVM IR from the Rust to an LLC godbolt (I just use llvm.godbolt.org, set the input language to "LLVM IR") and try to delete more stuff there. Note you might need to manually demangle the function names so it actually compiles.
Scott is definitely far more in the know here than I am and can probably give some better suggestions, but if you can minimize it a bit then that's a great start :) |
Sorry about that, GitHub decided to click a button for me. |
@tgross35, just minimized the examples and pasted them. |
Found a "future" regression, @rustbot label +A-LLVM |
…=<try>
Perform instsimplify before inline to eliminate some trivial calls

I am currently working on rust-lang#128081. In the current pipeline, we can get the following clone statements ([godbolt](https://rust.godbolt.org/z/931316fhP)):

```
bb0: {
    StorageLive(_2);
    _2 = ((*_1).0: i32);
    StorageLive(_3);
    _3 = ((*_1).1: u64);
    _0 = Foo { a: move _2, b: move _3 };
    StorageDead(_3);
    StorageDead(_2);
    return;
}
```

Analyzing such statements will be simple and fast. We don't need to consider branches or some interfering statements. However, this requires us to run `InstSimplify`, `ReferencePropagation`, and `SimplifyCFG` at least once. I can introduce a new pass, but I think the best place for it would be within `InstSimplify`. I put `InstSimplify` before `Inline`, which takes some of the burden away from `Inline`.

r? `@saethlin`
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081. Currently being blocked by rust-lang#128265.

`@rustbot` label +S-blocked

r? `@saethlin`
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081.

r? `@cjgillot`
Another example of the issue: Godbolt link |
Simplify the canonical clone method and the copy-like forms to copy

Fixes rust-lang#128081.

The optimized clone method ends up as the following MIR:

```
_2 = copy ((*_1).0: i32);
_3 = copy ((*_1).1: u64);
_4 = copy ((*_1).2: [i8; 3]);
_0 = Foo { a: move _2, b: move _3, c: move _4 };
```

We can transform this to:

```
_0 = copy (*_1);
```

r? `@cjgillot`
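The two MIR forms above have a rough source-level analogue, sketched below. The function names are hypothetical, and the `ptr::read` version is only an illustration of the whole-value copy semantics, not how the compiler pass is implemented. It is sound here only because `Foo` has no `Drop` impl and all of its fields are `Copy`.

```rust
use std::ptr;

// Same field layout as the `Foo` in the MIR above.
#[derive(Clone, Debug, PartialEq)]
struct Foo {
    a: i32,
    b: u64,
    c: [i8; 3],
}

// Field-by-field form, mirroring the MIR before the transformation.
fn clone_by_fields(src: &Foo) -> Foo {
    Foo { a: src.a, b: src.b, c: src.c }
}

// Whole-value form, mirroring `_0 = copy (*_1)`.
fn clone_by_whole_copy(src: &Foo) -> Foo {
    // SAFETY: `src` is a valid, aligned reference, and `Foo` consists only
    // of `Copy` fields with no `Drop` impl, so a bitwise duplicate is a
    // valid, independent value.
    unsafe { ptr::read(src) }
}

// Small usage check: both forms yield a value equal to the original.
fn demo_equivalent() -> bool {
    let x = Foo { a: -7, b: 42, c: [1, 2, 3] };
    clone_by_fields(&x) == x && clone_by_whole_copy(&x) == x
}
```

Both functions return a value equal to the original; the difference is only in how many separate loads and stores the backend is handed.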
Godbolt Link
As you can see in the asm output, even with `opt-level = 3`, if we don't derive `Copy` on a struct whose fields are all `Copy`, `clone()` generates more `mov` instructions, and a large struct doesn't get turned into a `memcpy`.