MIR per-block copy elimination #86430

shamatar · 2021-06-18T09:58:02Z

There were many attempts to cleanup a MIR patterns like

StorageLive(_2);
_2 = _1;
_3 = ...(move _2);
StorageDead(_2);

This is a simple pass that only works per basic blocks and doesn't try to do harder liveness analysis on a body level.

It also depends on my work from PR #86383 , so I'll rebase when it's accepted.

rust-highfive · 2021-06-18T09:58:08Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

shamatar · 2021-06-18T11:28:53Z

Effect of this pass works the best with the following "SimplifyLocals" pass.
First this pass eliminates the specified patterns:

           StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
-         _4 = _1;                         // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
+         nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
          StorageLive(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          StorageLive(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _6 = _2;                         // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _5 = Len((*_6));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
+         nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
+         _5 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          StorageDead(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
-         _3 = Lt(move _4, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
+         _3 = Lt(move _1, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
          StorageDead(_5);

and then "SimplifyLocals" eliminates a lot of locals

          StorageLive(_3);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
-         StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
          nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
-         StorageLive(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
-         StorageLive(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
+         StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _5 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
-         StorageDead(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
-         _3 = Lt(move _1, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
-         StorageDead(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
+         _4 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
+         _3 = Lt(move _1, move _4);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
          StorageDead(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
          switchInt(move _3) -> [false: bb2, otherwise: bb1]; // scope 0 at $DIR/lower_slice_len.rs:7:5: 11:6

Logic of the pass is simple itself:

for every block find a specific pattern. Here it only visits "assign" to find copy events, and "operand" to find if a local we have copied into was move or copy used
then for interesting candidates only a mutating pass looks for "copy" statements, makes them NOP and writes down a copy source (if it's not "fancy"). Then if our local of interest is used anywhere in "place" (because I want to catch e.g. Len operations, that are not reachable through "operand" visitor)

I think that local state analyzer can be simplified, and I'd want to rework the "safeguard" logic, but first would like to check if this pass most likely sound and is worth it.

May be @oli-obk or @bjorn3 can review it as they have reviewed my previous PR and it's kind of logical step after it

bjorn3 · 2021-06-18T12:17:37Z

What if a reference to a local is taken and then the local is mutated through this reference? With fixes for this (and maybe related cases) I think it is a worthwhile optimization.

shamatar · 2021-06-18T18:47:30Z

I've identified this weakness and few other, will make an update soon. I expect it to be quite expensive pass, but it should eliminate quite a lot

shamatar · 2021-06-18T21:40:39Z

r=@bjorn3 most likely it's in a good shape to try for performance impact

bjorn3 · 2021-06-19T09:41:02Z

@bors try @rust-timer queue

rust-timer · 2021-06-19T09:41:03Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-06-19T09:41:12Z

⌛ Trying commit c2245e2c8519d40fa967fb5d1bcca17ba7f639b1 with merge 28c1cfbaee6d9198e43621666275d1d53336d5bd...

bors · 2021-06-19T10:03:32Z

💔 Test failed - checks-actions

shamatar · 2021-06-19T10:19:03Z

Hm, that is interesting. In theory, if I have a copy into local like _2 = Expr(_1) and _1 is not mutated or dropped until the end of the block then what can invalidate _1, so that expression can not be propagated further as substitutions for _2? Obvious cases of mutation of _1 are rejected to be optimized already.

Case when _2 itself is mutated in some way is discarded by this pass (mutation of the local copy). Between the place of copy into _2 and the substitution it's only allowed to copy _2, and make non-use actions.

petrochenkov · 2021-06-20T09:59:10Z

r? @tmiasko

petrochenkov · 2021-06-20T09:59:54Z

r? @bjorn3

bjorn3 · 2021-06-20T10:04:18Z

r? @tmiasko

I don't feel confident enough in reviewing this PR. It is way too susceptible to causing miscompilations.

shamatar · 2021-06-20T11:44:57Z

r=@bjorn3 can you start a CI job please? Looks better locally, tests pass, for now it's less zealous with optimizations, will tune later on

bjorn3 · 2021-06-20T12:32:17Z

@bors try @rust-timer queue

rust-timer · 2021-06-20T12:32:18Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-06-20T12:32:26Z

⌛ Trying commit a4d288d8387af1a8fb43cf713683f9e2af30a5ae with merge 0a02c3670999e4954a874ff832bcc92354eb47cf...

bjorn3 · 2021-06-20T12:33:16Z

Oh, you just wanted CI, not a perf run. My bad. Still can't hurt to get a perf run I guess.

shamatar · 2021-06-21T13:57:05Z

Well, it could have been indirectly touched. As mentioned above it's strictly not one MIR pass but two: lower slice::len and this copy elimination. For a clear measurement I'd need the first to be merged and then rebase

oli-obk · 2021-06-21T14:01:21Z

depends on what you mean by indirectly. It could just be random LLVM decisions making rustc slower, even on stage 1. We get this kind of noise pretty often.

but there is indeed one real regression https://perf.rust-lang.org/detailed-query.html?commit=0a02c3670999e4954a874ff832bcc92354eb47cf&base_commit=9543636cd6609cd2a9880da8f93aac96192ae40a&benchmark=html5ever-opt&run_name=full

shamatar · 2021-06-21T14:05:18Z

This may be logical: if we eliminate intermediates that have made some parts of code look "far" from each other then we can get more work for LTO

In any case I only consider LLVM as a black box that works somewhere "after" MIR, so less lines in MIR was kind of a goal (even while you only see a reduction in number of lines after "SimplifyLocals" actually runs)

shamatar · 2021-06-21T14:15:17Z

For completeness I should also note that this pass is kind of lightweight (O(locals * blocks) instead of O(locals^2 * blocks)) and most likely sound analog of https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir/src/transform/dest_prop.rs

Most likely sound because it only works on the block level and replaces destinations with sources in case if destination was not messed up between copy and potential substitution place, and if destination is not dead by the end of the block then last copy statement is preserved as it may be used in other blocks. It's also conservative on what messes up with a source and where to substitute the destination

scottmcm · 2021-06-21T19:36:10Z

There were many attempts to cleanup a MIR patterns like
StorageLive(_2);
_2 = _1;
_3 = ...(move _2);
StorageDead(_2);
This is a simple pass that only works per basic blocks and doesn't try to do harder liveness analysis on a body level.

For completeness I should also note that this pass is kind of lightweight (O(locals * blocks) instead of O(locals^2 * blocks))

How much is this getting from being quadratic? I agree that pattern is super-common, but I wonder if perhaps most of the value it gives could be done in a single walk -- rather than one per local -- by just looking at the most recent assignment. (And if the use is a move, like in that example, then it could even directly eliminate the assignment in those cases, since the move checker must have already proved that it can't be needed later.)

shamatar · 2021-06-21T19:42:32Z

I think my estimate of O(locals * blocks) is not 100% correct, it's still quadratic, but with small constants:

analyzer is single-pass, so it's O(blocks)
first I only care about locals that ever has _2 = _1 pattern (here it would indeed be linear over locals)
then I still has to follow the fate of both _2 or _1 for cases if e.g. _1 is mutated, so potential substitution should be discarded (and here it becomes quadratic, I have to work back) or if _2 is e.g. reassigned (this is easier)

May be destination propagation is better in global, but it's under unsound + mir-opt-level=3 and this on is intended to be simpler and sound

shamatar · 2021-06-21T22:56:59Z

@rustbot label: +S-blocked

bors · 2021-06-22T01:42:59Z

☔ The latest upstream changes (presumably #86383) made this pull request unmergeable. Please resolve the merge conflicts.

shamatar · 2021-06-22T12:15:58Z

@rustbot label: -S-blocked

bors · 2021-07-06T14:22:05Z

☔ The latest upstream changes (presumably #86143) made this pull request unmergeable. Please resolve the merge conflicts.

oli-obk · 2021-07-09T15:31:08Z

src/test/mir-opt/issue_73223.main.PreCodegen.32bit.diff

          _3 = ((_2 as Some).0: i32);      // scope 0 at $DIR/issue-73223.rs:3:14: 3:15
-          _1 = _3;                         // scope 2 at $DIR/issue-73223.rs:3:20: 3:21
+          _1 = ((_2 as Some).0: i32);      // scope 2 at $DIR/issue-73223.rs:3:20: 3:21


I'm surprised our DCE didn't get rid of _3, situations like this could explain why we don't see much gain in the perf runner, all the dead code is still passed to LLVM and eliminated there.

hmm... maybe we don't have any pass that removes useless assignments? That should be a fairly easy pass... just walk every block backwards and remember any StorageDead. As long as you don't see a read from the locals that have a StorageDead, then you can remove assignments to the local.

I actually did see a result of the pass (SimplifyLocals I think) that does what you say - if there is a variable that is just Live - copy-assign- Dead then it gets removed. The problem is that such patters it not present in MIR by default and only emerges e.g. as a result of this pass

Oh, I think there is a pass that does what you say (not a pass from this PR), but it's under opt-level = 4 and also under "unsound mir optimizations". While this pass basically does what you say with some extra heuristics like copying into the variable more then once, and handling a case when it's not dead by the end of the block.

oli-obk · 2021-07-09T16:40:12Z

So... I tried to review your pass's logic, but ran out of time in the hour I allocated for it. I don't think your categorization of "simple" fits what it is, as it is rather complex and it's not trivial to reason about its correctness (at least not for me).

I don't know how to keep reviewing this pass and whether the maintenance effort to keep it sound and working is worth it when we could spend that effort on getting destination propagation bug-free and get a more powerful optimization out of it.

shamatar · 2021-07-09T18:53:38Z

I've made this pass because it's for sure much more trivial than full destination propagation and it's easier to make it sound.

Regarding the logic, it's simple

visit every MIR statement
if it's in a form _2 = _1 (any Place can be instead of _1), then we mark _2 as "interesting" in and keep track of it's state, as well as track _1. Any copy statement like _2 = _1 marks a new generation of _2
if _2 (destination) is mutated or used anyhow other then copy or move we discard destination from potential replacement candidates
if we see mutable use of _1 (source) then we discard any destinations that use this source from an "interesting list"
otherwise if _2 is still interesting and we see it's use then we record "coordinates" of this use for future replacement of _2 by it's source
if _2 is not dead at the end of the block then we do not mark the last "generation" to be suitable for replacing

This allows to handle cases like

_2 = _1;
_3 = use(_2);
StorageDead(_2);

made into

_2 = _1;
_3 = use(_1);
StorageDead(_2);

that is later simplified by SimplifyLocals, as well as

_2 = _1;
_3 = use(_2);
_2 = _4;
... = use(_2);

that is replaced only as

_2 = _1;
_3 = use(_1);
_2 = _4;
... = use(_2);

JohnCSimon · 2021-09-13T03:11:42Z

ping from triage:
@shamatar
Returning to you to address merge conflicts and after that switch back to S-waiting-on-review.
thanks.

JohnCSimon · 2021-10-03T06:10:35Z

ping from triage:
@shamatar - I'm closing this as inactive, please feel free to reopen when you're ready to continue.

rust-highfive assigned petrochenkov Jun 18, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 18, 2021

shamatar marked this pull request as ready for review June 18, 2021 21:40

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 19, 2021

This comment has been minimized.

Sign in to view

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 19, 2021

rust-highfive assigned bjorn3 and unassigned petrochenkov Jun 20, 2021

This comment has been minimized.

Sign in to view

rustbot added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Jun 21, 2021

shamatar added 2 commits June 22, 2021 13:45

squash_0

1967770

end squash/rebase

f9c6cc9

shamatar force-pushed the simple_copy_elimination branch from aee3c6f to f9c6cc9 Compare June 22, 2021 12:13

rustbot removed the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Jun 22, 2021

oli-obk reviewed Jul 9, 2021

View reviewed changes

oli-obk mentioned this pull request Jul 16, 2021

Local copy propagation #87126

Closed

JohnCSimon removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 25, 2021

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 15, 2021

inquisitivecrystal added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 24, 2021

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021

JohnCSimon added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Oct 3, 2021

JohnCSimon closed this Oct 3, 2021

shamatar deleted the simple_copy_elimination branch February 29, 2024 14:49

MIR per-block copy elimination #86430

MIR per-block copy elimination #86430

Conversation

shamatar commented Jun 18, 2021

rust-highfive commented Jun 18, 2021

shamatar commented Jun 18, 2021 • edited Loading

bjorn3 commented Jun 18, 2021 • edited Loading

shamatar commented Jun 18, 2021

shamatar commented Jun 18, 2021

bjorn3 commented Jun 19, 2021

rust-timer commented Jun 19, 2021

bors commented Jun 19, 2021

This comment has been minimized.

bors commented Jun 19, 2021

shamatar commented Jun 19, 2021

petrochenkov commented Jun 20, 2021

petrochenkov commented Jun 20, 2021

bjorn3 commented Jun 20, 2021

This comment has been minimized.

shamatar commented Jun 20, 2021

bjorn3 commented Jun 20, 2021

rust-timer commented Jun 20, 2021

bors commented Jun 20, 2021

bjorn3 commented Jun 20, 2021

shamatar commented Jun 21, 2021

oli-obk commented Jun 21, 2021

shamatar commented Jun 21, 2021 • edited Loading

shamatar commented Jun 21, 2021 • edited Loading

scottmcm commented Jun 21, 2021

shamatar commented Jun 21, 2021

shamatar commented Jun 21, 2021

bors commented Jun 22, 2021

shamatar commented Jun 22, 2021

bors commented Jul 6, 2021

oli-obk Jul 9, 2021

Choose a reason for hiding this comment

oli-obk Jul 9, 2021

Choose a reason for hiding this comment

shamatar Jul 9, 2021

Choose a reason for hiding this comment

shamatar Jul 9, 2021 • edited Loading

Choose a reason for hiding this comment

oli-obk commented Jul 9, 2021

shamatar commented Jul 9, 2021 • edited Loading

JohnCSimon commented Sep 13, 2021

JohnCSimon commented Oct 3, 2021

shamatar commented Jun 18, 2021 •

edited

Loading

bjorn3 commented Jun 18, 2021 •

edited

Loading

shamatar commented Jun 21, 2021 •

edited

Loading

shamatar commented Jun 21, 2021 •

edited

Loading

shamatar Jul 9, 2021 •

edited

Loading

shamatar commented Jul 9, 2021 •

edited

Loading