Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIR per-block copy elimination #86430

Closed
wants to merge 2 commits into from

Conversation

shamatar
Copy link
Contributor

There were many attempts to cleanup a MIR patterns like

StorageLive(_2);
_2 = _1;
_3 = ...(move _2);
StorageDead(_2);

This is a simple pass that only works per basic blocks and doesn't try to do harder liveness analysis on a body level.

It also depends on my work from PR #86383 , so I'll rebase when it's accepted.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 18, 2021
@shamatar
Copy link
Contributor Author

shamatar commented Jun 18, 2021

Effect of this pass works the best with the following "SimplifyLocals" pass.
First this pass eliminates the specified patterns:

           StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
-         _4 = _1;                         // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
+         nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
          StorageLive(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          StorageLive(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _6 = _2;                         // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _5 = Len((*_6));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
+         nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
+         _5 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          StorageDead(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
-         _3 = Lt(move _4, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
+         _3 = Lt(move _1, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
          StorageDead(_5);  

and then "SimplifyLocals" eliminates a lot of locals

          StorageLive(_3);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
-         StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
          nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:13
-         StorageLive(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
-         StorageLive(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
+         StorageLive(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
          nop;                             // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:21
-         _5 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
-         StorageDead(_6);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
-         _3 = Lt(move _1, move _5);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
-         StorageDead(_5);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
+         _4 = Len((*_2));                 // scope 0 at $DIR/lower_slice_len.rs:7:16: 7:27
+         _3 = Lt(move _1, move _4);       // scope 0 at $DIR/lower_slice_len.rs:7:8: 7:27
          StorageDead(_4);                 // scope 0 at $DIR/lower_slice_len.rs:7:26: 7:27
          switchInt(move _3) -> [false: bb2, otherwise: bb1]; // scope 0 at $DIR/lower_slice_len.rs:7:5: 11:6

Logic of the pass is simple itself:

  • for every block find a specific pattern. Here it only visits "assign" to find copy events, and "operand" to find if a local we have copied into was move or copy used
  • then for interesting candidates only a mutating pass looks for "copy" statements, makes them NOP and writes down a copy source (if it's not "fancy"). Then if our local of interest is used anywhere in "place" (because I want to catch e.g. Len operations, that are not reachable through "operand" visitor)

I think that local state analyzer can be simplified, and I'd want to rework the "safeguard" logic, but first would like to check if this pass most likely sound and is worth it.

May be @oli-obk or @bjorn3 can review it as they have reviewed my previous PR and it's kind of logical step after it

@bjorn3
Copy link
Member

bjorn3 commented Jun 18, 2021

What if a reference to a local is taken and then the local is mutated through this reference? With fixes for this (and maybe related cases) I think it is a worthwhile optimization.

@shamatar
Copy link
Contributor Author

I've identified this weakness and few other, will make an update soon. I expect it to be quite expensive pass, but it should eliminate quite a lot

@shamatar shamatar marked this pull request as ready for review June 18, 2021 21:40
@shamatar
Copy link
Contributor Author

r=@bjorn3 most likely it's in a good shape to try for performance impact

@bjorn3
Copy link
Member

bjorn3 commented Jun 19, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 19, 2021
@bors
Copy link
Contributor

bors commented Jun 19, 2021

⌛ Trying commit c2245e2c8519d40fa967fb5d1bcca17ba7f639b1 with merge 28c1cfbaee6d9198e43621666275d1d53336d5bd...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jun 19, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 19, 2021
@shamatar
Copy link
Contributor Author

Hm, that is interesting. In theory, if I have a copy into local like _2 = Expr(_1) and _1 is not mutated or dropped until the end of the block then what can invalidate _1, so that expression can not be propagated further as substitutions for _2? Obvious cases of mutation of _1 are rejected to be optimized already.

Case when _2 itself is mutated in some way is discarded by this pass (mutation of the local copy). Between the place of copy into _2 and the substitution it's only allowed to copy _2, and make non-use actions.

@petrochenkov
Copy link
Contributor

r? @tmiasko

@petrochenkov
Copy link
Contributor

r? @bjorn3

@rust-highfive rust-highfive assigned bjorn3 and unassigned petrochenkov Jun 20, 2021
@bjorn3
Copy link
Member

bjorn3 commented Jun 20, 2021

r? @tmiasko

I don't feel confident enough in reviewing this PR. It is way too susceptible to causing miscompilations.

@rust-log-analyzer

This comment has been minimized.

@shamatar
Copy link
Contributor Author

r=@bjorn3 can you start a CI job please? Looks better locally, tests pass, for now it's less zealous with optimizations, will tune later on

@bjorn3
Copy link
Member

bjorn3 commented Jun 20, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented Jun 20, 2021

⌛ Trying commit a4d288d8387af1a8fb43cf713683f9e2af30a5ae with merge 0a02c3670999e4954a874ff832bcc92354eb47cf...

@bjorn3
Copy link
Member

bjorn3 commented Jun 20, 2021

Oh, you just wanted CI, not a perf run. My bad. Still can't hurt to get a perf run I guess.

@shamatar
Copy link
Contributor Author

Well, it could have been indirectly touched. As mentioned above it's strictly not one MIR pass but two: lower slice::len and this copy elimination. For a clear measurement I'd need the first to be merged and then rebase

@oli-obk
Copy link
Contributor

oli-obk commented Jun 21, 2021

depends on what you mean by indirectly. It could just be random LLVM decisions making rustc slower, even on stage 1. We get this kind of noise pretty often.

but there is indeed one real regression https://perf.rust-lang.org/detailed-query.html?commit=0a02c3670999e4954a874ff832bcc92354eb47cf&base_commit=9543636cd6609cd2a9880da8f93aac96192ae40a&benchmark=html5ever-opt&run_name=full

@shamatar
Copy link
Contributor Author

shamatar commented Jun 21, 2021

This may be logical: if we eliminate intermediates that have made some parts of code look "far" from each other then we can get more work for LTO

In any case I only consider LLVM as a black box that works somewhere "after" MIR, so less lines in MIR was kind of a goal (even while you only see a reduction in number of lines after "SimplifyLocals" actually runs)

@shamatar
Copy link
Contributor Author

shamatar commented Jun 21, 2021

For completeness I should also note that this pass is kind of lightweight (O(locals * blocks) instead of O(locals^2 * blocks)) and most likely sound analog of https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir/src/transform/dest_prop.rs

Most likely sound because it only works on the block level and replaces destinations with sources in case if destination was not messed up between copy and potential substitution place, and if destination is not dead by the end of the block then last copy statement is preserved as it may be used in other blocks. It's also conservative on what messes up with a source and where to substitute the destination

@scottmcm
Copy link
Member

There were many attempts to cleanup a MIR patterns like

StorageLive(_2);
_2 = _1;
_3 = ...(move _2);
StorageDead(_2);

This is a simple pass that only works per basic blocks and doesn't try to do harder liveness analysis on a body level.

For completeness I should also note that this pass is kind of lightweight (O(locals * blocks) instead of O(locals^2 * blocks))

How much is this getting from being quadratic? I agree that pattern is super-common, but I wonder if perhaps most of the value it gives could be done in a single walk -- rather than one per local -- by just looking at the most recent assignment. (And if the use is a move, like in that example, then it could even directly eliminate the assignment in those cases, since the move checker must have already proved that it can't be needed later.)

@shamatar
Copy link
Contributor Author

I think my estimate of O(locals * blocks) is not 100% correct, it's still quadratic, but with small constants:

  • analyzer is single-pass, so it's O(blocks)
  • first I only care about locals that ever has _2 = _1 pattern (here it would indeed be linear over locals)
  • then I still has to follow the fate of both _2 or _1 for cases if e.g. _1 is mutated, so potential substitution should be discarded (and here it becomes quadratic, I have to work back) or if _2 is e.g. reassigned (this is easier)

May be destination propagation is better in global, but it's under unsound + mir-opt-level=3 and this on is intended to be simpler and sound

@shamatar
Copy link
Contributor Author

@rustbot label: +S-blocked

@rustbot rustbot added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Jun 21, 2021
@bors
Copy link
Contributor

bors commented Jun 22, 2021

☔ The latest upstream changes (presumably #86383) made this pull request unmergeable. Please resolve the merge conflicts.

@shamatar shamatar force-pushed the simple_copy_elimination branch from aee3c6f to f9c6cc9 Compare June 22, 2021 12:13
@shamatar
Copy link
Contributor Author

@rustbot label: -S-blocked

@rustbot rustbot removed the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Jun 22, 2021
@bors
Copy link
Contributor

bors commented Jul 6, 2021

☔ The latest upstream changes (presumably #86143) made this pull request unmergeable. Please resolve the merge conflicts.

Comment on lines 47 to +48
_3 = ((_2 as Some).0: i32); // scope 0 at $DIR/issue-73223.rs:3:14: 3:15
_1 = _3; // scope 2 at $DIR/issue-73223.rs:3:20: 3:21
_1 = ((_2 as Some).0: i32); // scope 2 at $DIR/issue-73223.rs:3:20: 3:21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised our DCE didn't get rid of _3, situations like this could explain why we don't see much gain in the perf runner, all the dead code is still passed to LLVM and eliminated there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... maybe we don't have any pass that removes useless assignments? That should be a fairly easy pass... just walk every block backwards and remember any StorageDead. As long as you don't see a read from the locals that have a StorageDead, then you can remove assignments to the local.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually did see a result of the pass (SimplifyLocals I think) that does what you say - if there is a variable that is just Live - copy-assign- Dead then it gets removed. The problem is that such patters it not present in MIR by default and only emerges e.g. as a result of this pass

Copy link
Contributor Author

@shamatar shamatar Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think there is a pass that does what you say (not a pass from this PR), but it's under opt-level = 4 and also under "unsound mir optimizations". While this pass basically does what you say with some extra heuristics like copying into the variable more then once, and handling a case when it's not dead by the end of the block.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 9, 2021

So... I tried to review your pass's logic, but ran out of time in the hour I allocated for it. I don't think your categorization of "simple" fits what it is, as it is rather complex and it's not trivial to reason about its correctness (at least not for me).

I don't know how to keep reviewing this pass and whether the maintenance effort to keep it sound and working is worth it when we could spend that effort on getting destination propagation bug-free and get a more powerful optimization out of it.

@shamatar
Copy link
Contributor Author

shamatar commented Jul 9, 2021

I've made this pass because it's for sure much more trivial than full destination propagation and it's easier to make it sound.

Regarding the logic, it's simple

  • visit every MIR statement
  • if it's in a form _2 = _1 (any Place can be instead of _1), then we mark _2 as "interesting" in and keep track of it's state, as well as track _1. Any copy statement like _2 = _1 marks a new generation of _2
  • if _2 (destination) is mutated or used anyhow other then copy or move we discard destination from potential replacement candidates
  • if we see mutable use of _1 (source) then we discard any destinations that use this source from an "interesting list"
  • otherwise if _2 is still interesting and we see it's use then we record "coordinates" of this use for future replacement of _2 by it's source
  • if _2 is not dead at the end of the block then we do not mark the last "generation" to be suitable for replacing

This allows to handle cases like

_2 = _1;
_3 = use(_2);
StorageDead(_2);

made into

_2 = _1;
_3 = use(_1);
StorageDead(_2);

that is later simplified by SimplifyLocals, as well as

_2 = _1;
_3 = use(_2);
_2 = _4;
... = use(_2);

that is replaced only as

_2 = _1;
_3 = use(_1);
_2 = _4;
... = use(_2);

@oli-obk oli-obk mentioned this pull request Jul 16, 2021
@JohnCSimon JohnCSimon removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 25, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 15, 2021
@inquisitivecrystal inquisitivecrystal added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 24, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021
@JohnCSimon
Copy link
Member

ping from triage:
@shamatar
Returning to you to address merge conflicts and after that switch back to S-waiting-on-review.
thanks.

@JohnCSimon JohnCSimon added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Oct 3, 2021
@JohnCSimon
Copy link
Member

ping from triage:
@shamatar - I'm closing this as inactive, please feel free to reopen when you're ready to continue.

@JohnCSimon JohnCSimon closed this Oct 3, 2021
@shamatar shamatar deleted the simple_copy_elimination branch February 29, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.