Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_mir: implement an "lvalue reuse" optimization (aka destination propagation aka NRVO). #46321

Closed
wants to merge 1 commit into from

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Nov 28, 2017

Replaces a chain of moves, such as a = ...; ... b = move a; ... f(&mut b) ... c = move b, with the final destination, i.e. only c = ...; ... f(&mut c); ... remains (note that borrowing works).

@rust-highfive
Copy link
Collaborator

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2017

@bors try

@bors
Copy link
Contributor

bors commented Nov 28, 2017

⌛ Trying commit 69b5139 with merge cf73ed6...

bors added a commit that referenced this pull request Nov 28, 2017
 rustc_mir: implement an "lvalue reuse" optimization (aka destination propagation aka NRVO).

Replaces a chain of moves, such as `a = ...; ... b = move a; ... f(&mut b) ... c = move b`, with the final destination, i.e. only `c = ...; ... f(&mut c); ...` remains (note that borrowing is allowed).

**DO NOT MERGE** only for testing atm. Based on #46142.
@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2017

r? @nikomatsakis
cc @Mark-Simulacrum Let's throw some benchmarks at this!

@est31
Copy link
Member

est31 commented Nov 28, 2017

Is placement new doing the same? Will the only reason to use it then be that you like the syntax?

@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2017

@est31 No, this doesn't change execution order, placement has to allocate before evaluating the value into the allocation, while Box::new(expr) allocates (in the callee) after evaluating expr .

OTOH, this optimizations only reuses stack local destinations when doing so can preserve behavior.

@bors
Copy link
Contributor

bors commented Nov 28, 2017

☀️ Test successful - status-travis
State: approved= try=True

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 28, 2017
@Mark-Simulacrum
Copy link
Member

Try build added to queue. Will be complete soon (probably a couple hours).

@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2017

Assuming the perf results are accurate, this regresses compile times. It'd be a shame if we'd have to disable this, especially the first iteration which seems straight-forward (2 visits over the MIR).

Also, do we have any runtime benchmarks yet?

@Mark-Simulacrum
Copy link
Member

Hold up, those perf results are invalid. If you rerun try with DEPLOY_ALT removed from

- env: IMAGE=dist-x86_64-linux DEPLOY_ALT=1
then we should be able to rebenchmark and then be successful. The current perf results (I believe) are comparing a build without LLVM asserts with a LLVM-asserts enabled build.

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2017

@bors try

@bors
Copy link
Contributor

bors commented Nov 29, 2017

⌛ Trying commit f41425b with merge 9de9a00...

bors added a commit that referenced this pull request Nov 29, 2017
 rustc_mir: implement an "lvalue reuse" optimization (aka destination propagation aka NRVO).

Replaces a chain of moves, such as `a = ...; ... b = move a; ... f(&mut b) ... c = move b`, with the final destination, i.e. only `c = ...; ... f(&mut c); ...` remains (note that borrowing works).
@bors
Copy link
Contributor

bors commented Nov 29, 2017

☀️ Test successful - status-travis
State: approved= try=True

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2017

@Mark-Simulacrum I'm not sure my hack worked, I'm probably missing the changes to actually upload the results from #46354.

@Mark-Simulacrum
Copy link
Member

I think you need to change this condition to $DEPLOY

condition: $DEPLOY_ALT = 1

@Mark-Simulacrum
Copy link
Member

#46354 actually just merged so I think you should be able to just rebase.

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2017

@bors try

@bors
Copy link
Contributor

bors commented Nov 29, 2017

⌛ Trying commit e87ab56 with merge 9afa5ec24b58d6d3f271a9d655e1b98c6b550ea7...

@bors
Copy link
Contributor

bors commented Nov 29, 2017

☀️ Test successful - status-travis
State: approved= try=True

@Mark-Simulacrum
Copy link
Member

Perf started.

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2017

Perf results are better, although most wins are minor and there's a regression on inflate-opt.
(The inflate crate has the entirety of the static huffman table logic in one function, with a lot of macro-generated code, and presumably, very many variables, so any MIR optimization will be slow)

@ghost
Copy link

ghost commented Nov 29, 2017

As a random lurker, I wonder how this PR and this LLVM patch relate. AFAICT, they do very similar optimizations but with different code representations (MIR and LLVM IR).

@pcwalton
Copy link
Contributor

@stjepang For one, the patch in this PR can handle borrows across functions, while that LLVM patch cannot.

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2017

@stjepang I think @pcwalton abandoned that patch in favor of MIR optimizations, which had a lower priority and very little happened since. @pcwalton's own (forward, unlike this PR) "copy propagation" MIR pass is disabled by default because it was too slow to run (missing some caching).

Also, this PR does more than optimize across basic blocks, allowing borrows without needing to track them. OTOH, LLVM could likely handle field accesses, while this PR doesn't, by itself.

Not sure how we could encode the semantics of Move (invalidating borrows) for LLVM to use.

@eddyb eddyb added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 30, 2017
@arielb1
Copy link
Contributor

arielb1 commented Nov 30, 2017

I talked with @eddyb on IRC and this appears this optimization is unjustified, especially with NLL..

@eddyb
Copy link
Member Author

eddyb commented Nov 30, 2017

On IRC @arielb1 pointed out that with NLL this might become valid:

let (a, b, x);
x = None;
loop {
    a = init;
    use(x, &mut a);
    endregion('α);
    b = move a;
    x = Some(&'α b);
}
endregion('α);

This PR, in its current state, would incorrectly reuse b's memory for a.
He also suggested using the existing moveck dataflow analyses which should allow tracking all candidates we might possibly be interested in and the interactions between them, and borrows.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK um so I never really finished this review. I was hoping to write up a kind of description of what the code is doing. At this point I'm not sure of current status so I'll just post these incomplete and not that interesting comments for now.

@eddyb -- what is your latest thinking for how to proceed here?

// except according to those terms.

//! A pass that reuses "final destinations" of values,
//! propagating the lvalue back through a chain of moves.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing some kind of meta comment here somewhere in this file. In my ideal world, the explanation would include the "before" and "after" MIR, in such a way that we can refer back to the example in the comments below. I'll take a stab at writing comments as my review to see if I understand what's going on. =)

@@ -22,20 +22,17 @@ fn main() {
// START rustc.test.CopyPropagation.before.mir
// bb0: {
// ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get some tests specific to this optimization? Ideally a directory like mir-opt/reuse_lvalues/ with various corner cases, showing before/after --- but anyway at least one? =)

// Keep going, in case the move chain doesn't stop here.
self.visit_local(local, context, location);

// Cache the final result, in a similar way to union-find.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cute.

// as they are guaranteed to have all accesses in between.
// Also, the destination local of the move has to also have
// a single initialization (the move itself), otherwise
// there could be accesses that overlap the move chain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example:

L1 = ...
...
L2 = move L1

Here, local is L1 and dest is L2. We will set the reused flag on L2 to true.

&local: &Local,
context: LvalueContext<'tcx>,
_location: Location) {
let (ref mut def, ref mut mov) = self.defs_moves[local];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can now do let (def, move) = &mut self.defs_moves[local];

let (ref mut def, ref mut mov) = self.defs_moves[local];
match context {
// We specifically want the first direct initialization.
LvalueContext::Store |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Store corresponds also to things like SetDiscriminant and InlineAsm -- is that a problem here? I guess not.

@carols10cents
Copy link
Member

@eddyb -- what is your latest thinking for how to proceed here?

ping @eddyb !

@eddyb
Copy link
Member Author

eddyb commented Dec 12, 2017

@carols10cents I have discussed a new set of analyses with @nikomatsakis which should make the optimization sound, and I hope to wrap up bug hunting and get back to this PR soon.

@scottmcm
Copy link
Member

Looks like this will fix #32966

@eddyb
Copy link
Member Author

eddyb commented Dec 20, 2017

I'll close this PR as a reimplementation is required anyway, for soundness wrt loops.

@eddyb eddyb closed this Dec 20, 2017
@eddyb eddyb deleted the even-mirer-3 branch December 20, 2017 06:03
@eddyb
Copy link
Member Author

eddyb commented Dec 25, 2017

Note to self, I've tried this:

#![feature(nll)]

struct Foo(u8);

fn without_nll() {
    let (mut a, mut b);
    let r;
    let mut first = true;
    loop {
        a = Foo(0);
        if first {
            first = false;
            b = a;
        } else {
            r = &a;
            break;
        }
    }
    // `a`, `b`, and `*r` are the same location.
    // However, I don't see a way to abuse it.
}

fn with_nll() {
    let (mut a, mut b);
    let mut r = None;
    loop {
        a = Foo(0);
        // `*r` and `a` are the same location.
        drop((r, &mut a));
        b = a;
        r = Some(&b);
    }
}

fn main() {
    without_nll();
    with_nll();
}

with_nll doesn't compile yet, the region still covers too much, and it does conflict with b = a;.

without_nll is optimized, but I was hoping for an easier way to show how going multiple times around a loop, yet taking different branches inside it, could result in unsound optimizations.

As it stands, all I can do is keep this example for future mir-opt tests, and make sure it's not optimized, preferably without loop-specific special cases, just a generalized dataflow algorithm.

@jrmuizel
Copy link
Contributor

Any chance we can see something like this resurrected?

@eddyb
Copy link
Member Author

eddyb commented Nov 22, 2018

@jrmuizel This PR is naive and was superseded by #47954.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.