Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better documentation of our MIR dialects #86299

Open
RalfJung opened this issue Jun 14, 2021 · 14 comments
Open

Better documentation of our MIR dialects #86299

RalfJung opened this issue Jun 14, 2021 · 14 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

We like to pretend that MIR is a single language, but that is not actually true. There are at least two different languages represented by MIR; I will call them MIR "dialects" in the following:

  • Pre-drop-elaboration MIR: this dialect implicitly keeps a drop flag for each local variable. Move operands and calling Drop sets that flag. The Drop and DropAndReplace terminators honor that flag and do not call drop_in_place when the drop flag indicates this variable has already been moved/dropped.
  • Post-drop-elaboration MIR: this dialect has no implicit drop flags, and each Drop terminator indicates an unconditional call to drop_in_place. (DropAndReplace terminators AFAIK cannot exist any more in this dialect.)

The problem with multiple dialects is that they can have different rules for interpreters, code generators, and optimizations, so we need to be careful not to use the wrong optimization with the wrong dialect.

Miri can only interpret the latter dialect. All our optimizations run post-drop-elaboration, so I assume they are all correct only for that second dialect (though some might also be correct for the first one).

I'd like to propose that we treat these dialects more explicitly: a mir::Body should indicate which dialect it is written in, such that optimizations, codegen, and Miri can ensure they are working with MIR of the right dialect. This becomes more pressing since #86155 is suggesting to add yet another dialect.

Cc @oli-obk -- there is some overlap here with the "phase" in the MIR validator, but I don't think this is entirely the same concept. Different dialects correspond to different phases, but we might have phases within a dialect where certain instructions are being lowered without changing the operational semantics of the language.

Cc @bjorn3 @rust-lang/wg-mir-opt

@bjorn3
Copy link
Member

bjorn3 commented Jun 14, 2021

For drop-elaboration: Maybe split Drop into Drop and DropIfNecessary. The former is used only after drop-elaboration and the latter only before drop-elaboration.

For AbortUnwindingCalls: Why can't this be done before all optimizations and maybe even during mir building? Doesn't running it at the end make MIR inlining remove aborts for unwinding past an inlined extern "C" function?

@bjorn3 bjorn3 added the A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html label Jun 14, 2021
@RalfJung
Copy link
Member Author

Why can't this be done before all optimizations and maybe even during mir building?

I asked @alexcrichton exactly that question in #86155. :)

For drop-elaboration: Maybe split Drop into Drop and DropIfNecessary. The former is used only after drop-elaboration and the latter only before drop-elaboration.

That would be another option, indeed. However, the question of whether the Abstract Machine keeps track of drop flags seems like a rather global one, where a dialect switch seems appropriate to me. Otherwise we have to make slightly subtle arguments based on the fact that yes, drop flags exist in the Abstract Machine, and they are always updated by moving or dropping, but if there are no more DropIfNecessary and DropAndReplace in the code then the drop flags have no effects so we can ignore their existence. IOW, we'd still rely on a "dialect-like" global property, namely that only Drop is used. Optimizations that work on the post-drop-elaboration dialect could still easily be wrong when run on the earlier dialect -- e.g. imagine replacing a move by a copy (which should be a correct transformation post-drop-elaboration).

@bjorn3
Copy link
Member

bjorn3 commented Jun 14, 2021

e.g. imagine replacing a move by a copy (which should be a correct transformation post-drop-elaboration).

That can make code slower. In case of move for large arguments only a pointer to the local is passed. For copy and const a copy is first made to an anonymous stackslot and then a pointer to this copy is passed. The callee directly uses the memory pointer to by the passed pointer argument as backing storage for the argument local and may change it. This is not something we can change. It is part of the C calling convention on most systems.

// The callee needs to own the argument memory if we pass it
// by-ref, so make a local copy of non-immediate constants.
match (arg, op.val) {
(&mir::Operand::Copy(_), Ref(_, None, _))
| (&mir::Operand::Constant(_), Ref(_, None, _)) => {
let tmp = PlaceRef::alloca(&mut bx, op.layout);
op.val.store(&mut bx, tmp);
op.val = Ref(tmp.llval, None, tmp.align);
}
_ => {}
}

I have been meaning to change the call arguments from Operand to Place to make this clearer, but that is currently blocked on simd_* taking a const generic instead of a const argument.

@tschuett
Copy link

https://mlir.llvm.org encourages and supports to have more dialects.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 15, 2021

That can make code slower. In case of move for large arguments only a pointer to the local is passed. For copy and const a copy is first made to an anonymous stackslot and then a pointer to this copy is passed. The callee directly uses the memory pointer to by the passed pointer argument as backing storage for the argument local and may change it.

How is that observable? Making an extra copy seems fine here.
(There is some kind of typo in the last sentence so I cannot entirely parse it.)

In MIR, as far as I am aware (i.e. in the preliminary spec in my head), move and copy make no difference. Miri also implements it like that. If replacing one by the other doesn't work then Miri must treat them differently somehow. But while I could imagine turning copy into move being problematic (even though, according to Miri -- the closest thing to a MIR spec we have -- this is a fine transformation), I can't see how the inverse, turning a move into a copy, can be problematic. Cc #71117 rust-lang/unsafe-code-guidelines#188

EDIT: Oh, I now realize you just said it can make code slower, not that it is wrong. Sure, but that's not relevant. I didn't say that it is a smart or useful change to make. I just said that it is a correct change to make. That is all that matters.

@RalfJung
Copy link
Member Author

https://mlir.llvm.org encourages and supports to have more dialects.

I think dialects are fine if they are treated explicitly in a principled way. Possibly we could take some ideas from MLIR there, are you familiar with the details?

What I am worried about is "accidental" dialects without proper documentation and care.

@tschuett
Copy link

tschuett commented Jun 16, 2021

https://mlir.llvm.org encourages and supports to have more dialects.

I think dialects are fine if they are treated explicitly in a principled way. Possibly we could take some ideas from MLIR there, are you familiar with the details?

What I am worried about is "accidental" dialects without proper documentation and care.

Dialects in MLIR are almost for free. You can define your own operations independent of other dialects.

It sounds as if you are talking about two implicit dialects (pre and post drop-elaboration). Maybe make the distinction explicit and call them out as two dialects with different names. I assume the second dialect would be simpler than the first one.

It smells, if you distinguish dialects by a bit or an enum instead of types.

@inquisitivecrystal
Copy link
Contributor

inquisitivecrystal commented Jun 20, 2021

@rustbot label +C-enhancement +T-compiler +T-doc

@rustbot rustbot added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools labels Jun 20, 2021
@bjorn3
Copy link
Member

bjorn3 commented Jun 20, 2021

This is not user-facing documentation, so I don't think it it relevant to the doc team.

@rustbot label -T-doc

@rustbot rustbot removed the A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools label Jun 20, 2021
@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2021

#86155 landed, so we now also have a phase during which "unwinding from a fn call inside a nounwind function" implicitly aborts, and another phase during which it is UB.

@pierwill
Copy link
Member

Would this be better in the rustc API docs or the rustc dev guide, or maybe parts of both?

@RalfJung
Copy link
Member Author

Parts of it is actual code (like a mir::Body "knowing" which dialect it is written in), for the rest -- it matters less where the docs are than that they are in some canonical place. :)

@RalfJung
Copy link
Member Author

RalfJung commented Sep 2, 2022

#99102 made some progress here (Cc @JakobDegen).

One thing I am a bit confused by is that the transition from "conditional drop + implicit abort-on-unwind" to making both of these explicit seems to happen atomically when doing from analysis MIR to runtime MIR, but AFAIK these are two separate passes. So what do we say about the MIR in between those passes?

@JakobDegen
Copy link
Contributor

My current approach is basically that we say nothing, and that the fact that this atomic transition occurs over two implementations of MirPass is an implementation detail. In practice, what this means is that there is no expectation that these passes can be reordered or that executing some but not others won't cause breakage. That's not ideal, but I think the way to fix this is to have fewer semantic differences, and not to define more Mir dialects

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants