-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closures Capture Disjoint Fields #2229
Conversation
Some previous discussion https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387
Ok, ping @nikomatsakis |
text/0000-capture-disjoint-fields.md
Outdated
move || foo.a; | ||
``` | ||
- Borrowck passes because `foo` is not borrowed elsewhere. | ||
- The closure moves and captures `foo.a` but not `foo.b`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if foo
's type implements Drop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oli-obk Thanks for catching that! For those examples I am assuming only cases where the desugar will compile, since it should be functionally equivalent to the final implementation. If foo
implements Drop
then it can not be destructed anyway, capturing or not.
text/0000-capture-disjoint-fields.md
Outdated
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
- How exactly does codegen change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the struct is not used outside the closure, this will add one reference per field used in the closure instead of just one reference for the entire struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oli-obk That would be the trivial implementation, but (as per previous discussion) the closure would be smaller if it captured a single nonexclusive pointer to the whole struct and then derefrenced with constant offsets.
text/0000-capture-disjoint-fields.md
Outdated
captured. This was simple to implement but is inconstant with the rest of the | ||
language because rust normally allows simultaneous borrowing of disjoint fields. | ||
Remembering this exception adds to the mental burden of the programmer and | ||
worsens rust's already terrible learning curve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Terrible sounds a bit strong - I'd agree to Rust having a steep learning curve, but terrible is a bit much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well-written and great RFC! I have a single nit =)
Thank you for this! When discussing the rationale, I wold suggest also mentioning that this can make it harder to factor a group of related variables into a struct and provide functionality as methods on that struct, since that can then lead to borrow-checker errors that would not exist otherwise. |
If I understand correctly, this has the potential of significantly increasing the size of environments passed to a closure -- if the closure uses 10 of 12 fields, now it will have 10 pointers instead of 1 in the environment. Could that be a problem? |
We could use refinement typing (or some analogue thereof) internally and keep the same environment but give it a type/extra information to indicate how fields are/can be used. |
@RalfJung I don't see why the closure should use many pointers. The generated code should be exactly the same, it's just the borrowck which should be a bit more lax. |
I would love to have some more input on this. Any thoughts on MIR safety? Another option might be to make each individual field an upvar with hints to combine the pointers back together during codegen. Treat it more like an optimization. Also, pinging the lang team, what needs to happen so that we can move forward? |
My take: I would very much like this to happen. I've been wanting to, in fact, to @evanbergeron who a long time back pinged me on this topic, but I've just been too busy to invest a lot of thought into it. (@evanbergeron -- still interested in implementing? And, if I have the wrong github name, er, sorry, about that.) My one hesitation here is that perhaps we need to invest a bit more effort into thinking through if there are backwards compatibility hazards? @RalfJung mentions that the size of a closure could increase, which I think is true, but I'm not too worried about it -- we could probably optimize it as well, if we really wanted to. Probably it suffices to work things through in the implementation phase and see what crater has to say and so forth. OK, I've kind of convinced myself. I think we should merge this. =) Nominating just for a brief check-in with @rust-lang/lang team. |
OK, so we talked about this in the @rust-lang/lang meeting. That discussion helped me to remember some of the edge cases I was concerned about. I still think we should make this change, but it's important that we lay these things up out front. First, I think we should define this with respect to a "simple reference desugaring" but of course leave it undefined what layout the closure actually has. This is I guess more or less what the RFC does, but I think I'd prefer something a bit more concrete. Basically, I want to be able to "think of" closures as capturing a set of paths -- even if, in the underlying code, we sometimes choose not to. (In reality, I think we actually will do the dumb desugaring until forced to do otherwise by some horrendously performing code.) Mostly I'm referring to text like this:
I don't want to alter any borrowck rules. These rules are defined in terms of MIR and I think they should stay the same. What I want to do is to alter the "MIR desugaring" of closures to capture longer paths instead of just local variables. This contains the scope of the RFC and means that we don't endanger soundness (though we can still alter runtime semantics, more on that later). Currently, closures are defined roughly like so. We take the set of free variables that appear in the closure definition (those variables referenced by the closure) and we infer a borrowing mode for each one. We then create a closure with one field per free variable. We would be altering the definition so that we define a set of "free places" (lvalues). We'll have to tinker a bit to find the best way to define this, per the edge cases I'm about to bring up. =) So as a sketch: || foo(&a.b.c); would desugar to {
let r = &a.b.c;
NewClosure { r }
} where the Second, we should distinguish indexing from other paths. If you have a closure like this: let mut vecs = (vec![1, 2, 3], vec![4, 5, 6]);
let inc0 = |index: usize| {
vecs.0[index] += 1;
}; Clearly, this closure can't borrow Third, overloaded deref is kinda' special, but I think we don't need a special case for it. In particular, something like let tmp = Deref::deref(&some_Rc);
&tmp.foo As a result, assuming let x = || use(&mut foo.bar);
let y = || use(&mut foo.baz); This would desugar to: let tmp = DerefMut::deref_mut(&mut foo);
let f0 = &mut tmp.bar;
let tmp = DerefMut::deref_mut(&mut foo);
let f1 = &mut tmpbaz;
let x = Closure0 { f0 };
let x = Closure1 { f1 }; This would create an error. You can get similar errors today without closures. Still, it seems like this is an orthogonal problem, and we may solve it eventually, which would then carry over to closures. Great. Fourth, we can't capture paths that are moved unless we know that there are no destructors. The RFC does mention the need not to move a field that is contained in something with a destructor, which is good, but in general this RFC can have a subtle impact on when destructors run. Consider: {
let tuple = (vec![], vec![]);
{
let c = || send(tuple.0);
} // c goes out of scope here
} // tuple goes out of scope here Today, in this code, It seems pretty damn unlikely that this would cause problems for anyone. However, this kind of silent change to semantics definitely makes me nervous. I suspect we should not do it and instead, for places that are moved, we should just always move the underlying variable. That said, we could probably write some code that detects this scenario and reports an error, just to get an idea how much code out there might be affected. (I don't think we need to settle it now, but one way to change this that might be safer is via an epoch. We could for example issue warnings in cases like the one above -- where the timing of when a dtor runs would change -- and then actually make the change only with opt-in. I don't feel great about that, though, and the transition might have to span 2 epochs (one to make it an error to do anything that would change behavior, and one to change the behavior).) Thoughts? |
@nikomatsakis A lot of great points here! I'm already thinking of tons of improvements that can be made to the RFC. In the mean time, there are several items I would like to discuss a bit more. The underlying goal of this RFC is to improve consistency in the language. One way to think about this: block expressions and closures should have the same behavior whenever possible. I want us to keep that idea of consistency in mind. Change borrowck or lowering?
You are right! The compiler (ideally) does borrowck at the MIR level. Closures don't exist in MIR. Consequently, the borrowck implementation should not be altered as a result of this RFC. Only the conceptual understanding of borrowck will change. The text of the RFC does not reflect this. Desugaring & Closure Size
That's pretty reasonable, I think I'll end up doing exactly that.
We really only need to change lowering (the HIR to MIR conversion). The reason for the added complexity is closure size. Currently, capturing a variable by reference only ever adds one pointer to the closure data. However, implementing this RFC as a simple desugar may result in many extra, unnecessary pointers being captured. In keeping with the rust philosophy of "you don't pay for features you don't use", we want to find a way to merge several borrows into a single, nonexclusive pointer. In order to preserve the safety of MIR, this borrow merging has to be done as an optimization during codgen. Note that this potential size increase does not apply to capture-by-move/copy, which should become less expensive no matter how we implement this. PathsCurrently this RFC proposes that only true lvalues get split up. Encountering a let foo = Box::new(Foo { a: 1, b: 2 });
// all generate the same MIR:
|| &foo.b;
|| &Deref::deref(&foo).b;
Closure0 { &foo }; This is consistant with the behavior of block expressions: let mut foo = Box::new(Foo { a: 1, b: 2 });
let _a = &mut foo.a;
{ &foo.b; } // cannot borrow `foo` (via `foo.b`) as immutable because `foo` is also borrowed as mutable (via `foo.a`) Admittedly, the RFC needs to make this rule more clear. Drop Ordering
I panic!ed a bit when I saw this drop-ordering example, since it demonstrates a case where the lifetime of a reference might increase as a result of this RFC. Thankfully, I have been unable to turn that into a backwards-incompatible borrowck error. Still, that doesn't mean we are out of the woods. I am particularly interested to see if someone can cook something up involving NLL. Alas, Rust does provide guarantees on field drop order (which this would change). We can't forbid the splitting of structs that contain |
I don't think this is well-defined because it requires cooperation between:
So I don't think this can be an opportunistic optimization at all, which means it has to be encoded in the type and guaranteed throughout, even if behaves like something else. |
@eddyb Yes, it would need quite a lot information about where the borrows came from. It can't be opportunistic, as you said. Still, it has to happen during codegen if we want to avoid changing the borrowck implementation. And because the point is to decrease memory usage, it can be thought of as a sort of optimization. Or maybe not. I could be thinking about this wrong! |
Argh, I apologize again for the lengthy delay in replying! This slipped off my radar, obviously.
👍
Agreed. Doing a "merge" operation like that after borrowck seems perfectly reasonable. I am not even sure if needs to be detailed in the RFC -- just adding a note that closure representation is undefined and that we are free to do such a thing seems sufficient to me.
Yep.
I don't know what a "true lvalue" is -- can you define that a bit more precisely? It sounds like you mean "no dereferences", particularly given this quote:
If so, I don't think this is a good idea. Consider this example: struct Foo {
a: u32,
b: u32,
}
impl Foo {
fn bar(&mut self) {
let increment_a = || self.a += 1;
let increment_b = || self.b += 1;
// ...
}
} These closures ought to be borrowed
So, that behavior is specific to overloaded deref (and, in fact, in MIR borrowck, we've gone back to treating
Say more? This restriction would only apply to fields of structs that impl drop and which move bits of themselves into closures. We already have similar restrictions on moving out from such structs, it seems reasonable to account for that here too. (We could probably give a decent error message too, spelling out that the entire variable had to be moved because of the |
I don't agree here. I mean, what you are saying is technically true, but we know a lot about the way the "origin" of closure types, and we can certainly include hints that say that all these references come from "adjacent" things. Moreover, at the time that we are optimizing the MIR for the closure, we also have access to all of its creators, so we can change them in concert -- no? |
How would that work? Right now the layout is computed from the type itself, which can easily leak out (e.g |
Something like that. Anyway I don't think it's so important. |
Gentle ping =) I'd particular like to discuss the meaning of this comment of yours, which I still am not sure I understand:
|
@nikomatsakis Sorry about the silence on this, school's been keeping me busy. Any overloaded deref calls can not be moved outside of the closure. The plan is to desugar My understanding is that overloaded derefs generate an additional statement in MIR. Unlike |
I'm looking forward to having this automatic/implicit disjoint capturing in the language because I have to do it manually very often.. @samsartor |
@Boscop Yes, but we can't guarantee that there are no side effects, and rust is a language of guarantees. I am mostly fine with partially pre-evaluating closures in theory, but it would be a breaking change. More importantly, precalling deref would not make this RFC more useful. Paths that include an overriden deref are not disjoint ( |
I'm not sure quite what you mean. In any case, it is as @eddyb said -- the move keyword means that the compiler doesn't create references into the enclosing stack frame. But you it does not mean that no such references exist -- just that the compiler hasn't added any. It is tricky to phrase it properly.
This isn't really how I think about it. Put another way, in your example, you do own the value That said, the example you gave is somewhat complicated by the use of indexing. I assume we will not capture "some random index", since we don't suport moving out. Perhaps a better example is: fn hello() {
let tuple = (1, 2, 3);
let p = &tuple;
let foo = move || p.0;
} Does this capture I guess I still feel like we should move on from the "RFC" period -- because I think we really want to solve this problem and we should move on to implementing and exploring the solutiosn -- but I do think there is a lot of "nitty gritty" to work out here. I worry about the rules being quite hard to follow. :/ |
@nikomatsakis The way I see it, that example would probably capture a reference to I don't like saying that "a Also, in my example, the indexing isn't relevant at all. It's just there to make the closure a little less boringly trivial. Really, it's all just semantics at this point. We all intuitivly know how this should go, and I agree that we really just need to get some code down and see how it works in practice. Are you happy with the current text? I can change anything today if you want. I would be nice to see this merged in the next day or so. |
@nikomatsakis I think I see what is going on here!
You are right of course. However, this RFC is about how captures are split up, made more minimal, so that closures work better alongside mutable references and partial moves. How we split up a capture does not depend on wether the closure is It only makes sense to split a reference by borrowing fields. It only makes sense to split a value by moving fields. If a Sure, strictly speaking, that Am I making any sense? |
Whenever you want this RFC merged, please write a comment ccing me saying so :) |
@samsartor interesting. I think that is going in the right direction, yes. You can think of it perhaps as a kind of iterative refinement. Let's say we start out capturing all of the local variables. Then we iteratively make those paths more precise based on which paths are used in the closure and how. A capture of some place Seems to make sense. |
@nikomatsakis YES! Looks like we are on exactly the same page. |
@nikomatsakis @Centril Ping again. Can we merge this now? It sounds like everyone is happy. |
@samsartor apparently your first ping didn't reach me :/ I'll merge this sometime today :) |
Huzzah! This RFC has been merged! Tracking issue: rust-lang/rust#53488 |
TL;DR: Closures will capture individual fields and not the whole struct when possible.
This RFC proposes that closure capturing should be minimal rather than maximal.
Specifically, existing rules regarding borrowing and moving disjoint fields
should be applied to capturing. If implemented, the following code examples
would become valid:
Resolves rust-lang/rust#19004
Rendered
Tracking issue