Settle execution order uncertainty for +=
#28160
Comments
See also #27868. |
I think this is a degenerate case, and should be at least linted against. |
I'm surprised this is allowed by the borrow checker (regardless of non-lexical lifetimes). … but then modified here despite being borrowed. |
@petrochenkov |
@llogiq |
There is no borrow. |
@llogiq I mean, the effective signature of |
Ah, I see. You're right, |
The following code prints an arbitrary (uninitialized) value:
```rust
fn main() {
    let mut a;
    a += { a = 2; 3 };
    println!("{:?}", a);
}
```
|
Does it add the address of &a to the value? |
Edit: erroneous example removed. |
@llogiq no. 3 is getting added to uninitialized memory. It's a repeating byte (0x1d) followed by a 0x1d + 3 byte. http://is.gd/Q0Hhr3 |
IMO … `AddAssign::add_assign(&mut self, rhs);`, where `AddAssign` is some trait added in the future to overload `+=`, …
```rust
a = 0;
AddAssign::add_assign(&mut a, &{ a = 22; 2 });
```
|
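`std::ops::AddAssign` has since been added. A minimal sketch, using a hypothetical `Meters` newtype, of the desugaring shape suggested above: for a user-defined type, `m += rhs` behaves like `AddAssign::add_assign(&mut m, rhs)`. Note that the real trait takes `rhs` by value rather than `&rhs`. The last function assumes the implicit `&mut m` is a two-phase borrow, which is why the RHS may still read `m`.

```rust
use std::ops::AddAssign;

// Hypothetical newtype, used only for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Meters(u32);

impl AddAssign for Meters {
    fn add_assign(&mut self, rhs: Meters) {
        self.0 += rhs.0;
    }
}

/// Operator syntax.
fn with_operator(start: u32, step: u32) -> Meters {
    let mut m = Meters(start);
    m += Meters(step);
    m
}

/// The same update spelled out as an explicit trait-method call.
fn with_trait_call(start: u32, step: u32) -> Meters {
    let mut m = Meters(start);
    AddAssign::add_assign(&mut m, Meters(step));
    m
}

/// Reading the LHS inside the RHS is accepted for overloaded `+=`
/// (the implicit `&mut m` is only activated after `rhs` is evaluated).
fn add_to_self(start: u32) -> Meters {
    let mut m = Meters(start);
    m += m;
    m
}

fn main() {
    println!("{:?}", with_operator(1, 2));
    println!("{:?}", with_trait_call(1, 2));
    println!("{:?}", add_to_self(5));
}
```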
@nagisa Yes, that could be a valid desugaring, but no, the borrow doesn't have to happen before … There can be no memory unsafety here, even if borrowck allows …
However, there's an evaluation bug in trans here (looking at @tomaka's example): the LHS must not be read before evaluating the RHS, but the current implementation disobeys that rule and can cause UB AFAICT. EDIT: clarified some bits here and there. |
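To make the rule concrete, here is a sketch that writes out the two candidate orders by hand (the function names are hypothetical) next to what built-in `+=` on `i32` does in current Rust. Reading the LHS before the RHS loses the write of 22 and yields 2; evaluating the RHS first and then reading the LHS yields 24.

```rust
/// Desugaring that reads the LHS *before* evaluating the RHS.
#[allow(unused_assignments)]
fn lhs_read_first() -> i32 {
    let mut a = 0;
    let old = a;             // LHS read: 0
    let rhs = { a = 22; 2 }; // RHS overwrites `a`
    a = old + rhs;           // 0 + 2: the write of 22 is lost
    a
}

/// Desugaring that evaluates the RHS first and only then reads the LHS.
fn rhs_evaluated_first() -> i32 {
    let mut a = 0;
    let rhs = { a = 22; 2 };
    a = a + rhs; // 22 + 2
    a
}

/// Built-in `+=` on a primitive, as compiled today.
#[allow(unused_assignments)]
fn builtin() -> i32 {
    let mut a = 0;
    a += { a = 22; 2 };
    a
}

fn main() {
    println!("LHS read first:      {}", lhs_read_first());
    println!("RHS evaluated first: {}", rhs_evaluated_first());
    println!("built-in +=:         {}", builtin());
}
```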
Currently borrowck evaluates …
```rust
fn main() {
    let a = Box::new(2);
    let mut b = 2;
    *{ *a; &mut b } //~ ERROR use of moved value
    +=
    { drop(a); 1 };
}
```
Unfortunately, evaluating …
```rust
let x: &mut [u8] = calculate();
x[i] = x[j];
```
As it would translate into …

Here the borrow of … One way to fix this (suggested by @eddyb) is to move the lvalue evaluation (including the bounds check) to after the RHS computation, which relies on lvalue evaluations not being able to invalidate borrows. We also thought of potentially packing coercions into the lvalue evaluation. This could also make code like … |
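For plain `=`, current Rust evaluates the assigned value before the place expression (the Rust Reference documents this order), which is what lets `x[i] = x[j]` compile. A small sketch that records the order with a side-effect log; `assign_with_log` and `copy_within_slice` are hypothetical helpers.

```rust
/// Records whether the index or the value in `x[index] = value` is evaluated first.
fn assign_with_log() -> (Vec<&'static str>, [u8; 4]) {
    let mut log = Vec::new();
    let mut x = [0u8; 4];
    x[{ log.push("index"); 1 }] = { log.push("value"); 7 };
    (log, x)
}

/// The pattern from the comment above: copy one slice element over another.
fn copy_within_slice(x: &mut [u8], i: usize, j: usize) {
    x[i] = x[j];
}

fn main() {
    let (log, x) = assign_with_log();
    println!("{:?} {:?}", log, x);

    let mut buf = [1u8, 2, 3];
    copy_within_slice(&mut buf, 0, 2);
    println!("{:?}", buf);
}
```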
Heh.
```rust
fn report(a: &Vec<i32>) {
    println!("{:?} {:?} {:?} {:?}", a.len(), a.capacity(), a[0], &a[0] as *const _);
}

fn main() -> () {
    let mut a = vec![0];
    report(&a);
    let xxx = &a[0] as *const _;
    println!("{:?} {:?}", xxx, unsafe { *xxx });
    a[0] += {
        report(&a);
        a.push(1);
        report(&a);
        a.push(2);
        report(&a);
        3
    };
    report(&a);
    println!("{:?} {:?}", xxx, unsafe { *xxx });
}
```
Output:
```text
1 1 0 0x7f4386024008
0x7f4386024008 0
1 1 0 0x7f4386024008
2 2 0 0x7f4386024008
3 4 0 0x7f4386023000
3 4 0 0x7f4386023000
0x7f4386024008 3
```
|
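In the output above, `a[0]` in the reallocated buffer stays 0 while the 3 lands at the old (freed) address, so the place was computed before the pushes. A sketch of a spelling that is safe regardless of evaluation order: hoist the RHS into a local so the vector is indexed only after any reallocation. `bump_first` is a hypothetical helper.

```rust
/// Hoists the right-hand side out of the compound assignment so that the
/// vector is indexed only after the pushes (and any reallocation) are done.
fn bump_first(mut a: Vec<i32>) -> Vec<i32> {
    let rhs = {
        a.push(1);
        a.push(2);
        3
    };
    a[0] += rhs;
    a
}

fn main() {
    println!("{:?}", bump_first(vec![0]));
}
```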
Since several people voiced their disagreement on the need for supporting assignment in the RHS, as a counterexample I have this piece of code, written this week (coincidentally), which was miscompiled (…). The relevant bit:
```rust
dirty |= match event {
    E::KeyboardInput(ElementState::Pressed, _, Some(key)) => {
        dirty |= root.dispatch(&ui::event::KeyDown(key));
        for e in key_tracker.down(key) {
            dirty |= root.dispatch(&e);
        }
        false
    }
    ...
};
```
And the workaround looks like this:
```rust
dirty |= match event {
    E::KeyboardInput(ElementState::Pressed, _, Some(key)) => {
        let mut dirty = root.dispatch(&ui::event::KeyDown(key));
        for e in key_tracker.down(key) {
            dirty |= root.dispatch(&e);
        }
        dirty
    }
    ...
};
```
Is this a use case we want to support or not? If there is an easy way to trigger borrowck errors on such use cases (or warnings, but that seems less likely), can we get a crater run to estimate the prevalence of LHS reads/writes in the RHS of … |
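A reduced sketch of both shapes, with a hypothetical `dispatch()` standing in for `root.dispatch(..)` and a block standing in for the `match`. Because built-in `|=` on `bool` now evaluates the RHS before reading `dirty`, both shapes return `true`; an implementation that read `dirty` first would have computed `false | false` for the original shape and dropped the update.

```rust
/// Hypothetical stand-in for `root.dispatch(..)`; it always reports a change.
fn dispatch() -> bool {
    true
}

/// Shape of the original code: the RHS sets `dirty` and then yields `false`.
fn original_shape() -> bool {
    let mut dirty = false;
    dirty |= {
        dirty |= dispatch();
        false
    };
    dirty
}

/// Shape of the workaround: the RHS accumulates into a shadowing local.
fn workaround_shape() -> bool {
    let mut dirty = false;
    dirty |= {
        let mut dirty = dispatch();
        dirty |= dispatch();
        dirty
    };
    dirty
}

fn main() {
    println!("original: {}, workaround: {}", original_shape(), workaround_shape());
}
```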
@arielb1 You could argue that bounds-checked indexing is not "pure" and thus cannot avoid being borrowed, in some way, unlike other lvalues. |
On IRC I came up with this example, which seems to make the smarter schemes fall apart:
```rust
let mut boxed = Box::new(vec![...]);
(*boxed).push({
    mem::replace(&mut boxed, Box::new(vec![])).len()
})
```
Do we need to differentiate between … |
This is annoying because overloaded deref can be both. We know about the bugs with |
One proposal that mostly maintains LTR and handles …

Definitions

An lvalue expression is basically the current rustc lvalue expression.

Unresolved question: do we include overloaded index/deref in index/deref lvalues? This makes more code compile, but could be confusing in some cases.

Evaluating an expression "to an lvalue" is just evaluating all the rvalue-expressions in it. Note that after an expression is evaluated to an lvalue, finalizing its evaluation can still read memory and possibly run user code (if we allow overloaded derefs/indexing).

Evaluating an expression "to an rvalue" is just standard evaluation.

To evaluate an expression with a receiver, that is "LHS = RHS", "LHS OP= RHS", "LHS.foo(ARG1, ARG2)", first the pre-final-autoref receiver is evaluated to an lvalue, then the other operands are evaluated to rvalues, then the receiver evaluation is finalized and autoref-ed if needed.

Unresolved question: should this happen with by-value-self taking methods too? Provide an example for and against.

Examples

Simple assignment
```rust
x[I] = x[J];
// equiv
let i = I;
let rhs = x[J];
x[I] = rhs;
```
Simple function call
```rust
a.b.f(a.b.g(), a.b.h())
// equiv
let arg0 = a.b.g();
let arg1 = a.b.h();
a.b.f(arg0, arg1); // potentially with overloaded autoderef
```
Changing receiver
```rust
(*boxed).push({
    mem::replace(&mut boxed, Box::new(vec![])).len()
})
// equiv
let arg0 = mem::replace(&mut boxed, Box::new(vec![])).len();
Vec::push(&mut *boxed, arg0); // this can be surprising, I guess.
```
Changing receiver, deep
```rust
let y = Vec::new();
(***boxed).get({boxed=&&&y; 42})
// equiv
let y = Vec::new();
let ix = {boxed=&&&y; 42};
<[u8]>::get(&<Vec<u8> as Deref>::deref(&***boxed), ix)
```
Cutting your own receiver
```rust
let boxed: Box<&[u8]> = get();
boxed.get({drop(boxed); 4})
// equiv
let boxed: Box<&[u8]> = get();
let ix = { drop(boxed); 4 };
<[u8]>::get(&**boxed, ix) //~ ERROR use of moved value
```
Cutting your own receiver, by-value
```rust
let a: &mut [u32] = &mut [1,2];
let b: &mut [u32] = &mut [3,4];
let c;
a.get_mut({a=b; c=a; 1})
// equiv
let ix = {a=b; c=a; 1};
<&mut [u32]>::get_mut(a, ix) //~ ERROR use of moved value
```
Chained function call
```rust
a.b().c().d()
// equiv
let t0 = a.b(); // this is an rvalue
let t1 = t0.c();
let t2 = t1.d();
```
Pushing length
```rust
x[ix()].push(x[0].len());
// equiv
let index = ix();
let arg0 = x[0].len();
x[index].push(arg0);
```
Pushing length, via function
```rust
x.get_mut().push(x[0].len());
// equiv
let t0 = x.get_mut();
let len = x[0].len(); //~ ERROR cannot borrow
t0.push(len);
```
Simple assign-op
```rust
dirty |= SOMETHING_MODIFYING_DIRTY;
// equiv
let rhs = SOMETHING_MODIFYING_DIRTY;
dirty = dirty | rhs;
```
Advantages

Most code should compile and do something sane.

Disadvantages

Indexing/deref is handled somewhat differently from normal methods. If we don't allow overloaded deref/indexing, they can behave differently from primitive deref/indexing. If we do, their order of evaluation can be somewhat confusing. The handling of coercions needs to be thought about: if they can occur in the middle of an lvalue, they can break it. It would be nice if someone who knows them could provide an example. This also changes order of evaluation, which could subtly break existing code.

Coercions

Could someone help me here (@eddyb, you know these best)?

Lvalues and deref

With the borrow checker, overloaded deref as well as deref of … This is actually not totally precise: references can be "leaked". A by-value … Dereference of … |
Application

When should the "evaluate lvalue's rvalue components first, then lvalue" rule be applied? Only to method calls and assignment ops? Everywhere an lvalue is required? Does this include the pseudo-lvalue of … |
@arielb1 I think the "Simple assignment" should read
|
@nikomatsakis (replying here because Discourse refuses to work)
I don't really have a problem with a "typically LTR, receiver and assignment
You can't have the trivial desugaring anyway - |
@arielb1 I don't think changing the order of assignment for receiver is a On Fri, Sep 11, 2015 at 4:48 PM, arielb1 notifications@github.com wrote:
|
Given that commenters here have demonstrated undefined behavior, I'm tagging this with I-soundness. Personally, I'd like to keep the rules as simple as possible, even if it means a trivial amount of breakage (which is permissible when fixing soundness flaws). If the operators can be desugared to method calls as you'd expect everywhere else, that would be ideal. |
This will not be unsound with MIR; it will just cause more compile-time errors. |
@arielb1, good to hear, but I'd prefer not to remove the label until we start trans-ing from MIR, which won't be for a while yet. :) |
Now that MIR presumably has had to deal with this, what was the decision? |
Actually this has not been formally decided. (I mean, we've always had the impl doing something in any case, and it still does -- but is it the right thing?) I'm pretty dubious about changing the order of execution just to allow for fewer errors though -- I think we should stick to left-to-right whenever possible ( |
@nikomatsakis Huh, I'm very surprised Only overloaded augmented assignment seems to have LTR ordering in MIR. |
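The asymmetry is still observable. A sketch with a side-effect log (the `Counter` type and function names are hypothetical): built-in `+=` on `i32` evaluates the RHS first, while `+=` through a user `AddAssign` impl evaluates the LHS place first. This matches the Rust Reference's rule that the modifying operand is evaluated first only when both operand types are primitives.

```rust
use std::ops::AddAssign;

// Hypothetical wrapper so that `+=` goes through the `AddAssign` trait.
#[derive(Debug, PartialEq)]
struct Counter(i32);

impl AddAssign<i32> for Counter {
    fn add_assign(&mut self, rhs: i32) {
        self.0 += rhs;
    }
}

/// Built-in `+=` on `i32`: logs which operand is evaluated first.
fn primitive_order() -> Vec<&'static str> {
    let mut log = Vec::new();
    let mut x = 0i32;
    *{ log.push("lhs"); &mut x } += { log.push("rhs"); 1 };
    assert_eq!(x, 1);
    log
}

/// Overloaded `+=` via `AddAssign`: the same experiment.
fn overloaded_order() -> Vec<&'static str> {
    let mut log = Vec::new();
    let mut c = Counter(0);
    *{ log.push("lhs"); &mut c } += { log.push("rhs"); 1 };
    assert_eq!(c, Counter(1));
    log
}

fn main() {
    println!("primitive:  {:?}", primitive_order());
    println!("overloaded: {:?}", overloaded_order());
}
```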
@nikomatsakis If I had to give one reason for evaluating assignments LTR, it'd be that it enables RVO. |
@nikomatsakis I agree with @eddyb: LTR is better. It's how the rest of the language works, so it's less confusing. Also, I'd like it if |
See this internals thread: https://internals.rust-lang.org/t/settling-execution-order-for/4253 |
Putting all the examples into the playground now suggests soundness issues have been resolved. Maybe remove the label, since now it's just a matter of figuring out what the right execution order is? https://is.gd/2xjOGo (prints 5 rather than the uninitialised memory it did previously) |
I'm not sure that we have much leeway to change this at this point anyhow. Maybe a little. =) |
It could use someone to kind of draw together all the considerations and make a summary of current status, at minimum. |
Triage, @nikomatsakis any updates? |
@nikomatsakis @eddyb are there any updates on this? |
I thought this has been resolved forever ago, am surprised this is still open. |
I'm going to just close the issue. We're not going to change the MIR we generate now in any major way. If in the process of looking at some of the issues around defining the semantics of MIR we make tweaks here, we can address those individually. |
Note that this is not a question of MIR semantics but of surface Rust semantics, so the MIR semantics discussion seems entirely orthogonal to me. |
Yes, I know. What I meant was: if we uncover some kind of problem with the current lowering in the course of that work, we might consider a specific proposal, but it seems obvious that we don't want to be altering the observed semantics of running code at this juncture absent a very concrete proposal and strong motivation. |
When translating something like `a += b` to MIR, an uncertainty arose about what the semantics of … ought to be. Should the resulting value of `a` be 24 or 2, basically? The current MIR yields 24, on the basis that this is more or less what will happen in the case of overloading as well (presuming non-lexical lifetimes, since I think otherwise you get a borrowck error).