Add &own T
#965
Unresolved question: how to get a |
Unresolved question: how would deref coercions work? |
Unresolved question: is it possible to convert |
Three thoughts:
|
You're correct. I didn't consider that you could drop the container and the contained object at different points. I've updated the detailed design of the RFC. Unfortunately this makes
When you drop a I think I've addressed your comments in the new detailed design. |
The restrictions in place are somewhat natural:
|
Just to document this since I accidentally killed the first revision: A previous version of this RFC supported |
I've renamed |
I would expect |
Yes,
Both interpretations seem valid. |
I had some concerns with being able to return

```rust
fn soundness_hole() -> &out T {
    let x: T = T::new();
    &out x
}
```
|
@aidancully: Like all references, the lifetime of |
@mahkoh Yes, that makes sense. If I can say, lifetime annotations should probably be added to I am for this RFC. |
I have to say, from reading the RFC I can't quite grasp why this is desirable. The examples seem to boil down to "This gives you trait objects you can move out of, without requiring a heap allocation". |
```rust
fn f() {
    let x: Option<String> = Some(String::new());
```
let mut x
I don't really see how Why would we need references to move out of if we could just move the type directly? If you need to move out of traits, use generics. I see that something needs to be done about slices, but I'd much prefer to have unsized arrays. |
Let me try to explain myself some more. I think the idea that I have a reference to an object I own but I don't own/care about the memory it is stored in

Why this is useful for sized objects

As @oli-obk pointed out, you can already transfer ownership of sized objects by passing them by value. This also requires you to copy the object to a place where the new owner can see it. Most of the time this is absolutely what you want and what you should do! There are, however, certain situations where you don't want to do this:

The first case is when you want to precisely control when something is copied. If I have a large object

```rust
fn f(x: X) -> Box<X> {
    Box::new(x)
}
```

This will first copy

The second case is when your object is in a smart pointer. A smart pointer (currently the only one that exists is

```rust
fn f(x: X, y: Box<X>) {
    g(x);
    g(*y);
}
fn g(x: X) { /* ... */ }
```

The function

```rust
fn f(x: X, y: Box<X>) {
    g(box x);
    g(y);
}
fn g(x: Box<X>) { /* ... */ }
```

This time

```rust
fn f(x: X, y: Box<X>) {
    g(&own x);
    g(&own *y);
}
fn g(x: &own X) { /* ... */ }
```

C has these features because C's pointers are fundamentally unsafe: you cannot distinguish between owned and borrowed pointers. Rust's references are much safer but they also can't express the case where a pointer is owned. Rust is a systems language and I believe that something like this should be expressible in a systems language.

Why this is useful for slices

In Rust, slices are unsized arrays. Slices are written If a function wants to flexibly accept slices of type

```rust
fn f(x: &mut [T]) { /* ... */ }
fn g(mut x: [T; 16], mut y: Vec<T>, mut z: Box<[T]>) {
    f(&mut x[..]);
    f(&mut y[..]);
    f(&mut *z);
}
```

The problem here is that the ownership of the elements in the slices cannot be transferred to One way to work around this is by having

```rust
fn f(x: Box<[T]>) { /* ... */ }
fn g(x: [T; 16], y: Vec<T>, z: Box<[T]>) {
    f(box x);
    f(y.into_boxed_slice());
    f(z);
}
```

This again requires an unnecessary allocation for

```rust
fn f(x: &own [T]) { /* ... */ }
fn g(x: [T; 16], mut y: Vec<T>, z: Box<[T]>) {
    f(&own x[..]);
    f(y.as_owned_slice());
    f(&own *z);
}
```

No unnecessary allocations this time. One additional benefit is that, after

```rust
fn as_owned_slice(&mut self) -> &own [T] {
    unsafe {
        let slice = mem::transmute(self.as_slice());
        self.set_len(0);
        slice
    }
}
```

Why this is useful for traits

As with slices, there is no way to pass ownership of traits without allocating a

```rust
fn f<T: Trait>(x: T) { /* ... */ }
```

There are some downsides to this:

Monomorphization can cause significant code/binary bloat for little benefit. While monomorphization avoids a virtual function call, this is often not necessary, even in a language used for systems programming.

The syntax becomes unwieldy when used for multiple arguments:

```rust
fn f<T: Trait, U: Trait, V: Trait>(x: T, y: U, z: V) { /* ... */ }
```

It is not possible to pass slices this way. The following code will only ever accept one type:

```rust
fn f<T: Trait>(xs: &mut [T]) { /* ... */ }
```

Since you currently can only express owned traits with the

```rust
trait Trait {
    fn f(self: Box<Self>) -> T;
}
```

which is a backwards compatibility hazard if we ever get other smart pointers. With owned trait references this becomes

```rust
trait Trait {
    fn f(&own self) -> T;
}
```
|
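For readers trying the `as_owned_slice` idea from the slices section in today's Rust: `&own [T]` is not valid syntax, but `Vec::drain(..)` from the standard library behaves similarly in one respect. It hands out the elements by value and leaves the vector empty while keeping its allocation. The following is only a rough sketch under that assumption; `consume_all` and `drain_keeps_capacity` are hypothetical names used for illustration and are not part of the RFC.

```rust
/// Hypothetical stand-in for a function that would take `&own [T]`:
/// it receives every element by value and drops it.
fn consume_all<T>(items: impl Iterator<Item = T>) -> usize {
    let mut consumed = 0;
    for item in items {
        drop(item); // ownership of each element ends here
        consumed += 1;
    }
    consumed
}

/// Returns (elements consumed, length afterwards, capacity unchanged?).
fn drain_keeps_capacity() -> (usize, usize, bool) {
    let mut v: Vec<String> = Vec::with_capacity(8);
    v.push("a".to_string());
    v.push("b".to_string());
    let capacity_before = v.capacity();

    // `drain(..)` moves the elements out; the vector stays usable.
    let consumed = consume_all(v.drain(..));

    (consumed, v.len(), v.capacity() == capacity_before)
}
```

Unlike the proposed `&own [T]`, this goes through a generic iterator, so it does not address the monomorphization concern raised in the traits section.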
wow... just reading stuff from you makes me want to accept everything you say :D You defend your arguments well.
Are you inferring this or are there discussions on this? I was under the impression we were trusting LLVM on this, even more so since we got jemalloc-optimizations last week (optimizing out jemalloc allocations in situations like boxing just for ownership). Especially since the book says that we should not return boxes: http://doc.rust-lang.org/book/pointers.html#returning-pointers Though I'm not sure about unboxing and reboxing. So just to be sure: Your suggestions are just optimizations to prevent copies and allocations? If so, I'd rather suggest making the internal optimizations a language-guarantee like tail-call-optimizations in Scheme |
Thanks.
I think there is actually an issue in the Rust repo that explicitly says this short and sweet. But I can't find it right now. You can also read the motivation in this RFC: https://github.com/rust-lang/rfcs/blob/8486fed94fbe85d8da62f3dd6ffddd58564c37f6/text/0000-placement-box.md
LLVM is not part of the language while references are. It is true that LLVM might optimize this away, but it currently does not (if that has not changed very recently). Another Rust backend might do different things. You can do this explicitly in C and I believe it should also be possible in Rust.
It is true that

```rust
fn f() -> X {
    X {
        /* ... */
    }
}
```

will be optimized to code that does not copy. However, this breaks down when you give the return value a name:

```rust
fn f() -> X {
    let mut x = /* ... */;
    /* fill x with data */
    x
}
```

This will cause a copy. The thing Rust would have to implement to make this not copy is called named return value optimization.
I think it's not "just optimizations". Being efficient is one of the explicit goals of Rust. |
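One way to avoid depending on the optimizer for the named-return case in today's Rust is to let the caller provide the storage and initialize it in place. The sketch below uses `std::mem::MaybeUninit` for that; it is not what the RFC proposes, and `Big` and `init_in_place` are hypothetical names chosen for illustration.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hypothetical large value that we would rather not copy around.
struct Big {
    data: [u8; 4096],
    len: usize,
}

/// Initializes `slot` field by field and returns a reference to the
/// now-initialized value. The value lives at the address the caller chose.
fn init_in_place(slot: &mut MaybeUninit<Big>) -> &mut Big {
    let p = slot.as_mut_ptr();
    // SAFETY: `p` points to writable storage for a `Big`; every field is
    // written before `assume_init_mut` is called.
    unsafe {
        addr_of_mut!((*p).data).write_bytes(0u8, 1);
        addr_of_mut!((*p).len).write(0);
        slot.assume_init_mut()
    }
}
```

The cost is an `unsafe` block and a two-step protocol at the call site, which is part of what a dedicated reference type would be meant to express safely.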
But it should do so automatically, without requiring the programmer to write things just to make the code efficient. In C++ we learned the hard way NOT to use
It could simply be a rule that a Rust compiler needs to optimize this properly.
Alas, let's do it.
See rust-lang/llvm#37 . llvm detects intermediate heap allocations: http://is.gd/LPQJbJ . It does not (yet?) detect intermediate heap deallocations: http://is.gd/liX8kO |
You cannot make such a guarantee. Rustc can't even do it right now. If there is a regression in LLVM then you've suddenly broken the language specification. What would that guarantee even look like? In some cases (sufficiently small structs) it might be better to pass the struct by value.

When I write Rust I want to approximately know what the generated assembly looks like. Certain core team members have expressed similar sentiments when speaking about Rust publicly. When I write

I think I might have undersold the RFC a bit when I replied above to the statement that it is "just about optimizations". Not having owned references will affect libraries because APIs will have to accommodate the fact that you can't move out of slices/references/traits. I already mentioned above that the People learn from the first day that creating boxes is something you try to avoid at all costs in Rust. If

Let's consider an API where the user wants to move a variable number of objects into a function:

```rust
fn f<T>(xs: &own [T]) { /* ... */ }
```

Without this syntax available the library might look like the following:

```rust
fn f_clone<T: Clone>(xs: &[T]) { /* ... */ }
fn f_vec<T>(xs: Vec<T>) { /* ... */ }
```

Even though the library really just wants to move out of a slice and has no need for either |
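As a point of comparison, here is a minimal sketch of how such an API can be written in current Rust without `&own [T]`, by accepting anything that implements `IntoIterator` by value. This is an assumption about how one might work around the limitation, not something proposed in the thread; `concat_owned` and `concat_demo` are hypothetical names. Note that it is generic, so it is monomorphized per container type.

```rust
/// Takes ownership of every element, whatever container the caller used.
fn concat_owned<I>(xs: I) -> String
where
    I: IntoIterator<Item = String>,
{
    let mut out = String::new();
    for s in xs {
        out.push_str(&s);
        // `s` is owned here and dropped at the end of each iteration.
    }
    out
}

/// Calls the same function with an array, a `Vec`, and a boxed slice.
fn concat_demo() -> (String, String, String) {
    let x: [String; 2] = ["a".to_string(), "b".to_string()];
    let y: Vec<String> = vec!["c".to_string()];
    let z: Box<[String]> = vec!["d".to_string(), "e".to_string()].into_boxed_slice();

    // The array and the Vec are moved in directly; the boxed slice is
    // converted to a Vec first, which reuses its allocation.
    (concat_owned(x), concat_owned(y), concat_owned(z.into_vec()))
}
```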
I have come around to your view. This could also be used in things like Vec::remove (which now exists for ranges, but out of necessity returns an iterator) |
On a "language design" level: |
Same way with my compiler magic as with I'm not against these semantics for passing ownership into functions without copying memory, I'm just not sure if it's worth extra syntax when a bit of compiler magic can get us half the way there. |
What if |
How does deref-consuming of |
In that case it does get passed by value. Not sure whether the box should get deallocated before or after the call in that case. In fact, "after the call" should probably read "at the end of the statement" (when temporaries get destroyed). But I'm not sure if the language even needs to specify when exactly the box has to be deallocated, since the difference usually isn't visible to user code (unless using a custom memory allocator).
No idea, but it's definitely compiler magic. |
Just thinking about that actually. Arguably we should have:

```rust
struct Box<T>(&'static owned T);
struct EmptyBox<T>(&'static undef T);
```

Taking I don't want to bring back typestate, but... such symmetry :) |
There has been some discussion (e.g. a smart pointer has to specify a second destructor that is run if and only if you've moved out before drop) but this is orthogonal to |
E.g.

```rust
struct Container<T> {
    random_field: Vec<i32>,
    data: ManuallyDrop<T>,
}
trait Smaht: DerefMut {
    /// Used when moving out or using `&own *Smaht`.
    fn deref_move(&mut self) -> &own <Self as Deref>::Target;
    /// Drop method called if the content has been moved out.
    fn drop_moved(&mut self);
}
impl<T> Smaht for Container<T> {
    fn deref_move(&mut self) -> &own T {
        unsafe { mem::transmute(&*self.data) }
    }
    fn drop_moved(&mut self) { }
}
impl<T> Drop for Container<T> {
    fn drop(&mut self) {
        unsafe { ptr::read(&*self.data); }
    }
}
```

ManuallyDrop comes from another RFC. The content of ManuallyDrop is not automatically dropped in the destructor. |
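The two-destructor idea above can be approximated in current Rust with `std::mem::ManuallyDrop` and an explicit flag. This is a sketch under the assumption that a runtime `moved` flag stands in for the compiler knowing whether the content was moved out; `take`, `CountDrop`, and `drop_counts` are hypothetical names, and the method returns the value itself because `&own T` does not exist.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical payload that counts how often it is dropped.
struct CountDrop(Rc<Cell<u32>>);

impl Drop for CountDrop {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

struct Container<T> {
    #[allow(dead_code)]
    random_field: Vec<i32>,
    data: ManuallyDrop<T>,
    moved: bool, // runtime stand-in for "the content has been moved out"
}

impl<T> Container<T> {
    fn new(data: T) -> Self {
        Container { random_field: Vec::new(), data: ManuallyDrop::new(data), moved: false }
    }

    /// Moves the content out at most once (plays the role of `deref_move`).
    fn take(&mut self) -> Option<T> {
        if self.moved {
            return None;
        }
        self.moved = true;
        // SAFETY: the flag ensures `data` is read only once and never dropped later.
        Some(unsafe { ManuallyDrop::take(&mut self.data) })
    }
}

impl<T> Drop for Container<T> {
    fn drop(&mut self) {
        if !self.moved {
            // SAFETY: `data` is still initialized because it was never taken.
            unsafe { ManuallyDrop::drop(&mut self.data) };
        }
        // If `moved` is set, this branch corresponds to `drop_moved`: only the
        // container's own fields are cleaned up.
    }
}

/// Drop counts observed after dropping the container, then the payload.
fn drop_counts() -> (u32, u32) {
    let counter = Rc::new(Cell::new(0));
    let mut c = Container::new(CountDrop(counter.clone()));
    let payload = c.take();
    drop(c);
    let after_container = counter.get();
    drop(payload);
    (after_container, counter.get())
}
```

The container and its content are dropped at different points here, which is the situation the updated detailed design has to account for.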
On the syntax side: We can use Variant 1: Note that
So either we'll have to bump some form of 'language version' to allow for new keywords or syntax changes, or we'll need some disambiguation rule. I'd guess that |
That's not given at all.
It's orthogonal to this RFC. |
Consider: Rust currently has generic functions that take T by value. Those functions are necessarily restricted to sized types, but we could relax those restrictions if we allow passing unsized types by value. I don't want us to end up like C++ where all functions take In that light, what motivation remains for |
I'm really hoping this doesn't happen. One of the things I found really appealing about Rust follows naturally from the default being immutable values and "mut" being explicit. For a read-only value, it doesn't semantically matter if you pass it by reference or by value, other than efficiency, so the compiler can always choose to DTRT (e.g. "if the value fits in a processor register, pass by value, else pass by reference"). Which means that the undecorated type name is the thing you almost always want. |
@joshtriplett The availability of the facility does not force you to use it. You'd be free to continue passing by value wherever makes sense, and most people would in the same way that most people don't currently pass boxes around. On the other hand, the absence of the facility means that users are forced to assume that any time ownership changes, the address may also change. Usually this won't matter, but there are cases in which it does, especially with FFI. Further, as @mahkoh showed above not having The one thing that still sticks a little for me is that |
@aidancully The language (as opposed to libraries) being wider is not necessarily a feature. I would suggest, if there's a use case for this that isn't just efficiency, the motivation in the RFC needs to very clearly spell that out. Because if this is just about efficiency, the compiler can and should handle that. If this is needed to build certain interfaces that can't be built otherwise, then those use cases should be spelled out very clearly. The comment you linked to seems to assume that any use of "T" means a copy, but that's not the case. Assuming appropriate compiler optimizations, what use cases remain that can't be expressed without &own ? |
@joshtriplett A by-value iterator over a stack allocated array of a constant number of For what it's worth, I've run into these limitations surrounding owned DSTs several times in real code - these aren't just hypothetical concerns. |
@reem That makes sense; that couldn't be written without language changes. Could that also be handled by allowing code to pass a value of slice type by non-& type, with the compiler optimizing that by passing it by pointer internally? |
No, that's not what I meant at all. That's why I used the word "may" in "users are forced to assume that any time ownership changes, the address may also change." (emphasis added) In other words, it's currently impossible for an FFI to rely on an address not moving, when ownership can change. |
I wasn't referring to your comment; I was talking about the rationale in #965 (comment) , which seemed to assume that a non-& type always meant a copy. I'm not trying to argue that there are no use cases where &own makes sense; I'm asking that such use cases be added to the rationale in the RFC (along with why none of T, &T, &mut T, or a library type wrapping an unsafe T* will work for those use cases), to make it clear that this is about more than just optimizations. |
I guess that's like the old saw that any post correcting a spelling error will always include at least one spelling error. Sorry, I misunderstood your response. I see why you say that, but the part referring to moves requiring copies was only one part of the comment linked ("why this is useful for sized objects"). While I agree with your criticism of that part (pass by value will be more optimizable than pass by |
My feeling is that it is way too soon to consider adding new builtin pointer types. In teaching Rust, I've found that the current setup ( In any case, I think many of the use cases in this RFC can be addressed using library types. For example, @eddyb once observed to me if we have custom allocators, then you can probably express the idea of an owned value that resides in an outer stack frame (i.e., |
I probably need to review everything else said here, but I have to make one thing clear: |
@eddyb yes, sorry, I didn't mean to imply that there was no need for language treatment, just no need for builtin types. (That said, it may be that we can find some kind of solution that doesn't involve growing the language at all, I don't know. I haven't thought about this in a while and would like to revisit it at some point.) In any case, we discussed this RFC at triage and came to the conclusion that it should be postponed. There is definitely a real use case here but it's not yet time to dive into it. I was surprised that I did not find a suitable pre-existing issue, so I opened #998. Thanks @mahkoh for another thoughtful RFC. |
A question about a case where &own might have some useful semantics: Suppose you have a structure "Options" with a ::new() constructor and a set of chainable self-mutating methods that each return a &mut to the structure. You can write this:
And you can write this:
But you can't write this:
That fails the borrow checker because the return value from ::new() owns the Options structure, and the &mut references returned from the chained methods just borrow it, so it dies at the end of the first statement, while opts lives until the end of the block. If &own existed, with the proposed semantics, could .opt1 and .opt2 accept and return an &own instead of a &mut to allow the above syntax? |
A builder.
Yes. You can still do the same by passing self by value, though it might become inefficient for large |
@nagisa Oh, that makes sense. So the compiler can relatively easily decide to optimize both exactly the same way, making &own again just an optimization hint that the compiler could derive on its own for builders. |
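The builder exchange above can be illustrated with a small sketch. The code snippets from the original question were not preserved, so this is a hypothetical reconstruction that reuses only the names `Options`, `new`, `opt1`, and `opt2` from the prose; the field types and the `OwnedOptions` variant are assumptions made for illustration.

```rust
/// Builder whose setters borrow: each returns `&mut Self`.
#[derive(Debug, Default, PartialEq)]
struct Options {
    opt1: bool,
    opt2: u32,
}

impl Options {
    fn new() -> Self {
        Self::default()
    }
    fn opt1(&mut self, on: bool) -> &mut Self {
        self.opt1 = on;
        self
    }
    fn opt2(&mut self, n: u32) -> &mut Self {
        self.opt2 = n;
        self
    }
}

/// Same builder with by-value setters, as suggested in the reply.
#[derive(Debug, Default, PartialEq)]
struct OwnedOptions {
    opt1: bool,
    opt2: u32,
}

impl OwnedOptions {
    fn new() -> Self {
        Self::default()
    }
    fn opt1(mut self, on: bool) -> Self {
        self.opt1 = on;
        self
    }
    fn opt2(mut self, n: u32) -> Self {
        self.opt2 = n;
        self
    }
}

fn build_by_ref() -> Options {
    // Works: the owner is a named variable, the chain only borrows it.
    let mut opts = Options::new();
    opts.opt1(true).opt2(3);
    // Rejected if `opts` is used afterwards, because the temporary created
    // by `new()` is dropped at the end of the statement:
    // let opts = Options::new().opt1(true).opt2(3);
    opts
}

fn build_by_value() -> OwnedOptions {
    // Works: ownership is threaded through every call in the chain.
    OwnedOptions::new().opt1(true).opt2(3)
}
```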
Add reference and slice types that take ownership of the objects they reference.
The motivation has not yet been updated. You might want to read the following and subsequent posts instead of the motivation in the RFC: #965 (comment)