Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There should be a way to unwrap MaybeUninits without moving them #61011

Open
Manishearth opened this issue May 21, 2019 · 25 comments
Open

There should be a way to unwrap MaybeUninits without moving them #61011

Manishearth opened this issue May 21, 2019 · 25 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Manishearth
Copy link
Member

MaybeUninit has an API where you construct a MaybeUninit<T>, can operate on it via raw pointer methods, and then can call assume_init() to move the inner value out. It also has methods that let you assume that it is initialized and obtain safe Rust references.

This is great for temporary initialization on the stack, but this doesn't work so well for the heap. What if you need to gradually initialize Vec<T> or Arc<T> heap values? This ends up leaving a carcass of a MaybeUninit in the type forever, basically infecting the type with an unsafe abstraction from the time of its initialization.

One solution here is to make sure MaybeUninit and ManuallyDrop have a repr of T (basically, repr(transparent), except we haven't defined that for unions) so that you can always transmute &MaybeUninit<T> to &T, or Arc<MaybeUninit<T>> into Arc<T>.

The fact that MaybeUninit has methods specifically for doing this kind of thing with slices seems to point to the need for this ability to be generalized.

Either way we should have clear documentation for using MaybeUninit with values not on your stack (on the heap, or passed in as a reference). Ideally we allow you to transmute it in such cases (and have examples doing this), but if we can't, we should still document that you shouldn't use it that way. (I think that this choice would preclude a lot of legitimate and necessary use cases, especially in FFI)

cc @eddyb @RalfJung

@jonas-schievink jonas-schievink added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels May 21, 2019
@RalfJung
Copy link
Member

RalfJung commented May 21, 2019

One solution here is to make sure MaybeUninit and ManuallyDrop have a repr of T (basically, repr(transparent), except we haven't defined that for unions)

That would be rust-lang/rfcs#2645. But yes, the intention is that you can do such transmutes. And indeed the section at https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#layout kind of says that you may do that, doesn't it? I guess it could be more explicit. PRs welcome!

Interaction of MaybeUninit and Box is on the list of planned API extensions (basically: a version of assume_init that does Box<MaybeUninit<T>> -> Box<T>). But this is a strong update, so I'd be curious how anything like this could work with Arc<T> (which is not a unique pointer and hence cannot have strong updates).

@Manishearth
Copy link
Member Author

Yeah, that's good, but it's unclear what kinda of transmutes are allowed when you're sure it's definitely initted. Examples would be good (I might add some).

The Arc case is where you've initialized the Arc with uninit contents and are the sole owner. Once you've filled in the contents (maybe via ffi!) you should be able to turn it into a "normal" arc.

@Manishearth
Copy link
Member Author

Manishearth commented May 21, 2019

This also becomes important for FFI: the reason we made the repr(transparent) RFC in the first place is that noalias happens at the type level and you need a way to opt out of such semantics for data from FFI, so you need to treat the value as if it were UnsafeCell-wrapped.

There's going to be a similar situation here with FFI providing or working on uninitialized places.

@RalfJung
Copy link
Member

Yeah, the aforementioned RFC was opened because of by-value-FFI concerns. But otherwise it shouldn't matter. Also we had an astute lack of examples for passing uninitialized data by value.

@Manishearth
Copy link
Member Author

Why does it have to be by value? FFI can pass me a reference to a complicated C++ datastructure with some uninit stuff in it, I have to be able to operate on that safely.

@myrrlyn
Copy link

myrrlyn commented May 21, 2019

Speaking solely as an FFI author, I believe the MaybeUninit::write method, which is &mut MaybeUninit<T> -> &mut T, is sufficient for that. FFI has the ability to "cheat" here by just changing the type signature at various parts of the boundary, and permitting the foreign caller not to know the difference. As long as MaybeUninit is repr(transparent), it is invisible to FFI, and thus only a concern to Rust code.

Since MaybeUninit is a property of a location, rather than of a structure, I think that it may be useful to use it as an intermediary between raw pointers (which have no constraints on handle or referent) and references (which constrain both handle and referent). If I remember correctly, there is prior concept art on the creation of another reference type, &uninit T, which bridges the gap by requiring that the handle have a correctly non-null, aligned, address to a location, but this location can only be written, and only once, after which the reference degrades to &mut T.

In order to implement this in the library rather than in the language, I would advocate that MaybeUninit should add the following function:

impl<T> MaybeUninit<T> {
 pub unsafe fn from_pointer(ptr: *mut T) -> Option<&mut Self> {
  if ptr.is_null() || !ptr.is_aligned::<T>() {
   return None;
  }
  Some(&mut * (ptr as *mut Self))
 }
}

I believe this would address both my wants as an FFI author, and Manish's concerns as a Rust-internal author, by removing MaybeUninit from the outer type and having it occur solely within the context of an initializer function.

Pure-Rust usage of this might be to produce *mut T through the existing mechanisms – collection implementations use the allocator to produce their respective boxed types, which can produce raw pointers – and then convert that pointer into an uninitialized reference within function scope.

For cases like Manish described, where the allocation of a container happens separately enough from the initialization of its contents that the allocated and initialized container of uninitialized data must cross function boundaries before its interior is initialized, I would expect that such transformations would have to be methods on the container, rather than on MaybeUninit. Precedent exists in the from_raw_parts constructors on Box and Vec; the function

impl<T> Container<T> {
 pub fn deferred_init(uninit: Container<MaybeUninit<T>>, value: T) -> Self {
  uninit.write(value);
  unsafe {
   let disasm = uninit.raw_parts();
   Self::from_raw_parts(disasm)
  }
 }
}

would serve to remove the MaybeUninit from the type level.

Is this a useful summary of the various concerns about deferred and/or out-of-place initialization, and how they might be addressed?


Edit: propagate the unsafe upwards to mark from_pointer entirely

@RalfJung
Copy link
Member

RalfJung commented May 21, 2019

Why does it have to be by value? FFI can pass me a reference to a complicated C++ datastructure with some uninit stuff in it, I have to be able to operate on that safely.

The only additional effect repr(transparent) is to change the type's ABI so that it gets passed by-value the same way as its sole field. If you do not pass things by-value, all you need is for (size, alignment, field offsets) to be the same; you don't need repr(transparent) for that.

@RalfJung
Copy link
Member

RalfJung commented May 21, 2019

@myrrlyn

&uninit seems thoroughly out-of-scope for this discussion. ;)

pub fn from_pointer(ptr: *mut T) -> Option<&mut Self>

That is not a safe function. &mut MaybeUninit<T> still must be dereferencable for size_of::<T>() many bytes, which your function does not (and cannot) check.

But I don't understand what that has to do with the discussion here...?

@Manishearth
Copy link
Member Author

The only additional effect repr(transparent) is to change the type's ABI so that it gets passed by-value the same way as its sole field

Not necessarily, it means that &[T] and &[MaybeInit<T>] also have the same layout. Which is useful.

@hanna-kruppe
Copy link
Contributor

The only additional effect repr(transparent) is to change the type's ABI so that it gets passed by-value the same way as its sole field

Not necessarily, it means that &[T] and &[MaybeInit<T>] also have the same layout. Which is useful.

That, too, is ensured by T and MaybeUninit<T> having the same size and alignment, no repr(transparent) necessary.

@RalfJung
Copy link
Member

I would expect that such transformations would have to be methods on the container, rather than on MaybeUninit. Precedent exists in the from_raw_parts constructors on Box and Vec; the function

I imagined we'd have something like this for Box, you seem to be saying something similar?

impl Box<MaybeUninit<T>> { // not sure if this works but you get the idea
  pub fn uninit() -> Self { Box::new(MaybeUninit::new()) } // with some hack to avoid a stack allocation
  // also: zeroed, new

  unsafe pub fn assume_init(self) -> Box<T> { mem::transmute(self) }

  pub fn write(&mut self, value: T) -> &mut Box<T> {
    MaybeUninit::write(&mut *self, value);
    unsafe { mem::transmute(self) }
  }
}

Though I am still not convinced the return value of write is actually useful (and if it isn't we could just use MaybeUninit::write via deref).

@Manishearth
Copy link
Member Author

@myrrlyn

I believe this would address both my wants as an FFI author, and Manish's concerns as a Rust-internal author, by removing MaybeUninit from the outer type and having it occur solely within the context of an initializer function.

It doesn't address my concerns, since my concerns are all about MaybeUninit's semantics when embedded deeply inside some other structure, and a need to be able to get rid of it.

Things that can be written as a library don't matter to me: I can write such a function myself. My question is more about the fundamental boundaries of the UB surrounding this.

@rkruppe

That, too, is ensured by T and MaybeUninit having the same size and alignment, no repr(transparent) necessary.

Yes, but it's not clear that all such transmutes are safe (provided the T is fully initialized, of course.)

Figuring this out is what this issue is about. If such transmutes are always safe, then that's good, we should document that, and we should add examples doing it.

@RalfJung
Copy link
Member

Not necessarily, it means that &[T] and &[MaybeInit] also have the same layout. Which is useful.

That, too, is ensured by T and MaybeUninit having the same size and alignment, no repr(transparent) necessary.

Yes, but it's not clear that all such transmutes are safe (provided the T is fully initialized, of course.)

At this point I should also quote what @rkruppe said on Zulip:

However, the discussion there also highlighted a caveat: when layout optimizations are involved, it's sometimes impossible to treat A<MaybeUninit> the same as A (e.g., for A = Option) because MaybeUninit has no niches even if T has them. IOW, "transparent wrapper" only means same size and alignment, not same niches.

So basically the thinking is that such transmutes are safe only when the type constructors involved ignore the niche. I think we should be able to guarantee that arrays and tuples do ignore the niche.

@Manishearth
Copy link
Member Author

Manishearth commented May 21, 2019

Ah, that's a good point. I don't know how to document this well.

(Yes, I know this is mentioned on the layout side, but layout being the same is only a prerequisite to transmutes being safe)

@myrrlyn
Copy link

myrrlyn commented May 21, 2019

This seems like it'd want a marker trait to denote that types without niches are sound to transform MaybeUninit<T> -> T in layout, but types such as NonNull or NonZero or bool are not.

@RalfJung
Copy link
Member

@myrrlyn that only solves half the problem. &[MaybeUninit<bool>] and &[bool] are fine to be transmuted. It's less about the type inside the MaybeUninit but about the type around it: if that is &[_] we are fine, but if that is Option<_> we have a problem.

@Pzixel
Copy link

Pzixel commented May 22, 2019

It's fine having APi for Box<MaybeUninit<T>> -> Box<T>, but what for arbitrary nested generics? I believe there should be an API A<B<C<D<...<MaybeUninit<T>>...>>>> transmuting to A<B<C<D<...<T>...>>>> . As a general rule is maybe something like "when we call into_innter all MaybeUninit<T> gets recursively replaced with T".

Can this work? It cannot be expressed with language terms, but maybe some compiler builtin?

@eddyb
Copy link
Member

eddyb commented May 22, 2019

I think we should be able to guarantee that arrays and tuples do ignore the niche.

IMO we can do that by saying they are structural product types.
"product" implies there is no reason to take advantage of nested layout, but merely contain it.
"structural" implies the layout cannot be modified by some definition (which may not be the case at all, I guess? but it's better not to rely on it), and is determined entirely by its (structural) components.

@Gankra
Copy link
Contributor

Gankra commented May 22, 2019

I would just like to note that many of the potential use-cases of "MaybeUninit, but it goes away" for pure rust applications are just The Placer API, and I think it would be best to pursue those applications through that avenue.

@benkay86
Copy link

benkay86 commented Sep 1, 2019

It looks like you can just transmute from Box<MaybeUninit<T>> to Box<T>, see line 44 in Rust Playground.

@RalfJung
Copy link
Member

RalfJung commented Sep 2, 2019

There's even an unstable function which you can use instead of the transmute: https://doc.rust-lang.org/nightly/std/boxed/struct.Box.html#method.assume_init

@Pzixel
Copy link

Pzixel commented Sep 2, 2019

I'd really like having something like:

fn unwrap_transparent<T<_>, R: ReprTransparent>(value: T<R<_>>) -> T<_> { 
   unsafe { mem::transmute(value) } 
}

fn unwrap_uninit<T<_>>(value: T<MaybeUninit<_>>) -> T<_> { 
   unwrap_transparent(value);
}

Or even more generic like unwrap_uninit :: forall a. (f :: * -> *) a -> ... to make it possible to convert Vec<HashSet<Option<MaybeUninit<i32>>>> -> Vec<HashSet<Option<i32>>> without additional costs.

@ia0
Copy link
Contributor

ia0 commented Jun 24, 2024

I started a discussion about strong updates (a possible solution to this issue): https://internals.rust-lang.org/t/strong-updates-status/21074.

As noted by Ralf, exclusive access is needed, so it won't be possible to remove a MaybeUninit from an Arc. The Arc content needs to be exclusively owned for that (either proved using unsafe or by using an intermediate UniqueArc or BoxArc type).

As noted by Gankra, the placer API would solve most of the use-cases. But I don't think it will solve all. I'm thinking about type-states:

impl Door<Closed> {
    fn old_open(self) -> Door<Open>;
    fn new_open(self: &mut [Door<Closed> .. Door<Open>]);
}

One can implement new_open() by writing the open door in place. But support is still needed to describe strong updates at type level (including mutable references).

@RalfJung
Copy link
Member

RalfJung commented Jun 24, 2024 via email

@ia0
Copy link
Contributor

ia0 commented Jun 24, 2024

Type-changing mutable ref is a significant language extension, please discuss that in separate thread(s). :)

Yes, that's the first sentence of my message :-) a link to an existing discussion for those looking for possible solutions (since the placer API seems dead AFAICT).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants