Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Rc::into_raw, Rc::from_raw, Arc::into_raw, Arc::from_raw #37197

Closed
cristicbz opened this issue Oct 15, 2016 · 48 comments
Closed
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@cristicbz
Copy link
Contributor

cristicbz commented Oct 15, 2016

Methods being added in #37192.

bors added a commit that referenced this issue Nov 8, 2016
Add `{into,from}_raw` to Rc and Arc

These methods convert to and from a `*const T` for `Rc` and `Arc` similar to the way they work on `Box`. The only slight complication is that `from_raw` needs to offset the pointer back to find the beginning of the `RcBox`/`ArcInner`.

I felt this is a fairly small addition, filling in a gap (when compared to `Box`) so it wouldn't need an RFC. The motivation is primarily for FFI.

(I'll create an issue and update a PR with the issue number if reviewers agree with the change in principle **Edit: done #37197**)

~~Edit: This was initially `{into,from}_raw` but concerns were raised about the possible footgun if mixed with the methods of the same name of `Box`.~~

Edit: This was went from `{into,from}_raw` to `{into,from}_inner_raw` then back to `{into,from}_raw` during review.
@cristicbz
Copy link
Contributor Author

@alexcrichton shouldn't this have a B-unstable label? Don't know if it matters, but I figured I'd check.

@alexcrichton
Copy link
Member

Oops, indeed!

@alexcrichton alexcrichton added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Dec 21, 2016
@alexcrichton
Copy link
Member

And while we're at it, I'd like to stabilize this!

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Dec 21, 2016

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Amanieu
Copy link
Member

Amanieu commented Dec 26, 2016

Is there any chance of putting from_raw and into_raw into a trait or would that be a breaking change for Box?

@Amanieu
Copy link
Member

Amanieu commented Dec 26, 2016

Also I think from_raw and into_raw for Rc and Arc should accept unsized types, just like the implementations for Box.

@alexcrichton
Copy link
Member

@Amanieu could you elaborate on why you'd like to see these as a trait? I do think that'd be a breaking change to remove the inherent methods on Box (unless we add the trait to the prelude), but that doesn't meant we could also add a unifying trait later. We've just tended to avoid premature abstraction in libstd with extra traits.

Also on a technical level is accepting unsized types possible? Consider something like going from Rc<Trait> to *mut Trait. The vtable for *mut Trait has to come from somewhere, and the only one we have is the vtable for Rc<Trait>. Those methods expect the pointer being passed into them is *mut RcBox<T>, not *mut T. In that sense I don't think we could support unsized types?

@Amanieu
Copy link
Member

Amanieu commented Dec 27, 2016

@alexcrichton I'm building a safe interface for intrusive-collections, which works by having a collection "take ownership" of an owned pointer. This is done by calling into_raw before inserting an object into a collection, and calling from_raw when an object is removed from a collection. I currently define an IntrusivePointer trait myself which does this, so I guess a trait in the standard library isn't really needed.

Regarding vtables, I was under the impression that the vtable pointer in *mut RcBox<Trait> pointed to the same vtable as *mut Trait since the fat part of the point is only associated with the unsized part of a struct (Trait). This means that the compiler will first extract a pointer to the Trait before passing it to the vtable methods. On the other hand I am not very familiar with how vtables work in Rust, so I may be wrong here.

@alexcrichton
Copy link
Member

Hm yeah it's true I'm not entirely sure what the vtable would look like. Would be good to clarify before we support the functionality!

@Amanieu
Copy link
Member

Amanieu commented Dec 28, 2016

&RcBox<Trait> and &Trait have the same vtable pointer: https://is.gd/QNfAIo

@ishitatsuyuki
Copy link
Contributor

I'm looking forward to have something like std::shared_from_this(C++), thus constructing a new Arc from a reference (probably &self).

@cristicbz
Copy link
Contributor Author

cristicbz commented Dec 29, 2016 via email

@ishitatsuyuki
Copy link
Contributor

The best thing is to take self as &Arc, but it's impossible due to the language design. This is common in building circular reference, which has been hard to do in safe area of Rust.

@Amanieu
Copy link
Member

Amanieu commented Dec 31, 2016

Hmm this might be trickier than I thought. How would you implement offset_of! on a DST?

@brson
Copy link
Contributor

brson commented Jan 6, 2017

I'm not ready to sign off on this until the DST implications as raised by @Amanieu are understood.

@Amanieu
Copy link
Member

Amanieu commented Jan 7, 2017

Here is an implementation of from_raw which works for both sized and unsized types:

    unsafe fn from_raw(ptr: *const T) -> Rc<T> {
        let fake_rc: Self = mem::transmute(ptr);
        let fake_rc_target = fake_rc.as_ref() as *const _ as *const u8;
        mem::forget(fake_rc);
        let raw_ptr = ptr as *const u8;
        let rc_ptr = raw_ptr.offset(raw_ptr as isize - fake_rc_target as isize);
        let mut result = mem::transmute(ptr);
        ptr::write(&mut result as *mut _ as *mut *const u8, rc_ptr);
        result
    }

@cristicbz
Copy link
Contributor Author

cristicbz commented Jan 7, 2017

@Amanieu That's great! My only worry is:

let fake_rc: Self = mem::transmute(ptr);
let fake_rc_target = fake_rc.as_ref() as *const _ as *const u8;

Doesn't fake_rc_target potentially point to unallocated memory, making it UB? If all's kosher, I'm happy to lift that into something reusable across Rc<T> and Arc<T> and move the {from,into}::raw impls to the ?Sized blocks.

@Amanieu
Copy link
Member

Amanieu commented Jan 7, 2017

@cristicbz Yes it's an invalid pointer, but it should be fine as long as it isn't dereferenced.

Here is an updated version of the code with more comments:

unsafe fn from_raw(ptr: *const T) -> Rc<T> {
    // Create a temporary fake Rc object from the given pointer and
    // calculate the address of the inner T.
    let fake_rc: Self = mem::transmute(ptr);
    let fake_rc_target = fake_rc.as_ref() as *const _;
    mem::forget(fake_rc);

    // Calculate the offset of T in RcBox<T>
    let rc_offset = (fake_rc_target as *const u8) as isize - (ptr as *const u8) as isize;

    // Get the address of the RcBox<T> by subtracting the offset from the
    // pointer we were originally given.
    let rc_ptr = (ptr as *const u8).offset(-rc_offset);

    // If T is an unsized type, then *const T is a fat pointer. To handle
    // this case properly we need to preserve the second word of the fat
    // pointer but overwrite the first one with our adjusted pointer.
    let mut result = mem::transmute(ptr);
    ptr::write(&mut result as *mut _ as *mut *const u8, rc_ptr);
    result
}

@cristicbz
Copy link
Contributor Author

cristicbz commented Jan 7, 2017

@Amanieu I'm never too clear on exactly under what conditions producing pointers to unallocated (or one past the end of an allocation) memory is fine. I thought using the GEP instruction in LLVM to produce one (which I think includes field access in rust) was UB (as per http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds)

@Amanieu
Copy link
Member

Amanieu commented Jan 7, 2017

The only other option is to make Rc #[repr(C)] and hard-code the offset to the inner T.

@cristicbz
Copy link
Contributor Author

Another option is to only support it for T: Sized (as currently implemented) until some kind of built-in offsetof support is implemented when it could be relaxed.

@cristicbz
Copy link
Contributor Author

Compiled in release mode, the optimised IR looks reasonable:

  %1 = getelementptr inbounds i32, i32* %0, i64 -4
  %2 = ptrtoint i32* %1 to i64
  ret i64 %2

The debug version is less fun to read because it actually invokes .as_ref(). But .as_ref() does appear to call getelementptr inbounds (https://is.gd/EMgsQZ) which, not being an LLVM expert, I think is UB.

@cristicbz
Copy link
Contributor Author

cristicbz commented Jan 7, 2017

@Amanieu I don't think we can hardcode the offset since different T-s might have different alignment requirements, right? Actually I don't really understand how DST-s work in something like RcBox<Debug>---doesn't the offset differ based on T's alignment? Does as_ref become dynamically dispatched somehow?

Edit: What sorcery is this: https://is.gd/2vbxPp? It looks like field offsets are stored in vtable somehow? But then &Trait and &RcBox<Trait> can't have same vtable! What am I missing?

@Amanieu
Copy link
Member

Amanieu commented Jan 7, 2017

@cristicbz The alignment is encoded in the vtable for unsized types. as_ref uses this information to calculate the offset of the Debug object in RcBox<Debug>.

We can still do this alignment in the offset calculation ourselves: mem::align_of_val will also load a value from the vtable for DSTs.

@rkjnsn
Copy link
Contributor

rkjnsn commented Jan 9, 2017

I posted this on the original PR, but here is probably better.

I find @cristicbz's reasoning for initially going with *const T for these convincing, and am not sure how we ended up with *mut T. @brson says "Arc's and Rc's are often filled with mutable things", but as far I as can tell, this is only true when T has interior mutability of some sort. (The exception is get_mut, which is only usable if no other references exist.) What am I missing?

@brson
Copy link
Contributor

brson commented Jan 9, 2017

@rkjnsn Your statement about my statement is correct. Sometimes these types directly point to mutable memory, so the pointer should be mutable. With the ambiguity about the semantics of *const and *mut I'm happy for someone to clarify definitively.

@Amanieu
Copy link
Member

Amanieu commented Jan 10, 2017

@brson I think that the main argument in favor of *const instead of *mut is that Rc and Arc only implement Deref and not DerefMut.

@rkjnsn
Copy link
Contributor

rkjnsn commented Jan 11, 2017

@brson, I seem to recall *const versus *mut raw pointers being more of a hint than any kind of hard rule. Certainly, *const is not meant to indicate that the contents can't be mutated by anyone the way & refs are (with the exception of internal mutability). Clones of an Rc gives out & refs to anyone who wants one, and code holding such a reference (and potentially the optimizer down the line) is supposed to be able to depend on the referenced object not being modified while they hold that reference unless there is internal mutability. Giving out a *mut pointer to something that may well have live & refs seems like a terrible footgun, and seems completely unnecessary given that nothing precludes mutating through a *const pointer if one ensures that it is safe to do so.

@rfcbot
Copy link

rfcbot commented Feb 13, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Feb 13, 2017
@rkjnsn
Copy link
Contributor

rkjnsn commented Feb 13, 2017

I would like to reiterate that I am strongly opposed to these stabilizing using *mut instead of *const. *mut creates a serious foot gun for no benefit versus *const, and violates the semantic hint provided by *mut versus *const.

@cristicbz
Copy link
Contributor Author

@rkjnsn I guess that's what FCP is for.

I still agree with you and am happy to write a quick PR to change the return type if @brson and @alexcrichton are persuaded. For me *const T is for unsafe &T and *mut T is for unsafe &mut T. Rc gives out &T-s so I'd expect it to give *const T for unsafe methods.

For what it's worth, I don't buy the argument about people putting mutable things into an Rc. To me, that's an argument about what RefCell::as_ptr should return, not what Rc should. In fact, as it happens, my use case is about shared immutable buffers (no Cell-s). If you've got an x: Rc<RefCell<T>>, I think unsafe { (*Rc::into_raw(x)).as_ptr() } is a reasonable thing to do to get a *mut T.

@alexcrichton
Copy link
Member

Ah yes that's what FCP is for to figure out details such as this.

So to elaborate on my opinion again first of all I'd like to clarify that *mut T and *const T are warnings at best. I don't believe there's any guarantees associated with them and otherwise are simply advisory and/or type-level distinctions. In that sense making a "wrong" choice here to me is quite minor, it has no practical implication other than it may be slightly annoying for someone at some point.

With that in mind I still believe that *mut is the best choice here. I agree that Rc only hands out & pointers and therefore *const may seem like a natural fit, but @brson is right in that Rc quite commonly (I'd say almost always) contains interior mutability in T itself and is guaranteed to do so right next to T (e.g. the refcounts).

In general we have lots of precedent with *mut along with types like Cell and RefCell. In general it's a marker saying "it's ok to mutate this so long as you externally make sure it's ok". The same statement is true for Rc where it's ok to mutate it so long as you coordinate with other instances in the thread.

@cristicbz
Copy link
Contributor Author

So to elaborate on my opinion again first of all I'd like to clarify that *mut T and *const T are warnings at best. I don't believe there's any guarantees associated with them and otherwise are simply advisory and/or type-level distinctions. In that sense making a "wrong" choice here to me is quite minor, it has no practical implication other than it may be slightly annoying for someone at some point.

Full ack, that's why I didn't put up any kind of "fight" on the original PR.

In general we have lots of precedent with *mut along with types like Cell and RefCell. In general it's a marker saying "it's ok to mutate this so long as you externally make sure it's ok". The same statement is true for Rc where it's ok to mutate it so long as you coordinate with other instances in the thread.

So maybe I'm missing something here, but I see two cases for an Rc<T>:

  1. T doesn't contain an UnsafeCell. I thought that in this case "it's ok to mutate this so long as you externally make sure it's ok" is untrue. In particular that it would be UB just as much as mutating through a &T regardless of what I do externally (even if I ensure that &T is unique, for instance because the compiler is allowed to skip a load).
  2. T contains an UnsafeCell. Then, I can use the as_ptr methods to get a *mut on, say, Cell or RefCell (or even an UnsafeCell).

The reason we have lots of precedent with *mut for Cell and RefCell, to me, is because those types contain an UnsafeCell.

Rc is not like Cell or RefCell, I would expect mutating the value in an Rc<bool> to be UB regardless of what guarantees I uphold externally, while mutating Rc<Cell<bool>> is cool, but then I'd get my *mut bool from the Cell's as_ptr method, not from Rc's API.

@rkjnsn
Copy link
Contributor

rkjnsn commented Feb 14, 2017

For me, the fact that *mut versus *const is mostly a hint is an argument for using *const: if *const doesn't preclude any usages, why not provide the proper hint?

It is true that many usages of Rc involve internal mutability, but my no means do all of them. More importantly, internal mutability of the contained type is in no way inherent.

While *mut and *const don't provide any guarantees, & refs do, and one of those guarantees is that no data behind an & reference can change unless it is contained in an UnsafeCell. Both the compiler and unsafe code are free to rely on this for soundness. Giving out a *mut pointer to memory that likely has live & references is thus quite reckless. *const is the right choice here. The author of unsafe code can still perform mutation, but they have a hint that extra precautions are necessary.

The documentation for UnsafeCell states very clearly:

The UnsafeCell type is the only legal way to obtain aliasable data that is considered mutable.

The contents of an Rc pointers are inherently aliasable, which means that the data should not be considered mutable. Even if the data contains an UnsafeCell, it is only the data inside that UnsafeCell that can be considered mutable, and, as @cristicbz notes, Cell, RefCell, and UnsafeCell already provide methods to obtain a *mut to their inner data.

If the author of safe code ensures there are no aliases and none can be created while mutation is possible (as Cell::get_mut does), they can easily cast the *const to *mut, and that cast provides the ideal spot to document why it is sound to do so in that particular case. If they don't document the cast, or are mutating in an unsound way, the cast makes the potential error much easier to spot in code review.

@alexcrichton
Copy link
Member

The point about UnsafeCell is a good point, and it got me thinking a bit. FWIW @cristicbz I believe your point (1) about UB is correct (at least from my naive understanding).

So #37192 was using *const for the whole time until @brson noticed the inconsistency with the underlying abstraction, Shared. This pointed me back to when Shared was added and reminded me that *const T vs *mut T is also subtly different in terms of variance. IIRC *mut T is invariant and *const T is the variance that allows smaller lifetimes.

I definitely agree that it seems odd to return *mut T from Rc and Arc when otherwise you can only get a shared "equivalent to *const T reference. @brson has a very good point though that we don't buy much from trying to be principled in one place and not in another. I vaguely recall Shared returning *mut T being important, but at this time I can't recall why.

A naive attempt at a reproduction failed, but I'd be worried about a case where:

  • Rc<T> is actually invariant (e.g. lifetime shouldn't change)
  • Rc::into_raw returns *const T
  • Because *const T isn't invariant (I think) lifetimes can change
  • Later on you use Rc::from_raw with the altered lifetime, creating unsoundness

Is such a scenario possible? Or am I just thinking of something totally different?

@cristicbz
Copy link
Contributor Author

@alexcrichton Hadn't thought about variance, why is Rc invariant? It seems to me like it I should be able to coerce an Rc<&'a T> to an Rc<&'b T> with 'a: 'b; I'm sure there's a good reason, but questions of variance always seem to go over my head.

@cristicbz
Copy link
Contributor Author

@alexcrichton Acutally, it looks like Rc is covariant AFAICT: https://is.gd/hqihcD !

(which makes sense to me, it's just like a shared-ref, Rc<Cell<T>> isn't covariant because of Cell not because of Rc)

@alexcrichton
Copy link
Member

Oh yes Rc itself is the "allow shorter lifetimes" variant, and Cell<T> is the invariant one. The Shared type hands out *mut pointers which are invariant while it stores a *const pointer which is covariant (to ensure the type itself is covariant).

I would want to be sure that if we choose *const we don't accidentally open ourselves to covariance holes. I suspect there's a reason that Shared hands out *mut pointers, but I don't recall why.

@rkjnsn
Copy link
Contributor

rkjnsn commented Feb 15, 2017

As I understand it (forgive me if I am mistaken in here somewhere), Shared holds a *const T, so it is covariant if T is. Thus it would be trivial to get a *mut T with a different lifetime just by coercing a Shared instance before calling deref. Since Shared is copy, it is trivial to store a value through pointer with a shorter lifetime and read it back out via the the pointer with a longer lifetime. This is bad! (Yes, this requires unsafe code, but it's a serious footgun, and I believe preventing this type of footgun is the reason *mut is considered by the compiler to be invariant.)

Further, since Shared is Copy (because it is designed to be shared), that means, like Rc itself, the data pointed to should not be considered mutable except for parts contained in an UnsafeCell.

Finally, neither Rc nor Arc have any need for Shared to hand out *mut pointers during active usage, since the counts themselves are already contained in UnsafeCell (as they must be for soundness, with Rc using Cell and Arc using AtomicUsize)

It seems to me, then, that Shared should also definitely be handing out *const and not *mut pointers. The data pointed to is meant to be shared and thus shouldn't be considered mutable; Shared is covariant, making handing out *mut pointers even more dangerous; and neither Rc nor Arc need *mut for their shared operations because they only mutate through UnsafeCell.

If I had to speculate, I would guess that Shared returns a *mut pointer because when the last reference is destroyed, Rc and Arc need a mutable pointer to execute drop. However, I think a cast from *const to *mut should be necessary here as a way to assert, "I know I am currently the sole owner of this data."

EDIT: Rather, I would speculate that Shared returns a *mut because it was replacing NonZero<*mut RcBox>, and that that was originally *mut because that's what Box::into_raw returns and because that's what drop needs. That is to say, it seems like a historical accident, and I would argue that both Shared::deref and Rc::into_raw should return *const

@alexcrichton
Copy link
Member

That seems like a reasonable explanation to me, and if we change Shared as well I think it makes sense to change Rc at the same time.

@brson do you have other specific thoughts here?

@Amanieu
Copy link
Member

Amanieu commented Feb 15, 2017

So what is the consensus regarding support for DSTs? We could start by only implementing into_raw and from_raw for sized types, and later extend them to DSTs.

@brson
Copy link
Contributor

brson commented Feb 15, 2017

I'm not very enthusiastic about stabilizing this with so much confusion about the proper type.

The questions about variance worry me too. If there are variance concerns here let's add tests for them. This all feels risky.

@cristicbz
Copy link
Contributor Author

So #29110 added Shared specifically to make Rc and Arc covariant; looking at that PR it seems that @rkjnsn is right about *mut being due to it replacing a pointer obtained via Box::into_raw.

@brson What sort of tests would you like added? For the variance of Rc and Arc (define some noop fn-s like I did in the above playground link)? I can't think of any way to test it specifically around the new {into,from}_raw methods.

@Amanieu DST-wise yeah I thought that's sort-of where we left it. For DST-s it's a slightly bigger can of worms (see #37197 (comment)) and I'd be too worried about potential UB without an offsetof built into the compiler. Like you said, it will be trivially backwards compatible to extend this to DST-s when we're happy.

@rfcbot
Copy link

rfcbot commented Feb 23, 2017

The final comment period is now complete.

@cristicbz
Copy link
Contributor Author

Since the FCP is over, I'll sum up where I stand:

  1. I'm not worried about variance here: since Rc is covariant, I feel like you can't end up being over-permissive.
  2. I prefer *const T, but I don't massively care. Mutating the data behind the pointer obtained through into_raw is generally UB. If T is a cell, the as_ptr method on the cell is the correct way to get a mutable pointer.
  3. I'd punt on DST-s until there's a built-in offsetof. They're easy to add later backwards-compatibly.

@rkjnsn
Copy link
Contributor

rkjnsn commented Feb 25, 2017

My summary:

  1. I agree that *const being covariant isn't an issue given that Rc is.
  2. I am strongly opposed *mut here and think *const is clearly the way to go. *mut is dangerous both because Rc's covariance would make it too easy to get mut pointers to the same data with different lifetimes and because muting shared data without UnsafeCell is in violation of Rust‘s guarantees and likely to result in unsound behavior. In cases where it is guaranteed that ownership is exclusive, it's easy enough to cast to *mut (with a comment explaining why it's sound in that instance).
  3. I agree that it makes sense to get this in for Sized types and then figure out DSTs.

@alexcrichton
Copy link
Member

Ah so to clarify the FCP ending here is just the standard one week period. In reality it'll "end" a week before the end of the cycle when we merge this.

I'm leaning towards changing to const, pending input from @brson.

@alexcrichton
Copy link
Member

Discussed during @rust-lang/libs triage today, the conclusion was to go with *const with these APIs and update Shared at the same time. If no one else gets around to it sooner I'll update when I do the blanket "stabilize everything" PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants