Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Clone-less converter from Arc to Vec #89215

Closed

Conversation

HeroicKatora
Copy link
Contributor

@HeroicKatora HeroicKatora commented Sep 24, 2021

Adds a new converter to Arc<T: ?Sized>: try_unwrap_as_box(self) -> Result<Box<T>, Self>
and its counterpart Rc<T: ?Sized>: try_unwrap_as_box(self) -> Result<Box<T>, Self>

This method is a safe encapsulation when one wants to extract the (unsized) items behind an Arc into an owned allocation. For example, after a computation phase where a slice of items had been shared with workers. None of the existing methods allow taking ownership of an unsized pointee. The only current alternatives are downcasting to a concrete sized types, or in that specific case of a slice a more expensive cloning of the entire slice which requires an additional bound and might invoke arbitrary code.

Alternatives:

  • The implementation detail leak_as_owning_weak is the 'necessary' operation for performing this in 'userspace'. Concretely, it contains the minimal code around the CAS(1, 0) that moves ownership of the strong count to the caller. It's been factored out as a shared pattern between try_unwrap and the new interface. It could in theory be used by non-std code to perform the same operations or even different ones if it were made pub.
    For example user code might provide Deref<T> and DerefMut<T> in-place by keeping the Weak. This requires a proper newtype that also could be added to std but is another non-trivial addition, I suppose. That pattern alone does not solve moving the pointee to a properly owned Box though.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 24, 2021
@rust-log-analyzer

This comment has been minimized.

@SkiFire13
Copy link
Contributor

Isn't this a bit limited if it works only for Arc<[T]> -> Vec<T>? I would prefer having a try_unwrap_boxed(self) -> Result<Box<T>, Arc<T>> that works similar to try_unwrap, except it works for T: ?Sized since it produces a Box<T>. Then this can work for T=[U], T=str, T=Path, T=OsStr ecc ecc thanks to the various implementations of From<Box<_>>

Also, shouldn't Rc also get a similar method?

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Sep 24, 2021

Also, shouldn't Rc also get a similar method?

Very likely. It's simply that I had the implementation for Arc already around (and that's the more complicated one anyways).

Isn't this a bit limited if it works only for Arc<[T]> -> Vec<T>?

You're probably right. I did paint myself into a bit of a corner with the idea of possibly having a zero-allocation conversion by moving the elements to overwrite the ArcInner—leaving the remaining space for extra capacity. This idea can only work for Vec<_> and not Box<_> but it is very much over-engineered for the implementation of this specifically.

Another idea—although with significant additional API surface—would be to make a concrete new type BoxArc<T: ?Sized> to wrap around the Weak<_> from leak_as_owning_weak, with the semantics of owning the allocation without having modified or moved the data in any way. The different conversion, to Box<T: ?Sized>, Vec<_>, or even drain could then be added on that type Maybe this PR should be dropped in favor of such an idea (that definitely warrants a major library change proposal though).

@HeroicKatora
Copy link
Contributor Author

Maybe in the first step I will reduce this PR to leak_as_owning_weak. This encapsulates the key soundness issue of taking ownership which then allows building the missing abstractions in library space much, much more easily. (Well, except of course the zero-allocation approach because that one specifically would require relying on std's layout of *Inner<T>).

@SkiFire13
Copy link
Contributor

possibly having a zero-allocation conversion by moving the elements to overwrite the ArcInner— leaving the remaining space for extra capacity

This feels very niche. For this to work I think you would need:

  • no outstanding Weak reference;
  • the align of T to be at least 8;
  • the size of T to be a divisor of 16 or be the same as its alignment.

@HeroicKatora
Copy link
Contributor Author

HeroicKatora commented Sep 24, 2021

Okay, we don't need to venture into that niche. So which direction seems most promising to you:

  • Provide safe leak_as_owning_weak and punt the rest to more PRs.
  • Directly propose a wrapper {Arc,Rc}Box<_>.
  • Change the method to returning Box<T: ?Sized>.

@HeroicKatora
Copy link
Contributor Author

Change the method to returning Box<T: ?Sized>.

I just realize that this isn't really feasible. Doing so would require allocating a place with the correct layout and then duplicating the pointer meta data, in the process of which we are asserting that it is valid to do so and that this produces a pointer that is valid to dereference. That's true for the existing fat pointers (slice, dyn) but I don't think it's been fully discussed. I don't feel comfortable at all with adding a new allocation method to Box for the sake of this PR. So I would go with the first option?

@SkiFire13
Copy link
Contributor

SkiFire13 commented Sep 24, 2021

Provide safe leak_as_owning_weak and punt the rest to more PRs.

In my opinion this leaks more implementation details than necessary. For example currently there's no public concept of an "owning Weak", nor it's considered safe to access the value pointed by a Weak whose strong count is 0.

Directly propose a wrapper {Arc,Rc}Box<_>.

Adding a new type for the sake of this PR seems overkill to me.

Doing so would require allocating a place with the correct layout and then duplicating the pointer meta data, in the process of which we are asserting that it is valid to do so and that this produces a pointer that is valid to dereference. That's true for the existing fat pointers (slice, dyn) but I don't think it's been fully discussed.

This is already done in the opposite direction by the implementation of From<Box<T>> for Rc<T> and Arc<T> (with T: ?Sized), so I don't think it would be that problematic to do the opposite.

I don't feel comfortable at all with adding a new allocation method to Box for the sake of this PR.

You can use Box::from_raw with pointers obtained by allocating with the Global allocator. This is publicly documented in the "Memory layout" section of the std::boxed module.

@HeroicKatora HeroicKatora force-pushed the vec-from-reference-counted branch from 4a52be2 to d171a37 Compare September 24, 2021 14:57
@HeroicKatora
Copy link
Contributor Author

I've adjusted everything according to our conversation.

Adding a new type for the sake of this PR seems overkill to me.

Maybe, however I don't fancy contributing to the proliferation of _in methods and duplicate methods to handle allocation failure either. That type would have a significant advantage of separating the point of failure to acquire ownership from the callsite where the allocation can fail—two different errors and two different error sites.

This is already done in the opposite direction by the implementation of From<Box> for Rc and Arc (with T: ?Sized), so I don't think it would be that problematic to do the opposite.

To learn about those pointer operations being allowed through this channel makes me uncomfortable 😬 Anyways, then that concern of course disappears.

@rust-log-analyzer

This comment has been minimized.

@HeroicKatora HeroicKatora force-pushed the vec-from-reference-counted branch from d171a37 to 4ef5f32 Compare September 24, 2021 16:52
@rust-log-analyzer

This comment has been minimized.

Adds methods try_unwrap_as_box that work similar to try_unwrap but
instead of reading the value, they move it a separate allocation.
This allows unwrapping unsized values as well. By extension this
can be used for converting in a vector of slices.
@HeroicKatora HeroicKatora force-pushed the vec-from-reference-counted branch from 4ef5f32 to 7e0034b Compare September 24, 2021 18:05
@HeroicKatora
Copy link
Contributor Author

Okay, I had messed up the combination between no_global_oom_handler and the new feature gate a bunch of times but it should be ready now.

@Mark-Simulacrum
Copy link
Member

Can you update the PR description/title with a summary of the current state of this PR? As far as I can tell, the proposal is no longer to add the Arc -> Vec conversion, but rather something different.

It looks like this is now adding an API for converting Arc<T> to Box<T> -- but this movement will always require a copy of the contained value, right? Can you say more about what the use cases (ideally real, not theoretical) for such an API would be?

In particular, a "userspace" library could be written that verifies the passed Arc is unique (via get_mut) when taking ownership of it and then provides &T and &mut T access with no overhead, via the Deref and get_mut_unchecked APIs on the internal Arc. That would require a little unsafe code, but nothing all that interesting.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 2, 2021
@SkiFire13
Copy link
Contributor

I'm not OP, but I followed the evolution of this PR, and:

It looks like this is now adding an API for converting Arc to Box -- but this movement will always require a copy of the contained value, right?

It kinda depends with what you mean with "copy". There's a memcpy, but it's not a clone, so I would call it a move. Note that this is equivalent to what happens when converting Box<T> to Arc<T>.

Can you say more about what the use cases (ideally real, not theoretical) for such an API would be? In particular, a "userspace" library could be written that verifies the passed Arc is unique (via get_mut) when taking ownership of it and then provides &T and &mut T access with no overhead, via the Deref and get_mut_unchecked APIs on the internal Arc. That would require a little unsafe code, but nothing all that interesting.

There are two differences between the PR proposal and yours:

  • the PR allows to get common types (Box, which can then be turned in a Vec ecc ecc), while yours requires an additional type that doesn't play well with the rest of the ecosystem
  • your approach will fail if there are outstanding Weak, while this PR will just disassociate them (this might need more documentation though). AFAIK this is not something that currently be done in "userspace" code.

@HeroicKatora
Copy link
Contributor Author

Can you say more about what the use cases (ideally real, not theoretical) for such an API would be

It's an inverse of the existing Box<_> -> Arc<_> conversion which wasn't possible for unsized types before. The use case is still the same as in the Vec case (converting the box to vec is almost zero-cost as observed).

In particular, a "userspace" library could be written that verifies the passed Arc is unique […]

No. There is no way for a userspace library to emulate atomically performing the last CAS (1, 0) on the strong count other than try_unwrap which does not work for unsized types (nor for very large types as it also reads them into the stack but this is secondary). The only other comparable function is make_mut which performs another allocation of its own and requires the T: Clone bound. It might be possible for Rc but that still involves a highly non-trivial amount of reasoning for keeping the correct count alive while forgetting the value at the right time etc.

@HeroicKatora
Copy link
Contributor Author

I've updated the description to reflect the changes to the code and some parts of the discussionl

@apiraino apiraino added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 14, 2021
@bors
Copy link
Contributor

bors commented Oct 20, 2021

☔ The latest upstream changes (presumably #90067) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Copy link
Member

Ping from triage:
@HeroicKatora If this PR is ready to review, adjust the label with @rustbot ready . Thanks.

@JohnCSimon
Copy link
Member

Ping from triage: I'm closing this one due to inactivity, please feel to reopen when you are ready to continue with this.

@rustbot label: +S-inactive

@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Dec 5, 2021
@JohnCSimon JohnCSimon closed this Jan 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants