Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What are the uniqueness guarantees of Box and Vec? #326

Open
RalfJung opened this issue Apr 11, 2022 · 61 comments
Open

What are the uniqueness guarantees of Box and Vec? #326

RalfJung opened this issue Apr 11, 2022 · 61 comments
Labels
A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows)

Comments

@RalfJung
Copy link
Member

RalfJung commented Apr 11, 2022

Box and Vec have internally used the Unique pointer type since Forever (TM). The intention was that that gives some some special aliasing rules. Box is even emitting noalias to LLVM, so we might have optimizations already exploit those aliasing rules.

However, it looks like none of this is actually surfaced in the documentation (Cc @saethlin), leading to a significant risk that code out there might violate whatever the aliasing rules end up being.

For whatever it's worth, Miri with Stacked Borrows does treat Box rather similar to &mut, but does not do anything special with Vec. I have ideas for how the Next Aliasing Model can treat Unique special (so Box itself no longer needs to be special, at least if we decide to "look into" private fields when applying aliasing rules which is its own whole separate discussion), bot nothing concrete yet.

For each of these types we basically have two options:

  • Keep going with the intention of making them have aliasing guarantees. Then we should probably put something into the docs. This might be considered a breaking change by some.
  • Give up on this plan (and, in the case of Box, remove the noalias and the special treatment in Miri). This will make some people very sad. Cc @Gankra @nikomatsakis
@RalfJung RalfJung added the A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) label Apr 11, 2022
@chorman0773
Copy link
Contributor

I think even that if Unique doesn't have alias rules, it's existance is useful. It's a NonNull owning raw pointer, and (aside from any aliasing considerations) can be written in user code.

@Gankra
Copy link

Gankra commented Apr 11, 2022

Unique doesn't "have" to have special semantics. It exists in case it's useful for it to, and to mark all the types that are "like box" if we ever figure out ways to make box magic more available to other types.

I don't have a good sense for how useful it is to make Box "smarter" but certainly we have a lot of anecdotal evidence that it leads to surprises/sadness. In particular I've been told that just moving a Box apparently invalidates even raw internal pointers, which, sounds miserable. Like I can get it from the strictest interpretation of "it's as if it was a local var" and certainly I wouldn't expect borrowck to ever let you get that cute, but it seems like unsafe code should be allowed to rely on Box/Vec having stable addresses since that's like, one of the major features of indirection!

@RalfJung
Copy link
Member Author

RalfJung commented Apr 11, 2022

Okay -- I don't really care if Unique exists or not when it doesn't have special semantics, so I removed that part from the OP.

I remember optimizing some tight Vec loops being a prime motivation for making Unique special (specifically, this might let the compiler know that the length field does not change, or some such thing).

@RalfJung
Copy link
Member Author

I don't have a good sense for how useful it is to make Box "smarter" but certainly we have a lot of anecdotal evidence that it leads to surprises/sadness. In particular I've been told that just moving a Box apparently invalidates even raw internal pointers, which, sounds miserable. Like I can get it from the strictest interpretation of "it's as if it was a local var" and certainly I wouldn't expect borrowck to ever let you get that cute, but it seems like unsafe code should be allowed to rely on Box/Vec having stable addresses since that's like, one of the major features of indirection!

I can imagine relaxing the Box rules by a lot, but still keeping some special status. (That would probably happen with the Next Aliasing Model.)

@saethlin
Copy link
Member

Just to make sure my position is clear here, I am in favor of removing noalias on Box. I'm thinking very much from the perspective of a Rust user, not a Rust implementer, so since the type Unique is an implementation detail I don't really have an opinion on it.

I think that noalias on Box it is distinctly unlike noalias on &mut, which has long been communicated to the community, in the vague terms that there are special optimizations possible on &mut due to its uniqueness guarantee. I never heard of such a thing for Box until I learned about Stacked Borrows. So I think if we need to declare current code UB under a formal memory model in order to provide the promised &mut optimizations, there's an argument that this is a resolution of conflicting promises that Rust made: stability versus this extra optimization power. But that argument does not apply to Box. As far as I can tell, the properties of Unique is something that Rust contributors have kept entirely to them/ourselves. So adding UB here is completely unexpected.

And I want to call out that under Stacked Borrows, if a pointer type behaves like &mut, it doesn't only provide aliasing guarantees as a function argument which is what noalias is traditionally focused on, from the perspective of a user, Box loses address stability. In particular, this program is UB:

fn main() {
    let b = Box::new(0u8);
    let ptr = &*b as *const u8;
    let _a = b;
    unsafe {
        println!("{}", *ptr);
    }
}

Exactly why it is UB depends on if you are considering Stacked Borrows with or without raw pointer tagging. But in any case, this is deeply surprising. What else is "A pointer type for heap allocation." (that's how the current docs describe it) good for, if not address stability?

Also the std::boxed and Box documentation does not mention the word "alias" and in only 2 places does it even mention the word "unique", and in that case it is used in the sense of the C++ std::unique_ptr's uniqueness: If you have 2 instances of this type, they point to different allocations.

@RalfJung
Copy link
Member Author

And I want to call out that under Stacked Borrows, if a pointer type behaves like &mut, it doesn't only provide aliasing guarantees as a function argument which is what noalias is traditionally focused on, from the perspective of a user, Box loses address stability. In particular, this program is UB:
[...]
Exactly why it is UB depends on if you are considering Stacked Borrows with or without raw pointer tagging. But in any case, this is deeply surprising. What else is "A pointer type for heap allocation." (that's how the current docs describe it) good for, if not address stability?

What if the Next Aliasing Model made that not UB (by not doing "retagging" on Box as often as we do that on reference)? There is a huge design space of possible aliasing restrictions, and what Miri currently implements is among the most strict possible. The goal was precisely to get feedback about whether that is too strict and which patterns should be allowed by relaxing the rules.

With regards to keeping noalias or not, the following is most likely UB under LLVM noalias, so if that is also considered "too much UB" then indeed it seems best to just remove all aliasing restrictions entirely:

fn fun(x: Box<i32>, y: *mut i32) -> i32 {
  unsafe { *y = 5; }
  *x
}

let mut b = Box::new(16);
let ptr = *mut *b;
fun(b, ptr);

@saethlin
Copy link
Member

saethlin commented Apr 11, 2022

I wish I knew if such code is written, because I suspect all crates that arrive in this situation are UB according to Stacked Borrows before they get to this call 😩

But from a more theoretical perspective, it's not clear to me that any author of Rust code would expect that function to be UB. I think we've done a pretty good job educating people via the docs that Box is just an RAII allocation. Such a change to retagging does get us out of the situation where it seems like Box isn't a stable pointer. So it might decrease the disbelief people experience when learning that Stacked Borrows thinks this code is invalid, but I'm still at a loss for how to explain to people how they could have possibly known that this code isn't valid.

@Gankra
Copy link

Gankra commented Apr 11, 2022

IMO it might be tolerable for a deref through the box to potentially invalidate pointers (ideally under looser rules based on the compiler "understanding" box and being able to do disjoint reborrows... so maybe it's semantically identical to &mut?), but moving the Box having side-effects is "too weird" without a really good explanation of why that might be problematic to allow.

I would tbh be fine with totally dropping noalias here, but I am not a Performance person.

@CAD97
Copy link

CAD97 commented Apr 11, 2022

Just leaving my 2¢:

Much existing code does treat Box as a way to store a ptr::NonNull<T> which deallocs on drop, and then move it around while storing/using pointers into the heap allocation. (StableDeref, for all of its flaws, exists to encode this usage.) If at all possible, we should avoid making such usage UB.

However, usage that interleaves Box derefs with using internal pointers I don't think exists. At the very weakest, even though dereffing Box is a builtin, I think it's 100% safe to say that semantically it goes through the Deref[Mut] trait, thus applies a Retag (invalidating outstanding pointers via reference rules) when dereferenced.

However, I have a hard time saying that Ralf's example can be UB. I don't recall exactly how SB recurses into reference fields; if it wouldn't be UB with a definition of struct Box(&'static mut T), then it shouldn't be UB with the real Box.

Making moving a box into a function special, but allowing it when moving around in a single stack frame seems like a horrible state, because then refactoring out a function semantically changes the AM semantics.

My personal conclusion is that Box itself should not be noalias, and just apply retagging when derefferenced. (A shared retag when dereferenced to a shared place, a unique Retag when derefferenced to a unique place.)


It gets even more interesting with Vec, as we loosely imply that internal pointers are only invalidated when the Vec is reallocated. I believe making Vec unique would directly contradict this implication, at least if it ever retags stored bytes not directly being modified by a Vec operation.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 11, 2022

However, I have a hard time saying that #326 (comment) can be UB. I don't recall exactly how SB recurses into reference fields; if it wouldn't be UB with a definition of struct Box(&'static mut T), then it shouldn't be UB with the real Box.

It's an open question (Cc #125). We can totally satisfy your axiom by making the Newtype(&'static mut T) version equally UB. ;)

@saethlin
Copy link
Member

I am not a Performance person.

I kind of am. Or at least I fancy myself as. Or I try to be.

I just grabbed a few crates and tried running their benchmarks with -Zmutable-noalias=no then switching to -Zmutable-noalias=yes. On tinyvec and smallvec I see everything between a 37% improvement and a 139% regression. On nom I see between a 2% improvement and an 18% regression. On bincode I see improvements from 0% to 54%. I picked these crates because I wrote the benchmarks for tinyvec/smallvec and bincode, and someone suggested nom.

I looked at the regressions in smallvec and I cannot make any sense of them. The instructions in the hot part of the benchmark loop are almost entirely unchanged, but the structure of the benchmark loop surrounding them is rather different.

The changes in nom weren't big enough to be interesting.

The huge change in bincode definitely has to do with aliasing optimizations. It could be very interesting to keep an eye on that, because the code that basically vanishes from the profile is BufReader::buffer, and BufReader is implemented with a Box<[u8]>.

But overall the fact that we see such colossal regressions on microbenchmarks from providing more information to the compiler makes me question the wisdom of using current LLVM's ability to take advantage of noalias to make decisions about where to place it. It is possible that all those regressions will disappear.


It would be super cool if they were fixed soon, but I do not have the patience at the current time to minimize TinyVec or SmallVec into some LLVM IR that can be submitted in a report.

@chorman0773
Copy link
Contributor

chorman0773 commented Apr 12, 2022

Actually, noting something for lccc's model specifically, it has

An implicit derive operation occurs in the following contexts:

  • ...
  • In a call instruction, when any parameter or corresponding argument is of a pointer type with a non-trivial validity attribute
  • In an exit instruction, when returning from a function with a value of a pointer type with a non-trivial validity attribute or to a call site with a return type of such a pointer type.
  • ...
    This only applies to xir pointer types - Box<T> is represented as a named type that refers to an aggregate definition. As far as lccc is concerned, this is true of all newtypes, with the exception of the internal (and unstable) Unique - which has an internal attribute that get's the frontend to translate the type in fields (although this doesn't affect calls, only loads and stores).

Of course, since rustc translates #[repr(transparent)] far earlier (vs. lccc, that currently handles them all the way in the backend), it may have more interesting concerns wrt. #[repr(transparent)] pub strucct Foo(&'static mut i32);.

(For those who care: a derive operation is xlang's generalization of an SB reborrow - strictly speaking it produces a new pointer from some computation on an existing one, asserting it's non-trivial validity, such as it's valid range or aliasing behaviour, in the process).

@nikomatsakis
Copy link
Contributor

Interesting. To my mind, Box<T> is "claiming ownership" of its contents in much the same way that an &mut and friends do. I'll have to think more about it, but my kind of "rough take" on stacked borrows was that "safe types get their safe guarantees, and you use raw pointers otherwise". From that, I would expect that a function which takes a Box<T> as argument can assume, yes, that others are not accessing that memory.

That said, I also see that it would be nice to be able to allocate a box and move it around as a way to "remember" that the pointer needs to be freed later.

Tying the "uniqueness" to actually using the box may indeed be a reasonable compromise.

@Lokathor
Copy link
Contributor

I've definitely used box as a way to say "handle the stupid allocation crap for me", but otherwise I've wanted only the pointing part and not the uniqueness and invalidating other pointers and all that stuff.

However, we could just add some simpler to use methods to our allocation system and "solve" the problem that way.

@saethlin
Copy link
Member

That said, I also see that it would be nice to be able to allocate a box and move it around as a way to "remember" that the pointer needs to be freed later.

The standard library doesn't provide an AliasableBox, so authors of unsafe code who want this behavior have used Box. Not only as a way to remember that the pointer needs to be freed, but as a way to allocate and deallocate (I don't mean to pick on tokio here, it's just what is at the top of my mind):
https://github.com/tokio-rs/bytes/blob/724476982b35f094b59d160ecc02c042082ac604/src/bytes.rs#L942-L945

unsafe fn rebuild_boxed_slice(buf: *mut u8, offset: *const u8, len: usize) -> Box<[u8]> {
    let cap = (offset as usize - buf as usize) + len;
    Box::from_raw(slice::from_raw_parts_mut(buf, cap))
}

https://github.com/tokio-rs/tokio/blob/252b0fa9d557c241d64d8fcd95747a288429da00/tokio-util/src/sync/cancellation_token.rs#L551-L565

    fn decrement_refcount(&self, current_state: StateSnapshot) -> StateSnapshot {
        let current_state = self.atomic_update_state(current_state, |mut state: StateSnapshot| {
            state.refcount -= 1;
            state
        });

        // Drop the State if it is not referenced anymore
        if !current_state.has_refs() {
            // Safety: `CancellationTokenState` is always stored in refcounted
            // Boxes
            let _ = unsafe { Box::from_raw(self as *const Self as *mut Self) };
        }

        current_state
    }

@Diggsey
Copy link

Diggsey commented Apr 12, 2022

This is something that could be addressed in an edition though, since the compiler can choose to emit noalias on a crate-by-crate basis. We could provide better tools for unsafe code (eg. AliasableBox) for crates on new editions to use.

@Lokathor
Copy link
Contributor

Lokathor commented Apr 12, 2022

Can't we provide better tools in all editions? There's no obvious reason for AliasableBox to not be available in all editions.

@Diggsey
Copy link

Diggsey commented Apr 12, 2022

Of course we can, but we'd need to keep existing code that uses Box working on old editions.

@Lokathor
Copy link
Contributor

Lokathor commented Apr 12, 2022

So wouldn't Box always be noalias, in all editions? And then AliasBox would be a new type people could that's also in all editions? And code could transfer to AliasBox if that's what they actually intend?

It just seems exceedingly confusing to make such a subtle change over an edition when we know it's specifically affecting unsafe code. However it's set up, it should be the same across all editions.

@Diggsey
Copy link

Diggsey commented Apr 12, 2022

Well the problem with that is the examples of existing code that is only correct when Box is not noalias, but that happens to work today.

@saethlin
Copy link
Member

saethlin commented Apr 12, 2022

I worry about that kind of characterization, because that could apply to any increase in information that rustc feeds to LLVM. We shouldn't block all additions of LLVM attributes behind an edition, just because they may produce surprising compilation results for some users. That is why I am focusing on whether or not we should expect users to be familiar with semantics that are communicated to LLVM.

@saethlin
Copy link
Member

The huge change in bincode definitely has to do with aliasing optimizations.

I did this same experiment but with the code that puts noalias on Box removed. I see the same improvements in the bincode benchmarks using BufReader. So whatever is happening there, it looks like noalias on &mut is sufficient.

@nico-abram
Copy link

Would it be useful/make sense to add an unstable flag to disable noalias on everything but &mut?

@CAD97
Copy link

CAD97 commented Apr 26, 2022

While I'm thinking about it: adopting something like the storages API would allow that Box<T, Global> (the current, and the default) could use non-noalias semantics with a handle of ptr::NonNull<T> and Box<T, GlobalUnique> could use noalias semantics with a handle of ptr::Unique<T>, or vice-versa.

@saethlin
Copy link
Member

Would it be useful/make sense to add an unstable flag to disable noalias on everything but &mut?

Who would use such a flag? A skim over the search for Zmutable-noalias results on https://cs.github.com turns up mostly forks of Rust and people turning it on (probably because they want maximum optimization and the code is old or they don't know it is the default?). I can find exactly one repo that sets -Zmutable-noalias=no, libhermit-rs, which has it turned off as a mitigation for UB in the codebase, and fuschia sets -Zmutable-noalias=off because of rust-lang/rust#63818. That's it. I agree that it would be easier for people to do benchmark experiments if they had a flag instead of editing the compiler's source to remove noalias on Box, but we don't even have an ecosystem benchmark suite yet.

@RalfJung
Copy link
Member Author

Unsurprisingly it turns out that Vec is full of aliasing violations, if you interpret its Unique like a suitably sized Box: rust-lang/rust#94421.

@saethlin
Copy link
Member

saethlin commented Apr 30, 2022

Miri surely does not like that PR, but it's not entirely due to the current implementation of Vec. The PR needs a lot of careful attention, and probably a better way to run Miri on it (running miri-test-libstd on a local checkout and manually dumping the cache is the best I could come up with, and that's quite an arduous procedure). For example, this diff trips right over the issue that the comment above it says to avoid:

    pub fn as_mut_ptr(&mut self) -> *mut T {
        // We shadow the slice method of the same name to avoid going through
        // `deref_mut`, which creates an intermediate reference.
-        let ptr = self.buf.ptr();
+        let ptr = self.buf.as_mut_ptr().cast::<T>();
        unsafe {
            assume(!ptr.is_null());
        }

There's also an issue where the extend impl when passed a slice produces an uninitialized Box.

(but yes, among other things, the IntoIter as it stands after that PR is unsound due to Box invalidation, and I expect there are other problems)

(oh whoops I read your comment here instead of commenting on that PR) sigh

@RalfJung
Copy link
Member Author

That example should be fine even under Stacked Borrows since everything is well-nested. You are basically "reborrowing" the Box.

@RalfJung
Copy link
Member Author

@thomcc During an opsem meeting, someone mentioned that you are rather firmly against having any sort of aliasing restriction for Vec. I would be curious to hear more about that. Do you have concrete pieces of code that you think must work and are worried would break due to aliasing rules, or are you fundamentally opposed to any kind of aliasing requirement? Do you know what t-libs-api thinks about this?

Vec is defined via Unique and there at least used to be the intention to make Unique carry some noalias annotation, once we figure out what exactly that means.

@RustyYato
Copy link

I know that crates like typed_arena would break. https://docs.rs/typed-arena/latest/typed_arena/struct.Arena.html

It relies on Vec as the storage mechanism, and allows references into the Vec to persist across Vec::push. Which would break if Vec invalidated references into it after moves/borrows of the Vec.

I know this crate is unsound because it isn't hardened against malicious code, but the basic premise is currently sound.

@thomcc
Copy link
Member

thomcc commented Sep 26, 2023

Unlike with Box, we've never documented that Vec has this restriction, and it's unintuitive. It also breaks a lot of reasonable use cases. I think having it on Vec requires quite a bit more justification than with anything else.

there at least used to be the intention to make Unique carry some noalias annotation

My understanding was it was more an indication that these are types which semantically could be noalias. Certainly for the time I've been working on the stdlib, Unique sticks around not because of noalias at all, but because it solves the common need of: "we want a NonNull<T> that also has PhantomData<T>".

Not for nothing, but understanding of aliasing semantics back then was not very firm, so I don't know that legacy holds much weight in my mind, especially when NonNull didn't exist when it was created.

Do you know what t-libs-api thinks about this?

No.

@RalfJung
Copy link
Member Author

RalfJung commented Sep 26, 2023 via email

@RalfJung
Copy link
Member Author

To elaborate a bit: retagging only happens when a pointer is moved by-value. For arenas, the moment the first element got created, the arena will not be moved any more -- everyone just has references to it. This means the Unique never gets retagged so we never have an aliasing violation.

Furthermore, Unique is different from references in that we can't know how much memory there is behind that pointer, so we only retag memory that is actually being accessed. This makes even the following code legal:

let mut v = vec![0, 1];
let ptr1 = v.as_mut_ptr();
let mut v2 = v;
let ptr2 = v.as_mut_ptr().add(1);
ptr1.write(0);
ptr2.write(0);

The aliasing requirements only start rejecting code when the same element gets accessed again with a freshly retagged pointer (e.g., if ptr2 above also pointed to index 0).

Vecs are moved rather rarely I think, so I don't think a lot of code would be affected by this. Of course that also means not a lot of code would benefit from the potential optimizations here.

Unlike with Box, we've never documented that Vec has this restriction, and it's unintuitive.

That is fair.

It also breaks a lot of reasonable use cases.

That is the part I am disputing. At least I have not seen evidence of that.

johnschug added a commit to gpg-rs/gpgme that referenced this issue Jan 5, 2024
Change to use `NonNull` instead of `Box` due to the requirement
that boxed values are not aliased.

See rust-lang/unsafe-code-guidelines#326.
@ahicks92

This comment was marked as resolved.

@Diggsey

This comment was marked as resolved.

@ahicks92

This comment was marked as resolved.

@digama0

This comment was marked as resolved.

@ahicks92

This comment was marked as resolved.

@RalfJung
Copy link
Member Author

RalfJung commented Feb 11, 2024

Yeah I don't think discussing editions is productive here. I don't think there is any future in which Box names a different type depending on the edition (the confusion this causes at API boundary would be endless), and if Box is the same type everywhere then it must have the same opsem behavior everywhere.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2024

Box with a custom allocator should probably not be noalias, or else we run into issues like rust-lang/miri#3341.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 4, 2024

What makes that custom allocator example interesting is that even if we made Unique special rather than making Box special, there would be an issue -- the user there expected to be able to use the data pointer (that got passed to deallocate) to access allocator metadata that's shared among multiple allocations. IOW, they were using a Unique pointer to access shared data. The only way I see to resolve that is to only add noalias for Box<_>, i.e. then the allocator is the default (Global). Obviously that cannot be expressed with Unique.

Making Box<_> and Box<_, A> the same type really doesn't work well for these sorts of things.

@chorman0773
Copy link
Contributor

chorman0773 commented Mar 4, 2024

This just sounds like anothrt argument against Box<T> being special.

Although another fix is to make the Allocator trait special like the global Allocator.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 4, 2024

To me it sounds like another argument showing that custom allocators were added to Box prematurely and without sufficient design. 🤷

Although another fix is to make the Allocator trait special like the global Allocator.

We haven't even figured out how to do that for custom global allocators yet, that's #442. But indeed maybe something similar could work here.

@RalfJung RalfJung added A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows) and removed A-provenance Topic: Related to when which values have which provenance (but not which alias restrictions follow) labels Mar 27, 2024
@RalfJung
Copy link
Member Author

RalfJung commented May 28, 2024

For the record, I am at this point largely ambivalent on this question. The Box noalias is causing enough trouble in Miri that I wouldn't mind see it go, just from an implementation perspective. Giving semantics to Unique via Tree Borrows worked but actually giving it the right semantics turns out to be more tricky than expected. I'm just worried about this being a step we can't take back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows)
Projects
None yet
Development

No branches or pull requests