Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aliasing: Does Vec own its contents? #85697

Closed
vakaras opened this issue May 25, 2021 · 20 comments
Closed

Aliasing: Does Vec own its contents? #85697

vakaras opened this issue May 25, 2021 · 20 comments

Comments

@vakaras
Copy link
Contributor

vakaras commented May 25, 2021

(Opening a new issue to continue the discussion started in #60847.)

The core of the question is whether I can assume that x and v are disjoint (x is not pointing into one of the v's elements):

fn test(v: &mut Vec<u32>, x: &mut u32) {
    // Can I assume here that x is not pointing to v?
}

I see at least two reasons why x should not be allowed to alias an element of v:

  1. It looks very counter-intuitive because it is typically illegal to have active mutable references to a structure and to its part.
  2. Taking into account how pervasive Vec is, this would likely kill most of the benefits for static analysis that Rust has against languages like C. The reason for this is that a sound tool (a static analyser or a verifier) would need to track whether any reference could potentially alias some element of a vector, which typically requires quantifiers. Unfortunately, using quantifiers typically leads to reasoning becoming undecidable. Not sure whether it is possible to somehow avoid the undecidability in this particular case, but scalability and performance will certainly be a huge problem.

At least one reason why aliasing should be allowed is given in the issue 60847. If I am not mistaken, that issue was motivated by this code in typed arena, which uses Vec as a memory allocator. A variation of the code in the issue allows creating a reference that aliases the contents of the vector (playground):

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    v.pop();
    let ptr = v.as_mut_ptr();
    let ptr = unsafe { &mut *ptr };
    test(&mut v, ptr);
    println!("{:?}", v);    // [8, 2, 3, 4, 7]
}
fn test(v: &mut Vec<u32>, x: &mut u32) {
    assert!(v.len() < v.capacity());
    v.push(7);
    *x = 8;
}

My personal (highly biased) point of view is that it would be cleaner to separate the two use cases (vector as a collection of things and vector as a memory allocator) because it would allow the majority of Vec users to not worry about potentially problematic aliasing. However, this would require having a new data structure that can be used as an allocator; maybe, RawVec could be adopted for this?

If it is decided that the aliasing should not be allowed, then I have two additional questions:

  • Does Box also give the same guarantee?
  • Is this non-aliasing property a property of Vec or a property of the Unique pointer that is used inside Vec?

cc @RalfJung @matklad

@the8472
Copy link
Member

the8472 commented May 25, 2021

Is this really a matter of Vec and not the unsafe { &mut *ptr };? I.e. isn't this more about the question how far the taint of unsafe blocks can spread? test lives in the same module as main so in this case they have to work together.

If test lived somewhere else then it shouldn't have to be aware of the potential aliasing and be able to make full use of the Vec granted by the exclusive reference.

In the typed arena case the Vec is kept local to the arena, so it is never handed out to code which would be free to exploit Vec's entire API surface.

@vakaras
Copy link
Contributor Author

vakaras commented May 26, 2021

Is this really a matter of Vec and not the unsafe { &mut *ptr };? I.e. isn't this more about the question how far the taint of unsafe blocks can spread? test lives in the same module as main so in this case they have to work together.

This is a very good question. I think that we have to talk about Vec in this case because we must explicitly allow such a use case by promising no to do certain changes to Vec's implementation. If I understand you correctly, the suggestion would be to give a weaker interpretation to Vec in some part of the codebase. The programmer's responsibility then would be to ensure that this weaker interpretation is safely encapsulated within some safe abstraction. If we go this path, my question is how do we signal in code that the type invariant does not hold? In particular, how do we mark that the test function can accept aliasing arguments?

One option would be to simply write that in the documentation. As Ralf suggested, the tools could provide special annotations for this case, for example, #[broken(v)]. However, what I like about Rust is that for the functions that require stronger preconditions to ensure their memory safety, we have a language level keyword unsafe, which allows noticing them. Therefore, I would say that if we allow functions to be called with arguments that do not uphold the conditions required by their types, we should have something similar here.

Another question would be: how do we call such functions as test? I would not call them “super-safe” because at least some of them might be unsafe. For example, if we remove assert in my example, the function test does not violate memory safety only if v.len() < v.capacity(), which means that it should be unsafe:

/// Safety: `v.len() < v.capacity()`
unsafe fn test(v: &mut Vec<u32>, x: &mut u32) {
    v.push(7);
    *x = 8;
}

To sum up, it seems to me that allowing to weaken guarantees given by types brings additional complexity and I am wondering whether that is clearly better than having more, but simpler types, which are much easier to understand.

@the8472
Copy link
Member

the8472 commented May 26, 2021

Therefore, I would say that if we allow functions to be called with arguments that do not uphold the conditions required by their types, we should have something similar here.

We already have that. There are 3 types of unsafe

  • unsafe fn - unsafe to call
  • unsafe { } - calling unsafe things
  • unsafe trait - unsafe to implement, because unsafe code may rely on it in ways that aren't expressed in unsafe blocks

test is the 3rd kind, not the 1st. It's perfectly safe to call by any kind of code that has safely constructed a Vec and a mutable u32. But it's intended to be used in conjunction with code that constructs those things in unsafe ways.

Note that unsafe traits can entirely consist of safe functions, or no functions at all.

But since test is a module sibling of the unsafe code and you can reason about it sortof locally using a trait would be overkill to declare this kind of relation. If it were provided by 3rd-party code then you'd use an unsafe trait for that.

@RalfJung
Copy link
Member

RalfJung commented May 26, 2021

Taking into account how pervasive Vec is, this would likely kill most of the benefits for static analysis that Rust has against languages like C.

FWIW, Vec is not so pervasive as a type at function boundaries; slices are much more common. So I don't think impact is quite as large as you make it sound here.

Aliasing: Does Vec own its contents?

This conflates aliasing requirements and ownership, which are two separate discussions. ;) A Vec definitely owns its contents (as part of its safety invariant), but that does not mean that it is UB to have aliasing as long as the actually accessed memory regions are disjoint (which is a validity invariant question). In fact this is what is meant by 'aliasing' in LLVM: pointers alias when they are used in conflicting ways (e.g. used to access the same location such that at least one access is a write -- as long as everything is read-only, even equal pointers do not 'alias').

Some quotes from #60847 reflecting my position on this question:

These examples exploit deep knowledge of the invariant of Vec. They break the abstraction barrier, taking apart the ownership of the backing buffer. But they don't actually violate the underlying ownership discipline.

For most types, breaking the abstraction barrier like this would be considered a bug. The exact invariant is an internal implementation detail. Vec, however, is a bit special in that it quite explicitly exposes a lot of its implementation details in the documentation. Due to this, I think code like your example is justified.

I think Vec is actually meant to be used [as an allocator], so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.

@RalfJung
Copy link
Member

RalfJung commented May 26, 2021

However, what I like about Rust is that for the functions that require stronger preconditions to ensure their memory safety, we have a language level keyword unsafe, which allows noticing them. Therefore, I would say that if we allow functions to be called with arguments that do not uphold the conditions required by their types, we should have something similar here.

As mentioned in the other thread, functions like test are the "opposite" of unsafe: they are always safe to call when all arguments satisfy their usual invariants, and they are even safe to call when some of the arguments do not satisfy the usual invariants.
This pretty much means such functions will only come up within a modular boundary, which is another argument for why they should not be a big problem for verification.

Another question would be: how do we call such functions as test? I would not call them “super-safe” because at least some of them might be unsafe. For example, if we remove assert in my example, the function test does not violate memory safety only if v.len() < v.capacity(), which means that it should be unsafe:

No, it should not be unsafe. The function is still safe to call for any v, x that satisfy their safety invariant (and bring along the corresponding ownership). That is the very definition of when a function is considered 'safe'.
It's the caller that is doing wacky things here, not test. I would expect verification of this code to have to pretty much inline test into the caller -- the "default spec" for test (all arguments satisfy the safety invariant) is not enough since its precondition cannot be established in main. The failure, if any, is in main, not test!

@RalfJung
Copy link
Member

So, regarding this question:

I think Vec is actually meant to be used [as an allocator], so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.

I think some input from @rust-lang/libs could be useful, since this is also an API design question.

@SimonSapin
Copy link
Contributor

SimonSapin commented May 26, 2021

I think Vec is actually meant to be used [as an allocator]

I’m not sure this was ever by intentional design, but for a time (before std::alloc::alloc) Box / Vec + into_raw was the only way to allocate memory on Stable Rust.

this code in typed arena, which uses Vec as a memory allocator.

Another reason to use Vec this way in in typed-arena is to rely on its Drop impl which has #[may_dangle] which in turn allows arena-allocated items to borrow each other, potentially with cycles. Writing a custom (outside of the standard library) container that can do this would require Nightly since #[may_dangle] is unstable: #34761

@vakaras
Copy link
Contributor Author

vakaras commented May 26, 2021

test is the 3rd kind, not the 1st. It's perfectly safe to call by any kind of code that has safely constructed a Vec and a mutable u32. But it's intended to be used in conjunction with code that constructs those things in unsafe ways.

Thank you, @the8472, for pointing out the relation with unsafe traits! Now this makes much more sense to me.

Taking into account how pervasive Vec is, this would likely kill most of the benefits for static analysis that Rust has against languages like C.

FWIW, Vec is not so pervasive as a type at function boundaries; slices are much more common. So I don't think impact is quite as large as you make it sound here.

That is a fair point. However, if we lost automation each time a Vec was used, I think it would be a huge problem in practice.

Nevertheless, if we take @the8472's proposal that aliasing can happen only in well-encapsulated cases and in general safe code it is forbidden, then we are fine because only these tiny number of cases will be affected (by now rereading your arguments, I think that you had the same point, just I did not understand that; I am sorry about that).

Most of my comments were based on misunderstanding and can be ignored.

I think Vec is actually meant to be used [as an allocator]

I’m not sure this was ever by intentional design, but for a time (before std::alloc::alloc) Box / Vec + into_raw was the only way to allocate memory on Stable Rust.

Vec::into_raw_parts is still unstable, which makes everyone to use Vec::as_mut_ptr. I find this extremely strange API for allocation because Vec::as_mut_ptr takes &mut self while it is used basically for destroying a Vec instance.

this code in typed arena, which uses Vec as a memory allocator.

Another reason to use Vec this way in in typed-arena is to rely on its Drop impl which has #[may_dangle] which in turn allows arena-allocated items to borrow each other, potentially with cycles. Writing a custom (outside of the standard library) container that can do this would require Nightly since #[may_dangle] is unstable: #34761

Thanks for mentioning #[may_dangle]! If I understand correctly, this is an example of unsafe trait (actually, unsafe impl of a safe trait), which weakens the precondition of the drop function, that @the8472 was talking about.

@SimonSapin
Copy link
Contributor

Yes, #[may_dangle] is unique in that using it requires unsafe impl instead of just impl to implement a trait that is not unsafe trait, because those particular impls have additional responsibilities (not dereferencing potentially-dangling references). https://rust-lang.github.io/rfcs/1327-dropck-param-eyepatch.html has the details.

@vakaras
Copy link
Contributor Author

vakaras commented May 27, 2021

So, regarding this question:

I think Vec is actually meant to be used [as an allocator], so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.

I think some input from @rust-lang/libs could be useful, since this is also an API design question.

Based on @SimonSapin comments it seems that at least now Vec is the only way to implement typed arena in stable Rust. Therefore, we cannot discourage using Vec as an allocator at least until it is possible to reimplement Vec in stable Rust and I guess that it will take time to stabilize all the things that are necessary for that (is #[may_dangle] the only one, or are there more?).

I feel that my main questions were answered, so unless someone wants to continue the discussion, I will close the issue.

@vakaras
Copy link
Contributor Author

vakaras commented May 31, 2021

Let me try to summarize my understanding so that in case someone else gets confused about the aliasing of Vec contents, they need to read only this comment and not the entire thread.

My original question can be rephrased as:

If I have an owned vector (Vec<T>) or a mutable reference into a vector (&mut Vec<T>), can I assume that there are no references that alias the contents of it? For example, can I assume that x is not referencing one of v's elements:

fn test(v: &mut Vec<u32>, x: &u32) { /* ... */ }

The answer is: yes, by default it can be assumed that the contents of the vector are not aliased.

Unsafe code is allowed to weaken this type invariant, but the programmer writing unsafe code must ensure that this weakening is properly encapsulated and observable only to code that promises to handle it. A preferred way for a piece of code to indicate that it allows a weaker precondition (or stronger postcondition) than implied by the types is by implementing the unsafe trait that specifies this weakening. However, if all relevant code is inside a single module/crate (that is the code that weakens the type invariant and the code that handles types with weakened invariants are in the same place) and is ensured to not leak weakened types, then just documenting the code could be sufficient.

So, we have a way to weaken an invariant of the type in a well-encapsulated portion of a code. However, do we actually need this additional complexity? In particular, is it necessary to use Vec as an allocator? Cannot we just use the allocation APIs directly? The reason why in some cases it is necessary to use Vec as an allocator is that Vec belongs to the standard library and can be implemented by using still unstable language features. For example, #[may_dangle] attribute is currently unstable and is needed for implementing typed arena.

@SimonSapin
Copy link
Contributor

I think "using Vec as an allocator" is misleading, as it is mostly unrelated to this issue. Before std::alloc was a thing, one might have written code like this:

// Other alignments left as an exercise to the reader
fn alloc_with_align_1(size: usize) -> *mut u8 {
    let mut vec = Vec::with_capacity(size);
    let ptr = vec.as_mut_ptr();
    mem::forget(vec);
    ptr
}

fn dealloc_with_align_1(size: usize, ptr: *mut u8) {
    mem::drop(Vec::from_raw_parts(ptr, 0, size))
}

As far as I understand there’s nothing wrong with this code.

What’s special and potentially dubious about the typed-arena crate is that it uses unsafe code to call Vec::push (which takes &mut Vec<T>) while there still are &T references to items inside the Vec. The reasoning is that &mut Vec is not given to arbitrary caller-defined code. Instead the crate relies on the documented behavior of Vec::capacity and Vec::push to ensure that although that T item is reachable from &mut Vec, which would create mutable aliasing, it’s not actually gonna be reached or moved (reallocated) when not at full capacity.

#[may_dangle] is also besides the point. Even if drop checking was not an issue, saying that typed-arena could be written differently does not change the fact that there may be other authors of unsafe code out there, reading the documentation of Vec::push and Vec::capacity and making the same assumption.

@SimonSapin
Copy link
Contributor

If fact to implement an arena allocator without Vec I might be tempted to write code like this:

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3dbaacd318afc84ab726189d68f5f73f

#![feature(maybe_uninit_extra)]

type Block<T> = Box<std::cell::UnsafeCell<[std::mem::MaybeUninit<T>]>>;

pub struct Arena<T> {
    current_block: Block<T>,
    current_block_len: std::cell::Cell<usize>,
    // ...
}

impl<T> Arena<T> {
    pub fn alloc(&self, value: T) -> &mut T {
        let slice = unsafe { &mut *self.current_block.get() };
        let next = self.current_block_len.get();
        if let Some(slot) = slice.get_mut(next) {
            self.current_block_len.set(next + 1);
            slot.write(value)
        } else {
            // Allocate the next block etc.
            unimplemented!()  
        }
    }
}

… which looks a lot like #60847 since slice can alias with &mut T references returned by previous calls to alloc. Does this unsafe code violate Stacked Borrows?

@RalfJung
Copy link
Member

RalfJung commented Jun 1, 2021

@vakaras

A preferred way for a piece of code to indicate that it allows a weaker precondition (or stronger postcondition) than implied by the types is by implementing the unsafe trait that specifies this weakening.

FWIW, I don't think this is a general recommendation. Traits should be used when abstraction over the underlying type is required. But I don't think it makes sense to introduce a trait just to specify the weaker precondition. I'd just document it properly in the functions doc-comment.

@RalfJung
Copy link
Member

RalfJung commented Jun 1, 2021

@SimonSapin

Does this unsafe code violate Stacked Borrows?

Yes I think it does, right here: &mut *self.current_block.get(). This creates a mutable reference to the entire slice, thereby claiming "I hold the only and unique pointer to all this memory". I don't think you intended to make that claim.

@vakaras
Copy link
Contributor Author

vakaras commented Jun 2, 2021

Sorry, for the late response, I was travelling.

Reopening the issue since it seems the discussion is not over yet. By the way, if you think that this discussion (or part of it) should be moved somewhere else, please let me know.

  let ptr = vec.as_mut_ptr();
  mem::forget(vec);

As far as I understand there’s nothing wrong with this code.

As far as I know, this code is correct and still used in many places. I personally do not like this as_mut_ptr + mem::forget pattern because as_mut_ptr takes &mut self and it feels wrong to me that the returned pointer outlives the borrow. However, until into_raw_parts is stabilized, we do not seem to have a choice.

What’s special and potentially dubious about the typed-arena crate is that it uses unsafe code to call Vec::push (which takes &mut Vec<T>) while there still are &T references to items inside the Vec. The reasoning is that &mut Vec is not given to arbitrary caller-defined code.

I think we are talking about the same thing just in different words. What I am saying is that such uses of Vec::push are fine as long as they are well hidden behind a safe abstraction and &mut Vec is not leaked to arbitrary code, which I think is the same what you say with your last sentence. Or am I completely misunderstanding you?

#[may_dangle] is also besides the point. Even if drop checking was not an issue, saying that typed-arena could be written differently does not change the fact that there may be other authors of unsafe code out there, reading the documentation of Vec::push and Vec::capacity and making the same assumption.

Yes, you are right.

If fact to implement an arena allocator without Vec I might be tempted to write code like this:

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=3dbaacd318afc84ab726189d68f5f73f

@SimonSapin May I ask you why would you prefer to use Box instead of using the allocation API directly? I guess that the latter would require more unsafe code, but also that unsafe code would be easier to reason about because it would not need to ensure to not violate invariants of safe types.

@vakaras

A preferred way for a piece of code to indicate that it allows a weaker precondition (or stronger postcondition) than implied by the types is by implementing the unsafe trait that specifies this weakening.

FWIW, I don't think this is a general recommendation. Traits should be used when abstraction over the underlying type is required. But I don't think it makes sense to introduce a trait just to specify the weaker precondition. I'd just document it properly in the functions doc-comment.

It also feels a bit strange for me to use a trait in this case. However, writing just a doc-comment feels too weak for me. To have a concrete example, let's say we have two crates:

Crate provider-crate:

/// This function appends an element to a vector without creating
/// aliases to already existing elements.
pub fn append(v: &mut Vec<u32>) {
    // ...
}

Crate client-crate:

// ...
    // SAFETY: This call is safe because `append` does not create
    // aliases to already existing elements in `v`.
    unsafe {
        append(v);
    }
// ...

The reason why I do not feel comfortable about such code is that if append implementation violates its contract, this can lead to undefined behaviour, but there is no unsafe keyword in provider-crate to actually indicate that. Therefore, I think it is more likely that some bug will slip through a code review. However, Vec::push is an example of a function that has a weakened precondition and does not rely on an unsafe trait for that, which indicates that you are probably right.

@vakaras vakaras reopened this Jun 2, 2021
@SimonSapin
Copy link
Contributor

personally do not like this as_mut_ptr + mem::forget

Please ignore that part. I used it because when std::alloc didn’t exist neither did ManuallyDrop or into_raw_parts, but it’s not the point of that example.

@SimonSapin May I ask you why would you prefer to use Box instead of using the allocation API directly?

When writing a custom container library, in some ways Box<[MaybeUninit<T>]> is equivalent to ptr: *mut T, capacity: usize but encodes more invariants in the type system, leaving fewer responsibilities to unsafe code which seems like an advantage. But I suppose #85697 (comment) shows that in a model like Stacked Borrow where merely creating aliasing &mut is UB it can be an advantage to stick with raw pointers as long as possible. Though in this case perhaps the problem was the way UnsafeCell was used, more than Box v.s. *mut. (Maybe something like type Block<T> = Box<[std::cell::UnsafeCell<std::mem::MaybeUninit<T>>]>; would be better.)

@vakaras
Copy link
Contributor Author

vakaras commented Jun 2, 2021

@SimonSapin May I ask you why would you prefer to use Box instead of using the allocation API directly?

When writing a custom container library, in some ways Box<[MaybeUninit<T>]> is equivalent to ptr: *mut T, capacity: usize but encodes more invariants in the type system, leaving fewer responsibilities to unsafe code which seems like an advantage. But I suppose #85697 (comment) shows that in a model like Stacked Borrow where merely creating aliasing &mut is UB it can be an advantage to stick with raw pointers as long as possible. Though in this case perhaps the problem was the way UnsafeCell was used, more than Box v.s. *mut. (Maybe something like type Block<T> = Box<[std::cell::UnsafeCell<std::mem::MaybeUninit<T>>]>; would be better.)

Thanks! This is very useful to know.

@RalfJung
Copy link
Member

RalfJung commented Jun 2, 2021

As far as I know, this code is correct and still used in many places. I personally do not like this as_mut_ptr + mem::forget pattern because as_mut_ptr takes &mut self and it feels wrong to me that the returned pointer outlives the borrow. However, until into_raw_parts is stabilized, we do not seem to have a choice.

FWIW, the recommended way to do that is to use ManuallyDrop instead of mem::forget (and the mem::forget docs say as much).

The reason why I do not feel comfortable about such code is that if append implementation violates its contract, this can lead to undefined behaviour, but there is no unsafe keyword in provider-crate to actually indicate that.

I see nothing wrong with that. The same happens, for example, when unsafe code depends on some entirely safe data structure and relies on its functional correctness. That is totally legitimate.

@vakaras
Copy link
Contributor Author

vakaras commented Jun 3, 2021

As far as I know, this code is correct and still used in many places. I personally do not like this as_mut_ptr + mem::forget pattern because as_mut_ptr takes &mut self and it feels wrong to me that the returned pointer outlives the borrow. However, until into_raw_parts is stabilized, we do not seem to have a choice.

FWIW, the recommended way to do that is to use ManuallyDrop instead of mem::forget (and the mem::forget docs say as much).

Not so long time ago, I looked into a random sample of as_mut_ptr uses (from Rust Corpus) and found that at least in that sample mem::forget was more common than ManuallyDrop. So, it seems that there is still some code to migrate.

The reason why I do not feel comfortable about such code is that if append implementation violates its contract, this can lead to undefined behaviour, but there is no unsafe keyword in provider-crate to actually indicate that.

I see nothing wrong with that. The same happens, for example, when unsafe code depends on some entirely safe data structure and relies on its functional correctness. That is totally legitimate.

A fair point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants