Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a noalias language item #1410

Closed
wants to merge 4 commits into from
Closed

Add a noalias language item #1410

wants to merge 4 commits into from

Conversation

mahkoh
Copy link
Contributor

@mahkoh mahkoh commented Dec 15, 2015

Add a noalias language item and wrapper struct (similar to non_zero) to
mark raw pointers as noalias.

@codyps
Copy link

codyps commented Dec 16, 2015

So I presume this is like restrict in C/C++, but for rust?

Q: would it make sense to allow more fine grained statement of the aliasing (or non-aliasing)? This appears to allow us to say one block doesn't alias any other block, but doesn't let us restrict that to something like A doesn't alias C, but might alias B (unless there is some way to use the proposed NoAlias to do that by controlling scopes?). I suppose nothing prevents adding a more controlled approach later, if needed.

@ghost
Copy link

ghost commented Dec 16, 2015

I think that this is a good move for making Box output code that assumes that there are no aliases, although the RFC itself seems like it could go a bit more in depth on how this is useful outside of the standard library. I'd rather not see the RFC get shut down because the solution is too niche, when the problem with Box is a nice one to solve.

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 16, 2015

This applies to every object that owns a pointer.

@codyps
Copy link

codyps commented Dec 16, 2015

@Undeterminant I'm not sure why this would be limited to the standard library. Isn't anyone that needs to write wrapper types similar to Boxes going to run into the same thing?

@huonw
Copy link
Member

huonw commented Dec 16, 2015

It seems this is essentially semantically Unique?

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Dec 16, 2015
@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 16, 2015

@huonw

A wrapper around a raw *mut T that indicates that the possessor of this wrapper owns the referent.

noalias does not imply ownership or mutability.

@steveklabnik
Copy link
Member

noalias does not imply ownership or mutability.

But noalias is usually applied to &mut, in my understanding, but cannot for raw pointers because they may be aliased. Unique brings that property back, which means Unique<T>s could/should also be marked noalias. At least, that's my understanding.

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 16, 2015

Maybe Unique could be implemented as follows

pub struct Unique<T: ?Sized> {
    pointer: NonZero<NoAlias<T>>,
    _marker: PhantomData<T>,
}

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 16, 2015

I've updated the text a bit. I guess the previous version wasn't strict enough to allow for any significant opitimizations.

Julian Orth added 2 commits December 16, 2015 20:31
This reverts commit 55673e1.

This is wrong and the real case is already covered in the section below.
@oli-obk
Copy link
Contributor

oli-obk commented Dec 17, 2015

I'm confused as to why this is happening anyway. Shouldn't all &mut be marked llvm-noalias during trans anyway? Since *some_box = 42 will create an intermediate &mut due to DerefMut, shouldn't this already get optimized correctly? And since it's not, shouldn't we simply fix this bug? I mean we already have a way of unsafely stating a *mut is noalias: turning it into a &mut.

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 17, 2015

I don't think that's how it does or should work.

unsafe fn f(x: *mut u8, y: *mut u8) -> u8 {
    {
        let x = &mut *x;
        *x = 11;
    }
    {
        let y = &mut *y;
        *y = 22;
    }
    {
        let x = &mut *x;
        *x
    }
}

This code should allow x and y to point to the same addresses and return either 11 or 22. I'm sure there is code out there that relies on this.

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 18, 2015

The whole Unique and Shared thing you're doing in the stdlib seems to be a
step in the wrong direction because it doesn't really capture the different
kinds of pointers that can exist. I think there are six types of pointers:

        | mutable memory | mutable object | immutable object |
--------+----------------+----------------+------------------+
noalias |                |                |                  |
--------+----------------+----------------+------------------+
alias   |                |                |                  |
--------+----------------+----------------+------------------+

Where

  • mutable memory means that the memory pointed to can be mutated but doesn't
    necessarily contain a valid object
  • mutable object means that the memory pointed to contains a valid object
    that can be mutated
  • immutable object means that the memory pointed to contains a valid object
    that cannot be mutated (through this pointer)
  • noalias means that all mutation of the pointed to memory is done through a
    pointed based in this pointer
  • alias means that mutation of the pointed to memory can happen through
    other pointers

Here is how those types of pointers are used today:

        | mutable memory | mutable object | immutable object |
--------+----------------+----------------+------------------+
noalias | Vec            | Box            | Rc               |
--------+----------------+----------------+------------------+
alias   | <unused>       | <unused>       | <unused>         |
--------+----------------+----------------+------------------+

Note in particular that Rc<T> is noalias because the contents are never
mutated. This is unless T contains a Cell in which case noalias is always
disabled.

Whether the pointers can be copied:

        | mutable memory | mutable object | immutable object |
--------+----------------+----------------+------------------+
noalias | <no>           | <no>           | Copy             |
--------+----------------+----------------+------------------+
alias   | Copy           | Copy           | Copy             |
--------+----------------+----------------+------------------+

noalias/immutable object can be copied for the same reason Rc can be cloned.

More interestingly, some of those pointers can implement Deref:

        | mutable memory | mutable object | immutable object |
--------+----------------+----------------+------------------+
noalias | <impossible>   | Deref/DerefMut | Deref            |
--------+----------------+----------------+------------------+
alias   | <impossible>   | <dangerous>    | <dangerous>      |
--------+----------------+----------------+------------------+

It is clear that the mutable memory pointers cannot implement it because they
might not point to valid objects.

One big problem is that implementing Deref for alias objects is dangerous
because the created references can have overlapping lifetimes which is not good
when you have aliasing pointers. This problem is closely related to
this
comment on reddit. The author of the blog post complained that using raw
pointers is cumbersome because of the syntax and the linked post then
recommended simply casting the pointers to references. This is incorrect because
of the mentioned aliasing problems.

The solution of this problem is not to implement arrow syntax but to add
UnsafeDeref and UnsafeDerefMut traits. These traits would have the following
properties:

  • Dereferencing is unsafe
  • Auto-deref doesn't happen, even in unsafe blocks

Assuming there are wrappers for the alias/mutable object and alias/immutable object cases, those wrappers would then implement UnsafeDeref and
UnsafeDerefMut.

To come back to this RFC: The noalias wrappers would all contain NoAlias
pointers.

@oli-obk
Copy link
Contributor

oli-obk commented Dec 18, 2015

This code should allow x and y to point to the same addresses and return either 11 or 22. I'm sure there is code out there that relies on this.

Uhm... Your code already does this, and what I said doesn't change it. The &mut you created never alias... They'd only alias if you had them in the same scope.


either 11 or 22

That should most definitely not happen. If the pointers can alias, then changing the order must not be allowed. If the pointers cannot alias, then changing the order must not have any effect on the observable state.

@mahkoh
Copy link
Contributor Author

mahkoh commented Dec 18, 2015

@oli-obk

That should most definitely not happen. If the pointers can alias, then changing the order must not be allowed. If the pointers cannot alias, then changing the order must not have any effect on the observable state.

I didn't mean to imply that the value is unspecified but that the returned value depends on whether the pointers point to the same address.

They'd only alias if you had them in the same scope.

That's exactly my point. The &mut created by dereferencing the boxes don't exist in the same scope because they are not bound to any variables. In the case of

pub extern fn bx(mut x: Box<u8, Dummy>, mut y: Box<u8, Dummy>) -> u8 {
    let x = &mut *x;
    let y = &mut *y;
    *x = 11;
    *y = 22;
    *x
}

I would agree that this proves that the pointers can't point to the same addresses.

@petrochenkov
Copy link
Contributor

Does Rust have a formal aliasing model covering all interaction between references/pointers/cells beyond "LLVM's aliasing model + list of places in rustc where noalias is applied"? I don't think I've seen a strict formalization yet.

@Ericson2314
Copy link
Contributor

unsafe fn f(x: *mut u8, y: *mut u8) -> u8 {
    {
        let x = &mut *x;
        *x = 11;
    }
    {
        let y = &mut *y;
        *y = 22;
    }
    {
        let x = &mut *x;
        *x
    }
}

I rather have a notion of "disjoint lifetimes": it is OK to &mut-borrow multiple times as long they are never "used simultaneously". I am not sure of the exact semantics of LLVM's noalias, but I hope that if re-borrowing doesn't violate any rules, this won't either.

@glaebhoerl
Copy link
Contributor

rust-lang/rust#19733 seems like the relevant issue

@Manishearth
Copy link
Member

There are examples of "aliasable mutable memory"; that's UnsafeCell (and the other cell types). UnsafeCell disables the usual noalias optimizations when behind a reference.

@nikomatsakis
Copy link
Contributor

Why did you opt to use *const T and not *mut T for the wrapped pointer? Given that there are not supposed to be any aliases of the referent except through the wrapped pointer, and given that *const T is supposed to mean "no mutation occurs through this pointer", that effectively means the referent is immutable (shallowly, at least), in which case the aliasing is kind of moot. What am I missing?

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 5, 2016

@nikomatsakis noalias does not imply mutability. The data could be located in immutable memory or the pointer could be handed out through another mechanism that doesn't, in general, allow mutation but guarantees that there are no aliases as long as the pointer is live.

Either way, *mut T would make every type that contains it invariant wrt. T. This decision should be left to higher level data structures.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 5, 2016

and given that *const T is supposed to mean "no mutation occurs through this pointer"

I think you were the one who originally suggested (in the variance rfc) to use *const pointers to get covariance and to not use *muteven if the data is mutable.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 5, 2016

However I agree that having to choose either *const or *mut for all NoAlias users is problematic. Some time ago I though about another way to add an arbitrary set of marker to raw pointers:

Linear markers

Currently there is only marker for raw pointers, namely NonNull<*const T> (or *mut.) Once other types of marker are added, this can become rather ugly:

Dereferencable<NoAlias<NonNull<*const T>>>

Instead, one could allow structs to contain multiple markers that change the behavior of pointers contained in the struct:

struct Uniqe<T> {
    ptr: *const T,
    _marker1: NonNull,
    _marker2: NoAlias,
    _marker3: Dereferencable,
}

Other restrictions could be added, e.g., that structs containing such markers can only contain a single raw pointer or that they cannot contain any fields besides the markers and a single raw pointer field. In this case, one could also add a marker trait that has to be implemented (by Unique) to enable the effects of the markers:

impl<T> PointerMarkers for Unique<T> { }

@nikomatsakis
Copy link
Contributor

On Tue, Jan 05, 2016 at 06:30:41AM -0800, mahkoh wrote:

and given that *const T is supposed to mean "no mutation occurs through this pointer"

I think you were the one who originally suggested (in the variance rfc) to use *const pointers to get covariance and to not use *muteven if the data is mutable.

Yes, ok. And in fact I did the same in Unique<T> for the same
reason.

@nikomatsakis nikomatsakis self-assigned this Jan 7, 2016
@nikomatsakis
Copy link
Contributor

So I think this RFC is pointing in a good direction but is somewhat premature. I feel like before we go off and add more public-facing newtypes around unsafe pointers, we should clarify the aliasing story (#1447). In particular, I think the rules we settle on for what kind of aliasing is legal and what is not will really inform the precise set of wrapper types (or other sorts of unsafe hints) that we eventually want.

Another factor is that we really need to do empirical evaluation of the overall effect of these sorts of changes. For example, as we found in rust-lang/rust#29485, the overall effects of aliasing annotations is often quite small, and it's not clear that they are worth the complexity they bring.

I think my personal feeling at the moment is that we probably want fairly lax aliasing rules by default, with types perhaps like this one that allow you to "ratchet up" the level of optimization (and take on the commensurate risk) in particularly important places. But I think we have to be very, very careful in this area, because experience has shown that it is really hard for people to reason about abstract aliasing rules. (It's all well and good if we can blame bad code on an incorrect annotation, but it's better not to have the bad code in the first place.)

So I'm inclined to close this RFC and "file it" under #1447. Thoughts?

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 11, 2016

I'm fine with this but only if there is actually a plan to address #1447. From what I can tell, https://github.com/rust-lang/rfcs/issues is where ideas go to die (including postponed RFCs.)

@nikomatsakis
Copy link
Contributor

@mahkoh

I'm fine with this but only if there is actually a plan to address #1447.

Well, let me set expectations correctly. I consider #1447 to be a
"rising" priority, but not yet at the top. There is plenty of active
discussion going on, particularly between @arielb1 and @RalfJung, but
I think we're not yet close to a final set of proposals, and I think
nobody on the @rust-lang/lang team has the bandwidth just now to get too
deeply involved. But I expect this to change as some of the current
"high priority" items start to get resolved (e.g., specialization, incr. comp., macros).

I should also say that part of my motivation not to add NoAlias (or
similar optimization hints) is that we already encounter strange
behavior (e.g., rust-lang/rust#29485, which you filed) from the
existing alias hints. Adding the ability for users to supply such
hints could only increase the rate at which we hit such bugs, which
generally takes time away from other things. Given that we don't
really know what shape we want those hints to take, this seems
like a bad tradeoff.

@mahkoh mahkoh closed this Jan 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.