-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite docs for std::ptr
(Take #2)
#51016
Conversation
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Haven't had a chance to read this over, but just popping in to say:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, this looks good to me. If @gankro and everyone else are okay with the claims made, I'm happy.
Thank you so much for keeping up with this, I know it's been a journey :)
I'm pretty sure y'all have better things to do :). Right now the (no longer unstable) pointer methods are just copy-pasted versions of the old docs with a few notes added regarding argument ordering. I can copy them over, preserving the notes in a later PR, or merge them into this one. This thing is already pretty massive though, so I'm inclined to wait. |
Oh and if the style is acceptable, I'll do |
Ping from triage, @steveklabnik ! |
@ecstatic-morse your PR still has |
☔ The latest upstream changes (presumably #51310) made this pull request unmergeable. Please resolve the merge conflicts. |
@shepmaster someone more intimate with the low-level details still needs to review this. I haven't had a chance and I don't know if I will. |
Yes, same here. |
- Add links to the GNU libc docs for `memmove`, `memcpy`, and `memset`, as well as internally linking to other functions in `std::ptr` - List invariants which, when violated, cause UB for all functions - Add example to `ptr::drop_in_place` and compares it to `ptr::read`. - Add examples which more closely mirror real world uses for the functions in `std::ptr`. Also, move the reimplementation of `mem::swap` to the examples of `ptr::read` and use a more interesting example for `copy_nonoverlapping`. - Change module level description - Define what constitutes a "valid" pointer. - Centralize discussion of ownership of bitwise copies in `ptr::read` and provide an example.
This also removes the overlong link that failed tidy xD.
They closely mirror the docs for `copy_nonoverlapping`
I rebased to eliminate the merge commit I introduced by using github's merge conflict resolver. Sorry, I'm sure there was a less messy way to do this. |
In order to make reviewing easier, here's a summary of the changes made in this PR. First and foremost, this PR addresses all the concerns raised in #29371. This includes linking to the C standard library functions, adding links to related functions in This PR also addresses #36450 by specifying the alignment requirements for all functions in While I was addressing those concerns, I noticed that the the conditions which could trigger undefined behavior were not enumerated (as they are in, for example, Valid PointersThe new docs define the concept of a "valid" pointer in the module-level docs, which is a combination of several requirements from the Undefined Behavior section of the reference. A valid pointer is one that:
The We also use validity to define requirements for functions which operate on a sequence of values (e.g. |
std::ptr
(Take #2) std::ptr
(Take #2)
This definition concerns me for a few reasons:
Besides these general points, I think the definition of "valid" needs to say more about Rust aliasing rules. The "general" LLVM aliasing rules are mentioned, but those are far weaker (the reference vaguely gestures at LLVM's "scoped noalias" to kind of cover how safe references behave). The upshot is: you can't really break the rules of safe references with raw pointer accesses. For example:
Of course, the crux is defining what certain key phrases ("while a borrow exists", "unrelated pointer", etc.) mean exactly but even without unsafe code guidelines it's better to give vague guidance than omit it entirely. |
@RalfJung, since you're working to formalize rust's memory model, do you have any thoughts on the proposed wording? I agree with @rkruppe that it's out of scope to exhaustively define what pointers are valid when passed to, for example, |
src/libcore/ptr.rs
Outdated
//! | ||
//! * The pointer is not null. | ||
//! * The pointer is not dangling (it does not point to memory which has been | ||
//! freed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd strengthen this to something like "the pointer points to a live allocation". Arguably, 0x8 as *const u32
does not "point to memory which has been freed", but still it's not a valid pointer.
src/libcore/intrinsics.rs
Outdated
/// | ||
/// * `src.offset(count-1)` must be [valid]. In other words, the region of | ||
/// memory which begins at `src` and has a length of `count * | ||
/// size_of::<T>()` bytes must belong to a single, live allocation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part after "in other words" says way more than the the first part! Also, Why is "src
must be valid" and "src.offset(count-1)
" split into two conditions? I see one condition here, which is "src.offset(i)
must be valid for reading size_of::<T>()
bytes for 0 <= i < count
", or, equivalently, "src
must be valid for reading size_of::<T>() * count
bytes".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now see in your comments in the thread that you argue it would otherwise violate aliasing rules. That seems plausible but unnecessarily contorted for a definition.
Given that validity has to be defined for a size anyway, I'd just change this to require validity of the entire affected memory range at once (i.e. length size_of::<T>() * count
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got a bit to clever here :) I was trying to rely exclusively on the definition of validity to define invariants over arrays as well as single accesses. The notion of a "single, live allocation" doesn't appear anywhere in the reference to my knowledge, which is why it went in "in other words".
As I note in my later comment, I think that making pointer validity a function of access size overcomplicates things a bit, and that the type of the pointer should fully specify the number of bytes being accessed when reading or writing.
From this point of view, your src.offset(i)
for all i
in 0..count
works formulation nicely, although it still references offset, which is maybe a bit opaque.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I note in my later comment, I think that making pointer validity a function of access size overcomplicates things a bit, and that the type of the pointer should fully specify the number of bytes being accessed when reading or writing.
In my eyes, making it depend on the whole type makes things more complicated. With your proposal, validity is a function of the pointer and its type, and you should say so clearly ("A pointer pointing to type T
is valid ..."). A type contains much more information than just a size! With my proposal, it is crystal clear that the size of the only part that actually matters ("A pointer is valid for an access of size n
..."). Notice that I use "pointer" here as a run-time concept; pointers themselves (as in, just the raw address in memory -- or whatever abstract representation of pointers you want to think of) are intrinsically untyped. So either way you need to add some extra information to define validity; the question is only if you restrict that to what is actually necessary (the size) or if you add a whole lot of other irrelevant information as well (the type).
Plus, for functions where the type alone does not suffice, we don't have to do awkward things like use .offset
in our definition and rely on that function to be undefined when crossing allocation boundaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean I would need to qualify that, for example, ptr::read
requires that its argument be valid for a read of at least size_of::<T>()
bytes? This is what I meant by overcomplicates, not that the concept itself wasn't necessary, but that stating it everywhere would be redundant.
Simply declaring that when the size of an access for *const T
is not explicitly stated, we mean valid for size_of::<T>()
bytes would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this!
as per #51016 (comment), @gankro and I are basically done with this PR. It seems @RalfJung has done an actual review, but maybe not checked @ecstatic-morse 's latest work. @bors: delegate=ralfjung :) |
✌️ @RalfJung can now approve this pull request |
They can't actually be assigned to the issue since they're not in the org. |
src/libcore/ptr.rs
Outdated
//! | ||
//! There are two types of operations on memory, reads and writes. It is | ||
//! possible for a `*mut` to be valid for one operation and not the other. Since | ||
//! a `*const` can only be read and not written, it has no such ambiguity. For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the types mean much more than they should, I think. One can cast freely and in safe code between *mut
and *const
. For all intents and purposes, these are the same type; *const
just has a "lint" associated with it that you cannot write through it.
src/libcore/ptr.rs
Outdated
//! a `*const` can only be read and not written, it has no such ambiguity. For | ||
//! example, a `*mut` is not valid for writes if a a reference exists which | ||
//! [refers to the same memory][aliasing]. Therefore, each function in this | ||
//! module will document which operations must be valid on any `*mut` arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this last sentence. Do you mean something like, "each operation will document which *mut
arguments it requires to be valid"?
src/libcore/ptr.rs
Outdated
//! | ||
//! For more information on the safety implications of dereferencing raw | ||
//! pointers, see the both the [book] and the section in the reference devoted | ||
//! to [undefined behavior][ub]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we can actually spell out the requirements besides aliasing here. As in, can't we start this section with the stuff we know -- that the range covered by [pointer, pointer+size)
must be part of a single allocated memory block, and that the size is implicitly given by the type when not stated explicitly? Then we can say that there are additional requirements about aliasing, and that those rules are not settled yet, referencing the nomicon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some axioms regarding validity, casting a reference to a pointer results in a valid pointer, pointers to ZSTs are valid, that sort of thing, and linked to offset
as a proxy for the single allocated memory block requirement. I also called out aliasing in that final paragraph.
Sorry for the long silence! I have not carefully reviewed every single method doc, but focused on the notion of "validity". Beyond the comments above, I am missing a discussion of null pointers. I think you should do one of the following:
In particular, I don't think zero-sized accesses are that complicated and I think they are relevant enough to be mentioned here. We are, I think, making them maximally undefined (as in, even zero-sized accesses have to be non-NULL and aligned), so we can always weaken that later if people complain. I prefer the first variant because I like the fact that this makes validity hold trivially for zero-sized accesses. However, it leads to non-NULLness being repeated in plenty of places, so others might prefer the second variant. ;) |
Sorry, I've been busy lately. It's a holiday in the U.S., so I'll pick this up again today. |
Also rewrites the reads/writes section to be less reliant on `*const`, `*mut`
I went with the second variant. I should have mentioned this several comments ago, but I don't believe that null pointers need to be a special case. Basically, we have two axioms, all pointers are valid for reads and writes of size zero, and a null pointer is never valid. These two axioms intersect when you have a null pointer to a ZST. Eventually, Rust will need to decide which axiom supersedes the other. However, I don't think this PR is the place to settle this question, and propagating the non-null requirement everywhere is more committal than I'd like. Additionally, I think it's actually clearer to say "a null pointer is never valid" and "all pointers (except a null pointer) are valid for operations of size 0", than introduce "non-nullness" as an orthogonal concept to validity. |
Fine for me. This is also the conservative choice, I think (allowing fewer things). |
//! memory][aliasing]. The set of operations for which a pointer argument must | ||
//! be valid is explicitly documented for each function. This is not strictly | ||
//! necessary for `*const` arguments, as they can only be used for reads and | ||
//! never for writes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with the content of this, but the way this is phrased It may be just me but reading this paragraph leaves me pretty confused. For example, the "set of operations" can not be larger than 2, ever; and I also never find myself wondering "what is the set of operations that this pointer is valid for". I rather wonder "can I read through this"? Also, I do not think it is possible for a pointer to be valid for writing, but not valid for reading -- so "any combination" is not really possible.
So, I'd frame this entire thing by starting out saying there are two notions of validity: Valid for reading, and valid for writing. Every pointer operation specifies which kind of validity it requires. I'd put all of that into the first paragraph, and not talk about how Rust (typo: rust) doesn't yet have a fully defined model there. So, the first paragraph sets the stage and introduced the distinction between read and write validity, and does nothing else.
The second paragraph can then lead by saying that unfortunately, Rust doesn't yet have a fully defined memory model and hence the precise set of rules is not known yet. However, there are some things we do know -- e.g., if an &mut
exists and is not currently reborrowed (that restriction is important), then any other aliasing pointer is not valid for reading nor writing. And so on.
It might even be better to move this paragraph down; the sentence right before your enumeration also touches the same topic so that would flow nicely.
//! are a few rules regarding validity: | ||
//! | ||
//! * The result of casting a reference to a pointer is valid for as long as the | ||
//! underlying object is live. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say something like "a pointer obtained by casting a reference"; had to read this two times to parse it. ;)
Also, this is not true. The pointer obtained from the cast is valid immediately after the cast. However, using the reference again will invalidate the pointer, and other things may also invalidate it. For example:
fn foo(x: &mut u32) {
let y = x as *mut;
*x = 13;
// Now y is NOT valid any more because `x` got "re-activated"
}
Ping from triage @ecstatic-morse! It's been a while since we heard from you, will you have time to work on this again? |
Thank you for this PR @ecstatic-morse! Unfortunately we haven't heard from you on this in a while, so I'm closing the PR to keep things tidy. Don't worry though, if you'll have time again in the future please reopen this PR, we'll be happy to review it again! |
I might take over this PR, if I find time. But don't let that stop you, @ecstatic-morse! |
Mostly for my own future reference -- a relevant issue in LLVM's bugtracker: https://bugs.llvm.org/show_bug.cgi?id=38583 |
Rewrite docs for pointer methods This takes over #51016 by @ecstatic-morse. They did most of the work, I just did some editing. However, I realized one problem: This updates the docs for the "free functions" in `core::ptr`, but it does not update the copies of these docs for the inherent methods of the `*const T` and `*mut T` types. These getting out-of-sync is certainly bad, but I also don't feel like copying all this stuff around. Instead, we should remove this redundancy. Any good ideas?
Addresses #29371 and #36450.
This is a continuation of my work in this PR, which was merged prematurely.
The latest changes address all of @gankro's concerns except the one regarding ordering of invariants for
copy_nonoverlapping
. The fact that memory regions must be nonoverlapping depends on the definition of the two memory regions above, and is also repeated in the function summary.@gankro and @rkruppe pointed out that we don't have rigorously defined semantics for uninitialized memory, so all mention of it has been removed. The nomicon implies that the functions in
std::ptr
work with uninitialized memory, so requiring memory that is to be read be initialized is not an invariant. This conflicts with a strict interpretation of the reference, but resolving that is outside the scope of this PR.Before this is merged, I need to resolve the following issues:
drop_in_place
is equivalent to callingread
and discarding the value. Is this correct?write_bytes
andswap_nonoverlapping
require properly aligned pointers?swap_nonoverlapping
Update docs for the unstable unsafe pointer methods RFCWill be done later.In the meantime, I'm in desperate need of feedback. I'm also hosting a rendered version of my changes to make reviewing easier.
r? @steveklabnik