Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document current justification for not requiring recursive reference validity (in particular, &mut uninit not being immediate UB) #346

Open
CAD97 opened this issue Jul 5, 2022 · 19 comments

Comments

@CAD97
Copy link

CAD97 commented Jul 5, 2022

This post is a draft of my understanding of the problem space, and justification for &mut uninit not being UB to hold or pass around between functions.

What is UB?

This is a very brief summary; see the glossary, various blog posts, and #253 for more.

In short, UB is a language-level contract that some situation does not happen. Formally speaking, a program cannot "have UB;" UB is a property of some execution resulting in some behavior considered undefined. Note that this is the formal meaning of undefined; encountering UB retroactively removes any and all guarantees about the program execution1.

Depending on who you ask, UB in C++ may have originally been about allowing implementation-defined behavior and implementations to diverge on how they implement the language. Even if this is the case, though, every commercial C++ compiler uses UB under the modern understanding for optimization, and a language without UB is one that is very difficult if not impossible to optimize2. And more importantly, Rust is not C++.

With the benefit of hindsight from the experience of C/C++ and other language design in the past 50 years, Rust takes a much more deliberate approach to UB. In particular:

  • UB should be detectable. It should be practical to write a perfect sanitizing implementation of the Abstract Machine which can say with certainty whether a specific execution is defined (did not attempt to execute UB).
  • UB should be justified. All other things being equal, it is better for more programs to be defined, because we want it to be reasonable to write programs without UB. As such, making some operation UB should be backed by the properties learned about the program outweighing the cost of developers having to manually prove that invalid operations do not occur.
  • UB should be operational. This ties into the previous two points; an axiomatic assertion of some property prevents clear diagnosis of UB and doesn't serve the language-provided-guarantee property of promising that some set of operations are invalid and will not occur.

Why isn't &mut uninit currently considered UB by Miri?

In short: because it does no operation that is undefined. Expanding on that a little:

  • Validity of memory (e.g. that bytes are not undefined) is asserted when the AM does a typed copy of the memory from one place to another place.
    • The memory making up the reference itself is asserted to be initialized, non-null, and aligned.
  • Validity of borrowing is asserted when references are considered "used."
    • This includes at least when the reference is converted into a place (dereferenced) and when a referenced is used as a function argument, as well as when a function taking the reference returns.
    • Until the memory at the referenced place undergoes a typed copy, its validity is not asserted.

Additionally, writing to an uninitialized place is allowed as this consists of

  • running the drop glue for the place at the given type,
    • If the type does not have drop glue, this trivially does not do a typed copy from the place.
  • then doing a typed copy of the value to write into the place,
  • neither of which assert the validity of the preexisting memory.

Note, however, that writing uninit into &mut init is always UB. This is because this does a typed copy of the written value, which asserts that the written value is initialized.

Why would &mut uninit be considered UB?

There are two operational ways that references to uninitialized memory could be made operationally UB: in borrow validation or during conversion between references and places.

  • Borrow validation: in addition to the retag operations, the memory validity of the place would be asserted.
  • Ref-to-place: whenever a reference is converted into a place (dereferenced), the memory validity of the place would be asserted.
  • Place-to-ref: whenever a place is converted into a reference, the memory validity of the place would be asserted.

But how much memory validity? The easy answer is full memory validity at the referenced type; the minimal answer for the desired property is just that the memory is initialized. However, checking bytes are initialized still requires full type information to know which bytes are potentially padding and thus are allowed to be uninitialized, so full memory validity is simpler to check and cost us nothing extra on the implementation.

There are subtle differences to the properties derivable from when exactly memory validity is asserted, but the purpose of this document is to discuss the fundamental reasons for/against using any of them generally.

What otherwise valid programs are made UB?

There's at least two notable losses, one obvious, and one not so obvious.

The obvious one is just any program using a type like &mut [u8] to reference potentially-uninitialized memory. Many existing implementations of io::Read are written to carefully avoid reading the provided buffer before writing to it, such that they might be used to read into an uninitialized buffer. There is an existing accepted RFC allowing for safely reading into an uninitialized buffer3, but it would be very unfortunate to make nearly all existing code unsound.

The less obvious one is with pointers. Writing into an uninitialized place (via = assignment expression) becomes UB, even if that place is behind a raw pointer. This is because the drop glue semantically calls std::ptr::drop_in_place(&mut place), creating a mutable reference to the place. You can potentially recover writes to places not asserting memory validity of the place by semantically only creating the reference if the place's type has drop glue, but this has further complications around generics (as MIR is produced for the generic function in polymorphic form, and ptr::drop_in_place must be called there). It is perhaps a better idea to use ptr::write for writing into uninitialized memory anyway, but this adds an additional subtle pitfall to what are supposed to be raw pointers with mostly C-like semantics (so long as you don't create any references).

What benefit is to be gained?

What &mut uninit being UB would theoretically provide is that references could be marked as "dereferencable(noundef N)" where they are currently marked dereferencable(N) in the LLVM backend. Pointee memory validation during borrow validation would likely be enough to justify this; validation, and ref-to-place or place-to-ref time could be enough for "dereferencable_on_entry(noundef N)" if reborrowing for a function argument counts as doing a ref-to-place-to-ref conversion (how you would write it in source, &*ref).

However, there is at the time of writing no known optimization benefit to eagerly marking references as pointing to known-init memory, neither practical nor theoretical. This is due to a simple observation: when the memory is read by the source program, it then undergoes a typed copy which asserts that it is memory valid.

So the only potential optimization lost is speculative reads. However, we already justify that references must be dereferencable by the borrow validity rules, so it is perfectly fine to speculatively read memory from behind a reference. It is even valid to make decisions based on the value before it is semantically guaranteed to be read by the source program, so long as the speculative execution can deal with speculation being driven by uninit (e.g. by freezeing it to an arbitrary-but-consistent noundef byte value).

So if there's no optimization benefit to eagerly checking for references pointing to uninit memory, the benefit is solely in diagnosing ill-formed programs. By eagerly checking, the existence of references-to-uninit can be diagnosed when they are created rather than when the uninitialized memory is read4, properly blaming the creator of the reference rather than the place just doing safe reads of a safe reference.

However, this "victim blaming" is actually fundamental to how Miri works to diagnose operational UB. Miri does not (and cannot) understand your library's safety preconditions. The only thing Miri diagnoses is when the code violates the conditions of the Abstract Machine (exhibits UB), and will point at the point where the violation happened as the culprit, even if the bug in the code is instead a far-removed violation of a library's contract invoking "library UB5". Miri only cares about and diagnoses whether a specific execution of a specific complete compilation graph encounters language UB, and using this to show soundness is left to careful application of tests.

Why is this contentious?

It is the author's belief that &mut uninit is somewhat unique in the space of defined-but-unsafe Rust, in that this is a safety invariant on a otherwise very strict primitive type. Of the primitive types,

  • ! is always invalid.
  • () is always valid.
  • Simple numerics iNN, uNN, fNN, only have the trivial noundef validity invariant.
    • (mumbles mumbles provenance; IIUC the current plan is to strip on load and by-value-transmute?)
  • char, bool only have validity invariants.
  • [_; N] and [_] inherit all safety/validity invariants from the contained type.
  • *const _ and *mut _ only have the trivial noundef validity invariant.
    • (mumbles mumbles provenance; IIUC the current plan is to angelically have provided the correct provenance if used?)
  • str has officially been decided to have the validity invariant of [u8] and for valid UTF-8 to be a safety invariant.
    • However, this decision is still somewhat contentious,
    • and technically the decision was made for &str and not str-by-value.
    • The author posits that the reason this is contentious is that it means the str primitive has a nonempty safety invariant
    • and perhaps the resolution to this is to say str isn't (semantically) a primitive, it just looks like one, but is actually just struct str([u8]).
  • Box is... special, but generally not considered as a primitive type outside of the compiler.

References thus are special among primitive types in that they

  • have a nontrivial memory validity invariant,
    • noundef, nonnull, aligned
  • have complicated borrow validity requirements, and
    • see: Stacked Borrows retags
  • have a safety invariant that the pointee is both memory-valid and safe.

This is likely unavoidable, as memory validity being shallow / not following references is itself a very desirable property, both for reasoning about unsafe code and for implementing the sanitizer. But I think this can be resolved as solely a teaching problem. The answer to "can a reference point to uninitialized memory" should be "no*, use MaybeUninit or a wrapper of it," where the asterisk is "unsafe can break the rules, but unless you break the rules, you don't have to worry about it." I think in many ways many people are too eager to be correct to remember when it's okay and even better to put forward a simplifying lie, and then refine later as necessary.

Having an operational model of "what you can do" with unsafe is important for being able to reason about unsafe code and to write complicated unsafe code. But for teaching unsafe, it's almost certainly better to stick to relaxing the safe rules for a significant period.

TL;DR

It is the author's conclusion that:

  • References to uninit primitives are clearly nonproblematic to allow in the opsem.
  • Retags of enums need to do a read if we want "active variant" MaybeUninit tracking.
  • Teaching proper "choose your weakenings" use of unsafe is difficult.

Footnotes

  1. The smallest possible caveat may informally apply here: external synchronization via observable effects. At each point the Abstract Machine does something observable outside of the Abstract Machine (i.e. does FFI e.g. IO), the observable state of the Abstract Machine must be in the state defined by the execution to this point. The only way for UB to retroactively unguarantee the already observed behavior is if it is not a valid execution to stop the AM at the observable point before the UB occurs. This is, however, merely an informal argument; the guarantee does in fact no longer exist, it is just that there is no known way for a compiler to take advantage of this. However, note that not all behavior you may expect to be externally observable necessarily is, and neither is all behavior implemented via FFI. So long as the Abstract Machine has a definition of the operation which does not leave the AM, it is not considered externally observable. The canonical example of this is that while allocation is implemented by calling into the host OS, the AM has an internal definition of allocation and an implementation may arbitrarily call the OS allocation APIs in any defined manner.

  2. You may disagree here, and point to scripting languages like ECMAScript or even Safe Rust as languages that can be optimized while not having UB. But the insight here is that they still have UB; the difference is solely that the UB is statically prevented from happening in the surface language. As soon as you go to lower the higher-level language to another target, the set of syntactically valid possibilities extends beyond that of the surface language, and any operation which would be forbidden in the surface language is UB.

  3. In general, you should prefer using types like &mut [MaybeUninit<u8>] rather than &[u8] for writing into potentially uninitialized memory. Even if the existence of the reference is not in and of itself UB, it is still wildly unsafe, and using types that allow and potentially track uninitialized memory much better describes the semantics of your program and prevent accidentally exposing references to uninit to downstream code (which is still unsound).

  4. It is for this reason that if one of the measures for making &mut uninit UB eagerly were to be taken, the author suggests putting the check on place-to-ref conversions.

  5. We say an execution's use of a library API exhibits "library UB" if it violates the documented preconditions of the library functions. If "library UB" is caused, the library has full "permission" to cause language-level UB at any later point. The program's behavior is still defined unless language UB is encountered.

@CAD97
Copy link
Author

CAD97 commented Jul 5, 2022

This writeup is also relevant, as is the tower of weakenings: it could be a useful position to say that memory validity is checked when retagging (insert a read on Retag AM operations; my asserting memory validity on borrow validation) at the clean machine-checked level, but offer the standard real-world relaxation that this is actually sometimes not immediate UB for {some|all} types, roughly the same way Miri handles the difference between strict and permissive provenance.

@JakobDegen
Copy link
Contributor

So I'm going to contest the claim that there's no known optimization benefit to this. The code below gives an example. It uses & instead of &mut but presumably the intent is that this should apply there equally. The code is nearly identical to code I gave in the write-up linked above, although the motivation is different.

enum E {
    A(UnsafeCell<u8>),
    B(u8),
}

let b = E::B(0);
opaque_func(&b);
assert_eq!(b, E::B(0));

The goal is to optimize out the assert. However, it is unclear how to justify this optimization. Specifically, we pass a &E to the opaque function, but we must somehow know that if the "active variant" of x is A, then the passed reference has write permission to the second byte of the enum, but if the active variant is B, then it only has read permission. The obvious way to implement this is to read from memory and check every time we do a retag operation.

This necessarily means that retagging (which happens basically every time a reference is copied) must assert validity conditions in at least some cases. We have some control over when this happens though. For example, there's no need to assert validity when retagging a &u8.

There are alternatives to this model as well - for example, we could try and adjust the aliasing model to make this unnecessary, by having some rule like ("retagged references get the same permissions as their parents" or such). It's unclear how precisely to do this though - a couple naive ideas I've thought of don't work - and so this would probably need quite some investigation.

cc @joshtriplett who suggested that we should commit to allowing invalid memory behind references. Given the complexity here, I don't think this is such a good idea right now.

@saethlin
Copy link
Member

saethlin commented Jul 8, 2022

It should be practical to write a perfect sanitizing implementation of the Abstract Machine which can say with certainty whether a specific execution is defined (did not attempt to execute UB).

Maybe I'm just being too picky, but I think this wording is way too strong. Stacked Borrowed with Untagged and Stacked Borrows with Wildcard both miss UB (or at least I'm sure the latter does, the former is just so strange that maybe it just prescribes wacky semantics). So with the best checker we have now you can choose a model farther up the tower, thus with false positives for UB, or the "actual" model with false negatives. I'm not sure there is a path forward to a perfect sanitizing checker like you describe.

str has officially been decided... ...However, this decision is still somewhat contentious,

Is it really decided if it is still being contended? I don't want to become the guy who argues about str, I'm just not sure this adds value to your summary in light of the number of asterisks you have to add. If you want an example of safety but not validity requirements, maybe Vec is a good choice?

@CAD97
Copy link
Author

CAD97 commented Jul 8, 2022

Note that str was mentioned because it is a primitive. The argument boils down to really just that "&_ is a primitive type with nontrivial safety invariants" may be harder to learn than "&_ is a primitive type with nontrivial invariants", and that the safe/valid invariant split is much more intuitive for non-primitive types (and in this fashion, str's UTF-8 wf invariant being a safety invariant being contentious helps the intended argument).

(Also, I use contentious maybe a little loosely in this case; roughly everyone now supports wf UTF-8 being not a compiler-required property, and the contention is more along the lines of primitives having a language-prescribed safety invariant at all.)

I added a clarification to the OP to that point.

@CAD97
Copy link
Author

CAD97 commented Jul 8, 2022

The above clarification has led me to an interesting realization: &_'s Retag-derived invariants might be considered somewhere between the "usual" validity and safety invariants.

Roughly speaking, validity invariants are enforced on MIR _2 = move _1. Safety invariants are environmentally relied on. But the borrowing rules are enforced on MIR Retag(_2). These two can be assumed equivalent iff doing a typed copy at reference type (thus asserting validity invariants) always performs a retag (thus asserting borrow invariants), but if it's possible to do a typed copy without a retag (e.g. with the copy_nonoverlapping intrinsic?), then these are distinct.

trying to get typed copy without retag

[godbolt]

Rust

#![feature(intrinsics, raw_ref_op)]

// NB: std::intrinsics doesn't expose the raw intrinsic; link it here
extern "rust-intrinsic" {
    pub fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);
}

pub fn example_retag(_1: &i32) {
    let mut _2 = _1;
    let _3 = unsafe {
        copy_nonoverlapping(
            &raw const _1 /* _4 */,
            &raw mut _2 /* _5 */,
            1,
        )
    };
}

MIR -Zalways-encode-mir -Zmir-emit-retag -Zmir-opt-level=0

fn example_retag(_1: &i32) -> () {
    debug _1 => _1;                      // in scope 0 at /app/example.rs:8:22: 8:24
    let mut _0: ();                      // return place in scope 0 at /app/example.rs:8:32: 8:32
    let mut _2: &i32;                    // in scope 0 at /app/example.rs:9:9: 9:15
    let mut _4: *const &i32;             // in scope 0 at /app/example.rs:12:13: 12:26
    let mut _5: *mut &i32;               // in scope 0 at /app/example.rs:13:13: 13:24
    scope 1 {
        debug _2 => _2;                  // in scope 1 at /app/example.rs:9:9: 9:15
        let _3: ();                      // in scope 1 at /app/example.rs:10:9: 10:11
        scope 2 {
            debug _3 => _3;              // in scope 2 at /app/example.rs:10:9: 10:11
        }
        scope 3 {
        }
    }

    bb0: {
        Retag([fn entry] _1);            // scope 0 at /app/example.rs:8:1: 17:2
        StorageLive(_2);                 // scope 0 at /app/example.rs:9:9: 9:15
        _2 = _1;                         // scope 0 at /app/example.rs:9:18: 9:20
        Retag(_2);                       // scope 0 at /app/example.rs:9:18: 9:20
        StorageLive(_3);                 // scope 1 at /app/example.rs:10:9: 10:11
        StorageLive(_4);                 // scope 3 at /app/example.rs:12:13: 12:26
        _4 = &raw const _1;              // scope 3 at /app/example.rs:12:13: 12:26
        Retag([raw] _4);                 // scope 3 at /app/example.rs:12:13: 12:26
        StorageLive(_5);                 // scope 3 at /app/example.rs:13:13: 13:24
        _5 = &raw mut _2;                // scope 3 at /app/example.rs:13:13: 13:24
        Retag([raw] _5);                 // scope 3 at /app/example.rs:13:13: 13:24
        copy_nonoverlapping(src=move _4, dst=move _5, count=const 1_usize); // scope 3 at /app/example.rs:11:9: 15:10
        StorageDead(_5);                 // scope 3 at /app/example.rs:15:9: 15:10
        StorageDead(_4);                 // scope 3 at /app/example.rs:15:9: 15:10
        _0 = const ();                   // scope 0 at /app/example.rs:8:32: 17:2
        StorageDead(_3);                 // scope 1 at /app/example.rs:17:1: 17:2
        StorageDead(_2);                 // scope 0 at /app/example.rs:17:1: 17:2
        return;                          // scope 0 at /app/example.rs:17:2: 17:2
    }
}

@JakobDegen
Copy link
Contributor

JakobDegen commented Jul 8, 2022

The trick there is that copy_nonoverlapping is not a typed copy :P . That being said, until very recently it was very easy to do a typed copy without a retag - fields of structs were not retagged when doing a copy at the struct type. So even wrapping your reference in a newtype avoided this. Ralf changed that a week or two ago though.

Edit: I have been informed by Saethlin that this is still off by default and so you should be able to recreate it

@CAD97
Copy link
Author

CAD97 commented Jul 8, 2022

The trick there is that copy_nonoverlapping is not a typed copy :P

c.f. #330 rust-lang/rust#63159 rust-lang/rust#97712

@RalfJung
Copy link
Member

@CAD97
Wow, what a writeup!

the minimal answer for the desired property is just that the memory is initialized

We never special case "(un)initialized" in the operational semantics, so I don't think we should do this here. The fact that uninit memory is UB at almost every type simply falls out of the general framework of validity invariants (which I say in turn falls out of "value representations").

This is because the drop glue semantically calls std::ptr::drop_in_place(&mut place), creating a mutable reference to the place.

Interesting. Miri does not run this conceptual call, and I had not thought of it as even existing -- I was thinking drop only gets called for types that actually need it, as part of Rust → MIR / MiniRust lowering. But as you say this leads to inconsistencies around generics. We should keep this in mind for when we come to specifying that lowering.

Also when you said "Writing into an uninitialized place becomes UB" I thought you meant ptr.write. Maybe clarify this to explicitly state that you mean a write via an assignment statement.

Having an operational model of "what you can do" with unsafe is important for being able to reason about unsafe code and to write complicated unsafe code. But for teaching unsafe, it's almost certainly better to stick to relaxing the safe rules for a significant period.

So are you suggesting that we should have -Zmiri-stricter-checks-for-teaching? And that we maybe should even enable it by default? Is being Miri-checkable a goal for the "teaching model"?

So far I only saw it as our task to figure out the actual authoritative spec of Rust, not to also figure out a separate, more restrictive set of rules that we present as "the model to teach to unsafe Rust authors by default". I am not opposed to having both of them, and for them to be different (as long as we never state any falsehoods), but I think we should clearly separate those discussions.

@JakobDegen

So I'm going to contest the claim that there's no known optimization benefit to this. The code below gives an example. It uses & instead of &mut but presumably the intent is that this should apply there equally. The code is nearly identical to code I gave in the write-up linked above, although the motivation is different.

Indeed this is an optimization we lose with Miri's current approach to #236. I was aware of that when changing this in Miri (Miri used to consider that program UB at some point) and considered it acceptable, and I still hold that position. YMMV. :)

Another example of this is rust-lang/rust#97161:

#[rustc_layout_scalar_valid_range_start(1)]
pub struct NonNull {
    pointer: *const (),
}

#[no_mangle]
pub fn use_nonnull(p: &NonNull) {
    std::hint::black_box(p.pointer);
}

Without requiring validity behind p, we never learn that p.pointer is nonnull.

However this is a rather special case that arises due to directly accessing a field of a rustc_layout_scalar_valid_range_start struct behind a reference. Using black_box({ let p = *p; p.pointer }) would allow a nonnull to be added. Stable code cannot even do this since only the stdlib defines rustc_layout_scalar_valid_range_start types and all their fields are private.

@saethlin

Maybe I'm just being too picky, but I think this wording is way too strong. Stacked Borrowed with Untagged and Stacked Borrows with Wildcard both miss UB (or at least I'm sure the latter does, the former is just so strange that maybe it just prescribes wacky semantics). So with the best checker we have now you can choose a model farther up the tower, thus with false positives for UB, or the "actual" model with false negatives. I'm not sure there is a path forward to a perfect sanitizing checker like you describe.

Yes, such is the tradeoff of permissive provenance. But that is a very special case, and one that we have realistic hopes of restricting to a small set of (hopefully mostly legacy) code through the promotion of Strict Provenance APIs.

It is something very different to have practically uncheckable UB on an operation that is used all the time by everyone.

That said, I am not sure how bad the impact of checking validity behind references would be. If we only go one level down, it might be feasible to check this. But just conceptually I don't think it is the right thing to do. Validity is not something that is artifically added by "checking" something, it falls out entirely naturally because we round-trip some bytes through a higher-level representation and back.

@CAD97

Note that str was mentioned because it is a primitive. The argument boils down to really just that "&_ is a primitive type with nontrivial safety invariants" may be harder to learn than "&_ is a primitive type with nontrivial invariants", and that the safe/valid invariant split is much more intuitive for non-primitive types (and in this fashion, str's UTF-8 wf invariant being a safety invariant being contentious helps the intended argument).

The full invariant of &mut is hideously complicated. I don't think we can make it a validity invariant, it is not even operational! And the full invariant for & is even worse -- the safety invariant for &T is defined basically arbitrarily by T. It is, for example, very different for &i32 and &Cell<i32>, but also very different between &Cell<i32> and &AtomicI32.

So, there is just no chance that for reference types, their safety invariant and validity invariant will be the same. Sorry.

Roughly speaking, validity invariants are enforced on MIR _2 = move _1. Safety invariants are environmentally relied on. But the borrowing rules are enforced on MIR Retag(_2).

Indeed that is also how I think about it. (And dereferenceable is part of the borrowing rules.)

And yeah I have not entirely made up my mind whether I think each typed copy should be a retag. But my current inclination is no, that just seems like a too big operation. I am curious what others think about this. :)

@Lokathor
Copy link
Contributor

Is being Miri-checkable a goal for the "teaching model"?

Absolutely. Despite potential the false positives and false negatives, beginners and experts alike use miri on a regular basis even when just showing examples to each other.

@CAD97
Copy link
Author

CAD97 commented Jul 12, 2022

This is because the drop glue semantically calls std::ptr::drop_in_place(&mut place), creating a mutable reference to the place.

Interesting. Miri does not run this conceptual call, and I had not thought of it as even existing -- I was thinking drop only gets called for types that actually need it, as part of Rust → MIR / MiniRust lowering. But as you say this leads to inconsistencies around generics. We should keep this in mind for when we come to specifying that lowering.

There's been very recent discussion on Zulip that shows that I'm over simplifying here, that I just became aware of. Specifically, it actually is the case that Drop in MIR is semantically different from drop_in_place.

In safe code, this is easy to show:

let complex_type = ( Some, Nontrivial, Data );
let moved_out = complex_type.0;
// drop glue for complex_type allows partially deinitializing it and only drops the other stuff

but there's a more subtle difference, specifically w.r.t. generics:

unsafe fn drop_uninit<T>() {
    let mut x = std::mem::zeroed::<T>();
    let p = addr_of_mut!(x) as *mut MaybeUninit<u8>;
    p.write(MaybeUninit::uninit());
    // Drop(x) <- current MIR
    // drop_in_place(&mut x) <- proposal
}

fn main() {
    unsafe { drop_uninit::<bool>(); }
}

Miri currently accepts this code, as MIR Drop(x) is more like if perfect_needs_drop::<typeof(x)>() { drop_in_place(&mut x) }. (At least in a world where place-to-ref requires place byte validity, which is not what Miri currently implements.)

This is interestingly related to invalid not-used-again values (#84). Drop glue is I think at the moment perhaps underspecified; I personally think it would be cleaner to say that drop elaboration always calls drop_in_place (only on fields if partially initialized), and inlining happens from there, rather than only calling drop_in_place on "necessary" locals. (Of course, #[may_dangle] is required to drop_in_place locals with invalidated lifetimes/loans, and drop_in_place can't actually be written for such drop glue.) Specifically, this contains the "what does dropping a type mean" question to just the drop_in_place intrinsic, and local drop glue doesn't have to care (except just a little bit to handle partial initialization).

An alternative formulation would be to say that the MIR Drop operation always requires a byte-valid place. I think this is strong enough to allow replacing MIR Drop with drop_in_place in all cases.

So are you suggesting that we should have -Zmiri-stricter-checks-for-teaching? And that we maybe should even enable it by default? Is being Miri-checkable a goal for the "teaching model"?

Yes; for the same reason having Miri able to check the AM opsem is desirable, it is also beneficial to have a machine-checkable mid-level subset. This is effectively what safe rust is; the statically enforceable model. I think it would be beneficial to have a mid-level subset to allow incrementally allowing unsafe superpowers, because the less you allow, the more confident you can be in API soundness. The first such relaxation (from safe Rust) would be purely allowing calling pure-precondition unsafe upstream functions, but dynamically verifying that the preconditions of such are followed. (And I hope this can be the level 80+% of unsafe code authors can stay at!)

So far I only saw it as our task to figure out the actual authoritative spec of Rust, not to also figure out a separate, more restrictive set of rules that we present as "the model to teach to unsafe Rust authors by default". I am not opposed to having both of them, and for them to be different (as long as we never state any falsehoods), but I think we should clearly separate those discussions.

I do agree that whatever this "teaching model" is, it's a fully distinct question from the AM opsem. But it is also clearly relevant to this specific choice about &undef, because the (intended) thesis of my argument is that the primary benefit of forbidding &undef is in diagnosing broken expectations earlier, and not to the opsem (unsafecell tracking notwithstanding).

That said, I am not sure how bad the impact of checking validity behind references would be. If we only go one level down, it might be feasible to check this. But just conceptually I don't think it is the right thing to do. Validity is not something that is artifically added by "checking" something, it falls out entirely naturally because we round-trip some bytes through a higher-level representation and back.

This is true w.r.t. the AM opsem, but to a developer it does act more like the AM asserting certain properties at defined points. (e.g. the Retag operation does follow more closely a if !everything_good() { throw_ub!() } pattern.)

I do think that retag operations (or place operations) asserting byte-validity of the place is at least a reasonable opsem, and that validity being one level deep in this way is at least as teachable as it being zero levels deep.

The full invariant of &mut is hideously complicated. I don't think we can make it a validity invariant, it is not even operational! [...] So, there is just no chance that for reference types, their safety invariant and validity invariant will be the same. Sorry.

Aren't the &mut rules definitionally operational? To be explicit: I'm not (intending to) suggesting that &mut should be strictly validity invariants. I'm more just observing that it is significantly complicated, and to a first order of approximation, it is easier to write user code around axiomatic "this is always true" than operational rules. That this is the case is likely at least partially the reason existing language specs use much more axiomatic language than operational. But as soon as you get past that first 90% or so the balance swings aggressively towards preferring opsem because when doing anything interesting it's easier to show conformance to the operational model than an axiomatic one; axiomatic guarantees are a useful high-level abstraction but fall off when managing minutia.

And yeah I have not entirely made up my mind whether I think each typed copy should be a retag. But my current inclination is no, that just seems like a too big operation. I am curious what others think about this. :)

I think w.r.t. teaching, asserting byte-validity and Retag in the same set of cases is better. This is part of (if not just entirely) why retagging struct fields is potentially desirable.


And just apologies again for the brain-dump format writeup. If I had more time I'd've written a shorter letter etc.

@CAD97 CAD97 changed the title Document justification for &mut uninit not being immediate UB Document current justification for &mut uninit not being immediate UB Jul 12, 2022
@RalfJung
Copy link
Member

RalfJung commented Jul 12, 2022

I personally think it would be cleaner to say that drop elaboration always calls drop_in_place (only on fields if partially initialized), and inlining happens from there, rather than only calling drop_in_place on "necessary" locals.

Agreed, that would be rather clean. It is also not called on variables that have been moved out of (both fully and partially moved).

/me briefly ponders the interaction of partial initialization and rustc_layout_scalar_valid_range_start, then decides not to.

An alternative formulation would be to say that the MIR Drop operation always requires a byte-valid place. I think this is strong enough to allow replacing MIR Drop with drop_in_place in all cases.

Yeah but then we'd have to find a way to preserve all those Drop for Miri. Currently I think many get removed pretty early on, in drop elaboration.
Also if we combine that with "retagging whenever we check validity [aka on every typed copy]", we have a disaster, since references are often "dropped" way after they stopped being alias-valid.

I think it would be beneficial to have a mid-level subset to allow incrementally allowing unsafe superpowers, because the less you allow, the more confident you can be in API soundness.

We have to be careful here since this is not monotone. The less you allow the environment/client/caller to do, the more you can do yourself.

I'm more just observing that it is significantly complicated, and to a first order of approximation, it is easier to write user code around axiomatic "this is always true" than operational rules. That this is the case is likely at least partially the reason existing language specs use much more axiomatic language than operational. But as soon as you get past that first 90% or so the balance swings aggressively towards preferring opsem because when doing anything interesting it's easier to show conformance to the operational model than an axiomatic one; axiomatic guarantees are a useful high-level abstraction but fall off when managing minutia.

I am not opposed to axiomatic semantics, as long as they are proven sound against an operational ground truth. ;)

I do think that retag operations (or place operations) asserting byte-validity of the place is at least a reasonable opsem, and that validity being one level deep in this way is at least as teachable as it being zero levels deep.

I think it has some serious problems. Consider this example:

// This function can be called with `rc` and `unique` pointing to
// overlapping regions of memory.
fn evil_alias<T>(rc: &RefCell<T>, unique: &mut T) {}

fn proof<T>(x: T) {
    let rc = RefCell::new(x);
    let mut bmut = rc.borrow_mut();
    evil_alias(&rc, &mut *bmut);
}

Now when evil_alias is called, to validate that rc points to a valid RefCell<T>, we have to read the T in there. But that means we are reading from memory that unique points to, with a different pointer, which violates the uniqueness assumptions of &mut!

So, at least inside UnsafeCell, we cannot require validity recursively.

@steffahn
Copy link
Member

Regarding optimization, I feel like there’s some potential gains from knowing that &! is uninhabited. Allowing & uninit means that &! is no longer uninhabited. The shallow “the memory behind the reference is not uninitialized” property would fix that particular case, but that doesn’t transfer well: Regarding the necessary properties to be able to deduce that not only &! but also &&! and &&&! are uninhabited, too, all that is necessary would be the property that for a &T to be valid, T needs to be inhabited, and then this “inhabited”ness property transfers through references, too.

@RalfJung
Copy link
Member

RalfJung commented Jul 12, 2022

Regarding optimization, I feel like there’s some potential gains from knowing that &! is uninhabited.

Yes, and for that reason Miri (and MiniRust) carve out a special case of reference to uninhabited types; for Miri that happens here, for MiniRust here. Note that this can be checked without even looking at the pointed-to memory, which makes this somewhat different from validating, say, that &bool points to 0x00 or 0x01.

This then in turn lets us declare &! itself uninhabited (not sure if rustc actually does that), and &&!, and so on.

@CAD97
Copy link
Author

CAD97 commented Jul 12, 2022

The &UnsafeCell example (which I've seen before I know) does show that a simple read-on-retag isn't possible.

My current position is I think the rough consensus: retag does not do a read, and &uninit is fine in the opsem.

I could go either way on &!; both positions are logically consistent, either that since ! is uninhabitable there's no valid reference or that &uninit is a byte-valid reference independent of what the pointee type is. Given ! already isn't powerful enough to delete enum variants when its alongside another non-ZST type1, it doesn't seem unreasonable that the same holds for when it's covered by a reference.

&enum { !Freeze, Freeze } is the remaining (opsem) reason to do some amount of read on retag, namely to extract the variant and track UnsafeCell more granularly (and Miri used to do this). However, I think I agree that UnsafeCell being infectious is the simpler (and thus likely better) opsem (and makes it behave like PhantomPinned does today2). UnsafeCell just being a barrier makes programmer modular reasoning easier3... but I think that falling out of safety rules is enough?

It's also worth noting that the canonical example of passing &enum::Freeze can be expressed in the type system if/once we have enum variant types. Doing refinement typing to get the guarantee automatically might be impossible4, but allowing the developer to express the guarantee themselves (IIUC it works so long as we have &enum::Freeze <: &enum and retag-for-argument happens before the upcast coercion) lessens the need to do so automatically.

(Adide: I'm apparently fond of byte-valid[ity] for clarifying that it's just about the value representation.)

This then in turn lets us declare &! itself uninhabited (not sure if rustc actually does that)

Not currently (and this is a point of vocabulary contention between "byte inhabited" and "safely inhabited" being distinct concepts... but this (or something very closely related) also occurs for e.g. (!, u8) which is not safely inhabited but does have size; I guess the difference is that it's invalid to do a typed copy of a partially initialized (!, u8), so this ties again into &_'s typed-copy-validity and retag-validity being distinct).

Footnotes

  1. It might be possible to still delete the second variant in e.g. enum E { A(u64), B(!, u64) } if we say it's not possible to predict/set a #[repr(Rust)] enum discriminant except by writing the entire enum, but this has two caveats arguing against increasing !'s power this way: enum { A(u8), B(!, u64) } still probably needs to be at least u64 big, and enum variant types means the initial necessary condition doesn't quite hold since now you can ptr::write in MaybeUninit<E::B>.

  2. Not saying that this works but it'd be interesting if the opsem worked out such that PhantomPinned = UnsafeCell<()> up to the type system.

  3. And for that same reason having PinnedCell/UnsafeMutCell/etc is beneficial for modular reasoning.

  4. I have a comment somewhere on one of the issues about factors of this illustrating that pointers tracking the refined variant type they're "actually" pointing at isn't really tractable, but I couldn't find it quickly. It may be tractable to track for some through-unique place assignment (maybe just stack locals).

@RalfJung RalfJung changed the title Document current justification for &mut uninit not being immediate UB Document current justification for not requiring recursive reference validity (in particular, &mut uninit not being immediate UB) Jun 6, 2023
workingjubilee added a commit to workingjubilee/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, so LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states. If using `&mut [u8]` to write uninitialized
memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it
could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must
currently use sfence in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and a motivation to choose lowerings that do not require explicit sfence.
workingjubilee added a commit to workingjubilee/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, so LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states. If using `&mut [u8]` to write uninitialized
memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it
could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must
currently use sfence in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and a motivation to choose lowerings that do not require explicit sfence.
workingjubilee added a commit to workingjubilee/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, so LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states. If using `&mut [u8]` to write uninitialized
memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it
could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
workingjubilee added a commit to workingjubilee/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states. If using `&mut [u8]` to write uninitialized
memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it
could even result in safe code incorrectly reading uninitialized memory!

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
workingjubilee added a commit to workingjubilee/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states, especially as the number of writes, thus
the number of write buffers that may begin retiring simultaneously,
thus the chance of them resolving in an unfortunate order, increases.
If using `&mut [u8]` to write uninitialized memory is permitted ( per
rust-lang/unsafe-code-guidelines#346 ), it could even result in an access
to `&[u8]` actually being reading uninitialized memory in safe code!

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
alexcrichton pushed a commit to alexcrichton/memset-nt that referenced this issue Aug 8, 2023
MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states, especially as the number of writes, thus
the number of write buffers that may begin retiring simultaneously,
thus the chance of them resolving in an unfortunate order, increases.
If using `&mut [u8]` to write uninitialized memory is permitted ( per
rust-lang/unsafe-code-guidelines#346 ), it could even result in an access
to `&[u8]` actually being reading uninitialized memory in safe code!

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
@scottmcm
Copy link
Member

scottmcm commented Nov 11, 2023

However, there is at the time of writing no known optimization benefit to eagerly marking references as pointing to known-init memory, neither practical nor theoretical. This is due to a simple observation: when the memory is read by the source program, it then undergoes a typed copy which asserts that it is memory valid.

I have one I find persuasive in rust-lang/rust#117800.

Eliminating short-circuiting requires that it be legal to read memory as !noundef even when it's only conditionally read -- conditionally because of the short-circuiting. This distinction means, for example, that equality for [u32; 2] and (u32, u32) have different opsem (https://rust.godbolt.org/z/a59r5b3o3) due to the former not short-circuiting. Was it a breaking change when we made equality for &[u32; 2] not short-circuit?

Interestingly, though, any case I can think of for this just cares about initializedness (so as not to produce poisons) but not validity. I don't know if it'd make it easier, but for example I think it'd be fine for this to allow &bool to be &4 in the opsem, just not &undef. In the LLVM sense, I think it needs the !nounint but if it lost the !range that'd be fine.

(I came here from https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Poison-ness.20behind.20references/near/401455323)

@RalfJung
Copy link
Member

Was it a breaking change when we made equality for &[u32; 2] not short-circuit?

No. It is library UB to call that function when the array is not fully initialized.

This is just not a transformation the compiler can make.

Interestingly, though, any case I can think of for this just cares about initializedness (so as not to produce poisons) but not validity. I don't know if it'd make it easier, but for example I think it'd be fine for this to allow &bool to be &4 in the opsem, just not &undef. In the LLVM sense, I think it needs the !nounint but if it lost the !range that'd be fine.

If you want !noundef reads now, then I am certain that in a few years someone will want reads with a suitable value range. IMO special-casing initializedness makes no sense.

@CAD97
Copy link
Author

CAD97 commented Nov 12, 2023

As @comex1 noted, the compiler actually should be able to turn (informal) ((*arr).0 == (*arr).0) && ((*arr).1 == (*arr).1) into ((*arr).0 == (*arr).0) & (freeze((*arr).1) == freeze((*arr).1)) (which should lower to the desired assembly) with just !dereferencable even without !noundef.

That said, it does seem like it would be beneficial to have some way to eagerly directly assert byte validity behind a reference, although based on the load-first impl not getting the better codegen it wouldn't help in this case. (forget(ptr::read(reference)) should do it IIUC but is at best heavily opaque in its purpose.) This general shape of optimization metadata showing up "too late" has shown up multiple times.

Footnotes

  1. there's \@ in the markdown, why did this make a link and presumably ping... sorry

github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this issue Nov 14, 2023
(This is my first PR here, so I've probably missed some things. Please
let me know what else I should do to help you as a reviewer!)

# Objective

Due to rust-lang/rust#117800, the `derive`'d
`PartialEq::eq` on `Entity` isn't as good as it could be. Since that's
used in hashtable lookup, let's improve it.

## Solution

The derived `PartialEq::eq` short-circuits if the generation doesn't
match. However, having a branch there is sub-optimal, especially on
64-bit systems like x64 that could just load the whole `Entity` in one
load anyway.

Due to complications around `poison` in LLVM and the exact details of
what unsafe code is allowed to do with reference in Rust
(rust-lang/unsafe-code-guidelines#346), LLVM
isn't allowed to completely remove the short-circuiting. `&Entity` is
marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8
bytes -- and does so -- but it has to assume that the `index` might be
undef/poison if the `generation` doesn't match, and thus while it finds
a way to do it without needing a branch, it has to do something slightly
more complicated than optimal to combine the results. (LLVM is allowed
to change non-short-circuiting code to use branches, but not the other
way around.)

Here's a link showing the codegen today:
<https://rust.godbolt.org/z/9WzjxrY7c>
```rust
#[no_mangle]
pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool {
    a == b
}
```
ends up generating the following assembly:
```asm
demo_eq_ref:
        movq    xmm0, qword ptr [rdi]
        movq    xmm1, qword ptr [rsi]
        pcmpeqd xmm1, xmm0
        pshufd  xmm0, xmm1, 80
        movmskpd        eax, xmm0
        cmp     eax, 3
        sete    al
        ret
```
(It's usually not this bad in real uses after inlining and LTO, but it
makes a strong demo.)

This PR manually implements `PartialEq::eq` *without* short-circuiting,
and because that tells LLVM that neither the generations nor the index
can be poison, it doesn't need to be so careful and can generate the
"just compare the two 64-bit values" code you'd have probably already
expected:
```asm
demo_eq_ref:
        mov     rax, qword ptr [rsi]
        cmp     qword ptr [rdi], rax
        sete    al
        ret
```

Since this doesn't change the representation of `Entity`, if it's
instead passed by *value*, then each `Entity` is two `u32` registers,
and the old and the new code do exactly the same thing. (Other
approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect
this case.)

This should hopefully merge easily with changes like
#9907 that also want to change
`Entity`.

## Benchmarks

I'm not super-confident that I got my machine fully consistent for
benchmarking, but whether I run the old or the new one first I get
reasonably consistent results.

Here's a fairly typical example of the benchmarks I added in this PR:

![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1)

Building the sets seems to be basically the same. It's usually reported
as noise, but sometimes I see a few percent slower or faster.

But lookup hits in particular -- since a hit checks that the key is
equal -- consistently shows around 10% improvement.

`cargo run --example many_cubes --features bevy/trace_tracy --release --
--benchmark` showed as slightly faster with this change, though if I had
to bet I'd probably say it's more noise than meaningful (but at least
it's not worse either):

![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0)

This is my first PR here -- and my first time running Tracy -- so please
let me know what else I should run, or run things on your own more
reliable machines to double-check.

---

## Changelog

(probably not worth including)

Changed: micro-optimized `Entity::eq` to help LLVM slightly.

## Migration Guide

(I really hope nobody was using this on uninitialized entities where
sufficiently tortured `unsafe` could could technically notice that this
has changed.)
@Nadrieril
Copy link
Member

This is a gorgeous writeup 😍 I think we should keep your "what is UB" intro for posterity somewhere. I particularly liked the clarity of "UB should be detectable/justified/operational". Is there a better place than the glossary for that? If not I'll PR this into the glossary.

@RalfJung
Copy link
Member

The glossary is meant for definitions, not rationale. But I don't know a good place for rationale, unfortunately.

rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this issue Jan 9, 2024
(This is my first PR here, so I've probably missed some things. Please
let me know what else I should do to help you as a reviewer!)

# Objective

Due to rust-lang/rust#117800, the `derive`'d
`PartialEq::eq` on `Entity` isn't as good as it could be. Since that's
used in hashtable lookup, let's improve it.

## Solution

The derived `PartialEq::eq` short-circuits if the generation doesn't
match. However, having a branch there is sub-optimal, especially on
64-bit systems like x64 that could just load the whole `Entity` in one
load anyway.

Due to complications around `poison` in LLVM and the exact details of
what unsafe code is allowed to do with reference in Rust
(rust-lang/unsafe-code-guidelines#346), LLVM
isn't allowed to completely remove the short-circuiting. `&Entity` is
marked `dereferencable(8)` so LLVM knows it's allowed to *load* all 8
bytes -- and does so -- but it has to assume that the `index` might be
undef/poison if the `generation` doesn't match, and thus while it finds
a way to do it without needing a branch, it has to do something slightly
more complicated than optimal to combine the results. (LLVM is allowed
to change non-short-circuiting code to use branches, but not the other
way around.)

Here's a link showing the codegen today:
<https://rust.godbolt.org/z/9WzjxrY7c>
```rust
#[no_mangle]
pub fn demo_eq_ref(a: &Entity, b: &Entity) -> bool {
    a == b
}
```
ends up generating the following assembly:
```asm
demo_eq_ref:
        movq    xmm0, qword ptr [rdi]
        movq    xmm1, qword ptr [rsi]
        pcmpeqd xmm1, xmm0
        pshufd  xmm0, xmm1, 80
        movmskpd        eax, xmm0
        cmp     eax, 3
        sete    al
        ret
```
(It's usually not this bad in real uses after inlining and LTO, but it
makes a strong demo.)

This PR manually implements `PartialEq::eq` *without* short-circuiting,
and because that tells LLVM that neither the generations nor the index
can be poison, it doesn't need to be so careful and can generate the
"just compare the two 64-bit values" code you'd have probably already
expected:
```asm
demo_eq_ref:
        mov     rax, qword ptr [rsi]
        cmp     qword ptr [rdi], rax
        sete    al
        ret
```

Since this doesn't change the representation of `Entity`, if it's
instead passed by *value*, then each `Entity` is two `u32` registers,
and the old and the new code do exactly the same thing. (Other
approaches, like changing `Entity` to be `[u32; 2]` or `u64`, affect
this case.)

This should hopefully merge easily with changes like
bevyengine#9907 that also want to change
`Entity`.

## Benchmarks

I'm not super-confident that I got my machine fully consistent for
benchmarking, but whether I run the old or the new one first I get
reasonably consistent results.

Here's a fairly typical example of the benchmarks I added in this PR:

![image](https://github.com/bevyengine/bevy/assets/18526288/24226308-4616-4082-b0ff-88fc06285ef1)

Building the sets seems to be basically the same. It's usually reported
as noise, but sometimes I see a few percent slower or faster.

But lookup hits in particular -- since a hit checks that the key is
equal -- consistently shows around 10% improvement.

`cargo run --example many_cubes --features bevy/trace_tracy --release --
--benchmark` showed as slightly faster with this change, though if I had
to bet I'd probably say it's more noise than meaningful (but at least
it's not worse either):

![image](https://github.com/bevyengine/bevy/assets/18526288/58bb8c96-9c45-487f-a5ab-544bbfe9fba0)

This is my first PR here -- and my first time running Tracy -- so please
let me know what else I should run, or run things on your own more
reliable machines to double-check.

---

## Changelog

(probably not worth including)

Changed: micro-optimized `Entity::eq` to help LLVM slightly.

## Migration Guide

(I really hope nobody was using this on uninitialized entities where
sufficiently tortured `unsafe` could could technically notice that this
has changed.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants