Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr(Rust) unions (including MaybeUninit) do not preserve padding bytes #99604

Open
5225225 opened this issue Jul 22, 2022 · 10 comments
Open

repr(Rust) unions (including MaybeUninit) do not preserve padding bytes #99604

5225225 opened this issue Jul 22, 2022 · 10 comments
Labels
T-opsem Relevant to the opsem team

Comments

@5225225
Copy link
Contributor

5225225 commented Jul 22, 2022

The following program isn't detected to have any UB in miri, and doesn't hit the panic, but does panic when ran (with or without optimizations) normally (using LLVM)

use std::mem::MaybeUninit;

unsafe fn print(x: MaybeUninit::<(u16, u8)>) {
    let value: u32 = std::mem::transmute(x);
    if value != 0xaabbccdd {
        panic!("got {value:x}");
    }
}

fn main() {
    let mut x = MaybeUninit::<(u16, u8)>::uninit();
    unsafe {
        x.as_mut_ptr().cast::<u32>().write_unaligned(0xaabbccdd);
        print(x);
    }
}

Panics with "got bbccdd", so the padding byte, presumably at aa, got zeroed.

The LLVM IR for print says it passes it as 2 arguments, so it makes sense that the padding is getting zeroed.

define void @_ZN10playground5print17hfd0afe434a63e3b7E(i16 %x.0, i8 %x.1) unnamed_addr #0 [...]

We (might?) already promise that MaybeUninit is just a bag of bytes that doesn't zero any padding, but we also promise that it has the ABI of T, and those two seem to be in conflict.

Relates to rust-lang/unsafe-code-guidelines#354 (comment)

@5225225
Copy link
Contributor Author

5225225 commented Jul 22, 2022

This also comes up as an issue for functions that happen to use MaybeUninit internally, see

fn main() {
    const MARKER: u32 = 0x11223344;
    let mut x = MARKER;
    let mut y = 0_u32;
    unsafe { 
        std::ptr::swap_nonoverlapping(
            &mut x as *mut u32 as *mut (u16, u8), 
            &mut y as *mut u32 as *mut (u16, u8),
            1
        );
    }
    if y != MARKER {
        println!("{y:x} != {MARKER:x}");
    }
}

swap_nonoverlapping is explicitly mentioned as being untyped, so this failing is wrong, IMO.

@workingjubilee
Copy link
Member

My understanding was that at some point it had been decided that we didn't have to promise anything regarding the padding byte in exactly these cases.

@workingjubilee
Copy link
Member

swap_nonoverlapping is explicitly mentioned as being untyped, so this failing is wrong, IMO.

In the models where memory accesses are typed, this too does have a type! It's just suggesting the type of "a blob of bytes". But this would not be the first time we wrote a check that (our lowering to) LLVM can't cash.

@RalfJung
Copy link
Member

RalfJung commented Jul 22, 2022

We (might?) already promise that MaybeUninit is just a bag of bytes that doesn't zero any padding, but we also promise that it has the ABI of T, and those two seem to be in conflict.

Ah, yeah, good point. :(

swap_nonoverlapping is explicitly mentioned as being untyped, so this failing is wrong, IMO.

I agree, the implementation of swap_nonoverlapping should be fixed, or the compilation of repr(transparent) unions needs to be adjusted.

In the models where memory accesses are typed, this too does have a type!

The docs have recently been adjusted to say "The operation is untyped in the sense that data may be uninitialized or otherwise violate the requirements of T. The initialization state is preserved exactly". So what you say is not the intention of the implementation. See #97712.

Some other functions (ptr::copy, ptr::copy_nonoverlapping) have documented for a long while that they act like memcpy/memmove, which also preserve all bytes disregarding the type -- IMO it would be strange for ptr::swap_nonoverlapping to behave differently here.

@workingjubilee
Copy link
Member

workingjubilee commented Jul 22, 2022

We also claim that mem::transmute will simply be "like a memcpy", which leads to #96140 where LLVM disagrees with us on our interpretation of our own claims. While I am all for telling LLVM to take a hike, I must stress that committing to preserving some bits what we claim we really care about is a u16 and a u8 is asking lowering and the optimizer to reason about a very different version of the code than what is written.

@5225225
Copy link
Contributor Author

5225225 commented Jul 22, 2022

Yeah, the only reason I used swap_nonoverlapping here instead of any other one is that swap and copy don't actually move a MaybeUninit, they just use one as storage space on the stack, and use the copy_nonoverlapping intrinsic (which is presumably memcpy-like in that it doesn't care about the type).

However, swap_nonoverlapping seems to eventually call core::mem::swap_simple (stdrs.dev) on MaybeUninit, which does do a move at MaybeUninit<(u16, u8)>, which loses the padding.

So it seems to be purely by accident that swap_nonoverlapping does this, and other functions don't.

Honestly I think we might need two MaybeUninit's. One that promises ABI transparency, and one that promises that it's just an untyped bag of bytes that happens to have the size and align of T, and as such can be trusted to store bytes that are padding in T reliably.

@RalfJung
Copy link
Member

We also claim that mem::transmute will simply be "like a memcpy"

Well, we say it's like memcpy "under the hood", which is not wrong, but then the inputs and outputs are fundamentally "clamped" by this being a by-value operation. memcpy does not act on values. The type-changing memcpy is not what makes the padding disappear here, it's the passing-args-to-transmute and passing-ret-from-transmute (and it'd be the same with any safe function as well).

But probably we should just remove that mention of memcpy if it confuses people more than it helped...

Honestly I think we might need two MaybeUninit's. One that promises ABI transparency, and one that promises that it's just an untyped bag of bytes that happens to have the size and align of T, and as such can be trusted to store bytes that are padding in T reliably.

I have no idea how we could possibly teach this. :(

@5225225
Copy link
Contributor Author

5225225 commented Jul 22, 2022

I have no idea how we could possibly teach this. :(

Yep. I think that's a Terrible idea, come to think of it.

So.... Well, we don't promise what MaybeUninit actually is, so it could be Compiler Magic

that knows if it needs to be the ABI of an untyped blob, or if it needs to be the ABI of T.... based on the calling convention of the function it's passed to?

that seems extremely strange (and would mean passing a MaybeUninit through a C function defined in rust would drop the padding but that's arguably correct? it is following the rules it laid out).

@rdrpenguin04
Copy link

That breaks down if it's passed to multiple functions though. Also, @chorman0773 would probably have some choice words about making a fundamentally "normal" type be magic.

@RalfJung
Copy link
Member

A relevant recent Zulip discussion.

We guarantee that MaybeUninit<T> is ABI-compatible with T, which pretty much inevitably leads to these problems. This was maybe premature as we didn't realize at the time what that would imply. But whether it is possible and a good idea to change this now -- I don't know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

5 participants