Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Who is responsible for preventing reentrancy issues through the allocator? #506

Open
Tracked by #51245 ...
RalfJung opened this issue Apr 17, 2024 · 10 comments
Open
Tracked by #51245 ...

Comments

@RalfJung
Copy link
Member

RalfJung commented Apr 17, 2024

Calling the global allocator can in turn call arbitrary user-defined code

  • if there is a custom global allocator (needs unsafe)
  • if allocation fails, and there is one of
    • a custom alloc hook (unstable)
    • -Zoom=panic (unstable) plus a custom panic hook
    • a custom #[alloc_error_handler] (unstable, requires no_std)
    • a no_std library using alloc's default error hook (which panics) with a #[panic_handler] (may be possible on stable?)

It may just barely be the case that this cannot be triggered on stable today, depending on whether there is any way to actually run the liballoc default error hook on stable. I think the __rust_no_alloc_shim_is_unstable symbol makes that impossible though? @bjorn3 might know.

Similar to #505, this can lead to reentrancy. That has caused real soundness questions: is it the responsibility of the allocator and the alloc error hook to never call into data structures except if they are declared reentrant, or is it the responsibility of every data structure to be resilient against reentrancy through the allocator? If they anyway have to be able to deal with reentrant panic hooks (due to #505), then is the extra possibility of reentrancy via the allocator even relevant?

@CAD97
Copy link

CAD97 commented Apr 17, 2024

Actually IIUC handle_alloc_error with -Zoom=abort still calls the panic hook, it just does so with can_unwind=false so no unwinding occurs.

A more niche but more obviously suspect case of potential surprise single thread reentrancy is signal handlers. At least there the usual model is that they're a separate logical thread of execution even if physically they run within an existing thread.

I think "global allocator implementation shouldn't perform allocation" is a common rule, which would include forbidding panics. But I think any scenario which says panicking at all is unsound without even being able to cover it with catch_unwind is untenable.

@RalfJung
Copy link
Member Author

swc-project/swc#8362 is likely still an issue in the latest version of that crate. If there is a custom allocator, and that allocator is allowed to panic, then one can cause an aliasing violation in entirely safe code:

  • call AtomStoreCell::atomic
  • that will create an &mut and pass it to AtomStore::atom (meaning it is protected and invalidation is insta-UB)
  • that will call some things in the hstr crate which end up doing an allocation
  • that will call our allocator, which panics
  • that will call our panic hook, which calls AtomStoreCell::atomic a second time which creates an aliasing &mut which is immediate UB due to the protector created the first time around

Actually IIUC handle_alloc_error with -Zoom=abort still calls the panic hook, it just does so with can_unwind=false so no unwinding occurs.

No, the alloc error handler in std will call rtprintpanic! which does not call the panic hook.

signal handlers

Those are very different; they may only do async-signal-safe things and they are generally accepted as a very non-standard execution environment.

I think "global allocator implementation shouldn't perform allocation" is a common rule, which would include forbidding panics. But I think any scenario which says panicking at all is unsound without even being able to cover it with catch_unwind is untenable.

So we should forbid panics but also forbidding panics is untenable...?

@CAD97
Copy link

CAD97 commented Apr 22, 2024

To restate hopefully a bit clearer:

I believe "global allocators don't reentrantly call the global allocator" is a common property that people expect to be upheld. However, I don't think that treating this as a soundness property that must be upheld by the developer is a viable option (c.f. allocation size fitting isize). Since arbitrary panic hooks may of course include allocation, such a requirement means writing entirely panic-free code, elevating accidental panics from a likely fatal error to unsoundness.

There's two sides to the coin here: reentrancy in code because the allocator reentered it, and reentrancy of the allocator itself. My observation is more about the latter, not the former. Making code no-global-alloc is a reasonable ask (the allocator won't enter your data structure that uses the allocator), but no-panic isn't really, not without better compiler assistance anyway. So perhaps this ends up more about #505 (reentrancy via panic hook) than this issue (reentrancy via alloc hook).

However, it's possible to imagine the Rust runtime enforcing alloc reentrancy freedom in the global allocator dynamically, by manipulating the alloc and/or panic hook. (But if safe APIs must tolerate reentrancy from the panic hook anyway, preventing alloc reentrancy doesn't seem actually beneficial.)

@jwong101
Copy link

jwong101 commented Oct 13, 2024

Making the global allocator be the one for ensuring reentrancy seems to be the most reasonable option. Yes it's difficult, but it seems natural that the "code in charge of global dynamic memory allocation" should shoulder more responsibility, rather than having all other unsafe code/data structures have to worry about being reentered whenever they allocate memory.

If they anyway have to be able to deal with reentrant panic hooks (due to #505), then is the extra possibility of reentrancy via the allocator even relevant?

Yes, since the allocator currently isn't allowed to panic.

Otherwise this code:

https://github.com/rust-lang/rust/blob/ef4e8259b5016d85e261587b605028b2ff06c13d/library/alloc/src/rc.rs#L1877

is unsound if Global.alloc() is allowed to soundly call into the Rc:

use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::{Rc, Weak};
use std::sync::atomic::{AtomicBool, Ordering};

type RcBox = Rc<Box<i32>>;
type WeakBox = Weak<Box<i32>>;
struct Demonic;

thread_local! {
    static STRONG: Cell<Option<RcBox>> =  const { Cell::new(None) };
    static WEAK: Cell<WeakBox> =  const { Cell::new(Weak::new()) };
}

static SIDE_CHANNEL: AtomicBool = AtomicBool::new(false);

fn dupe_rc() {
    // `ManuallyDrop` used to avoid an overflow check in debug mode
    STRONG.set(ManuallyDrop::new(WEAK.take()).upgrade());
}

unsafe impl GlobalAlloc for Demonic {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        if SIDE_CHANNEL.load(Ordering::Relaxed) {
            SIDE_CHANNEL.store(false, Ordering::Relaxed);
            dupe_rc();
        }
        System.alloc(l)
    }

    unsafe fn dealloc(&self, c: *mut u8, l: Layout) {
        System.dealloc(c, l);
    }
}

#[global_allocator]
static D: Demonic = Demonic;

fn main() {
    let mut oblivious = RcBox::default();
    WEAK.set(Rc::downgrade(&oblivious));
    SIDE_CHANNEL.store(true, Ordering::Relaxed);
    Rc::make_mut(&mut oblivious);
    drop(oblivious);

    // double free. whose fault is it?
    STRONG.set(None);
}
miri backtrace
jwong3@JWF3QD9WNV galloc % MIRIFLAGS=-Zmiri-backtrace=full cargo miri run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
     Running `/Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/bin/cargo-miri runner target/miri/aarch64-apple-darwin/debug/galloc`
error: Undefined Behavior: memory access failed: alloc1836 has been freed, so this pointer is dangling
  --> /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/alloc/unix.rs:48:18
   |
48 |         unsafe { libc::free(ptr as *mut libc::c_void) }
   |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: alloc1836 has been freed, so this pointer is dangling
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc1836 was allocated here:
  --> src/main.rs:28:9
   |
28 |         System.alloc(l)
   |         ^^^^^^^^^^^^^^^
help: alloc1836 was deallocated here:
  --> src/main.rs:32:9
   |
32 |         System.dealloc(c, l);
   |         ^^^^^^^^^^^^^^^^^^^^
   = note: BACKTRACE (of the first span):
   = note: inside `std::sys::alloc::unix::<impl std::alloc::GlobalAlloc for std::alloc::System>::dealloc` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys/alloc/unix.rs:48:18: 48:54
note: inside `<Demonic as std::alloc::GlobalAlloc>::dealloc`
  --> src/main.rs:32:9
   |
32 |         System.dealloc(c, l);
   |         ^^^^^^^^^^^^^^^^^^^^
note: inside `_::__rust_dealloc`
  --> src/main.rs:37:11
   |
36 | #[global_allocator]
   | ------------------- in this procedural macro expansion
37 | static D: Demonic = Demonic;
   |           ^^^^^^^
   = note: inside `std::alloc::dealloc` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/alloc.rs:119:14: 119:64
   = note: inside `<std::alloc::Global as std::alloc::Allocator>::deallocate` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/alloc.rs:269:22: 269:51
   = note: inside `<std::boxed::Box<i32> as std::ops::Drop>::drop` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1679:17: 1679:66
   = note: inside `std::ptr::drop_in_place::<std::boxed::Box<i32>> - shim(Some(std::boxed::Box<i32>))` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1: 574:56
   = note: inside `<std::rc::Rc<std::boxed::Box<i32>> as std::ops::Drop>::drop` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/rc.rs:2256:17: 2256:66
   = note: inside `std::ptr::drop_in_place::<std::rc::Rc<std::boxed::Box<i32>>> - shim(Some(std::rc::Rc<std::boxed::Box<i32>>))` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1: 574:56
   = note: inside `std::ptr::drop_in_place::<std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>> - shim(Some(std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>))` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1: 574:56
   = note: inside `std::cell::Cell::<std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>>::set` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/cell.rs:428:26: 428:27
   = note: inside closure at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:349:17: 349:45
   = note: inside `std::thread::LocalKey::<std::cell::Cell<std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>>>::initialize_with::<{closure@std::thread::LocalKey<std::cell::Cell<std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>>>::set::{closure#0}}, ()>` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:311:9: 311:27
   = note: inside `std::thread::LocalKey::<std::cell::Cell<std::option::Option<std::rc::Rc<std::boxed::Box<i32>>>>>::set` at /Users/jwong3/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/local.rs:344:9: 351:11
note: inside `main`
  --> src/main.rs:47:5
   |
47 |     STRONG.set(None);
   |     ^^^^^^^^^^^^^^^^

EDIT: made the example less pathological/more realistic

@CAD97
Copy link

CAD97 commented Oct 13, 2024

Yes, since the allocator currently isn't allowed to panic.

This isn't fully clear in the current docs; they say that

It’s undefined behavior if global allocators unwind. This restriction may be lifted in the future, but currently a panic from any of these functions may lead to memory unsafety.

but this may refer either to any attempt to initiate a panic within the scope of allocation, or only to an attempt to unwind from (out of) an allocation function.

Otherwise this code […] is unsound if Global.alloc() is allowed to soundly call into the Rc:

Good proof of concept. It looks like Rc::make_mut should .dec_strong() earlier to pre-poop its pants guard against reentrancy despite the comment's assertion that this branch doesn't need panic protection.

But this does clearly illustrate the difficulty of accounting for allocation being a potential reentrancy point and I think put it on a similar footing as the isize allocation max where treating it as a global concern is basically untenable.


Additionally, I think this illustrates that if the panicking restriction isn't total and only raises library UB at the point of an unwinding out of an alloc function, then we need to have a flag in the panic count that prevents calling into the panic hook for a panic in the allocator.

It's probably also a good idea to put a helpful assert_unsafe_precondition! reentrancy guard in Global. The associated cost should be dominated easily by the actual allocation work.

@RalfJung
Copy link
Member Author

That is a neat example. :) However, your code link points at a ptr::copy_nonoverlapping, so I was confused at first... did you mean to point at this instead?

https://github.com/rust-lang/rust/blob/ef4e8259b5016d85e261587b605028b2ff06c13d/library/alloc/src/rc.rs#L1872

That's a call to UniqueRcUninit::new, so allocation is involved.

Cc @rust-lang/wg-allocators for some more input on the intended soundness contract here.

It's probably also a good idea to put a helpful assert_unsafe_precondition! reentrancy guard in Global. The associated cost should be dominated easily by the actual allocation work.

The same example could be constructed using custom local allocators, right? We can't automatically add a reentrancy guard to those, I think...

@RalfJung
Copy link
Member Author

RalfJung commented Oct 13, 2024

but this may refer either to any attempt to initiate a panic within the scope of allocation, or only to an attempt to unwind from (out of) an allocation function.

Yeah... also, if we take away their ability to panic, we should give them some other way to halt execution. And even then it's an enormous footgun as rustc will insert a bunch of panics automatically... so probably what we should rather do is have a function like with_immediate_abort_on_panic(|| { /* code here */ }) such that if the closure starts a panic, this does not invoke the panic hook, and only calls code that we control and carefully ensure will not allocate. All allocators should then be implemented with that function.

@jwong101
Copy link

However, your code link points at a ptr::copy_nonoverlapping, so I was confused at first... did you mean to point at this instead?

Yeah, sorry for the confusion. I didn't know which part to point out, so I highlighted the part where the Rc copies the Box into the new allocation, because it assumes that it's still the only strong reference after the allocation in UniqueRcUninit::new. It probably would've been clearer to highlight the allocation(and the possibility of it running arbitrary code that could make a new strong reference) instead.

@bjorn3
Copy link
Member

bjorn3 commented Oct 13, 2024

so probably what we should rather do is have a function like with_immediate_abort_on_panic(|| { /* code here */ }) such that if the closure starts a panic, this does not invoke the panic hook, and only calls code that we control and carefully ensure will not allocate.

There is no way to implement such a function for no_std programs. It did need a thread local variable, which requires std.

@comex
Copy link

comex commented Oct 13, 2024

There is no way to implement such a function for no_std programs. It did need a thread local variable, which requires std.

The cost of toggling a thread-local variable, while small, is not ideal for something as hot as an allocator. I can imagine schemes where it would instead just be checking a global variable, but that would still be a measurable expense.

I wish Rust had an effect system (although I like @nikomatsakis' "function flavor" term better). In this case, allocators could use a "no panic" flavor to ensure they never called into any code that could panic. There's been a drumbeat of requests for a "no panic" feature over the years, and this would be one more use case. Admittedly, a flavor system would not be a panacea; we would still be left with the problem that writing no-panic code is (currently) extremely unergonomic. But that's solvable. Without a flavor system, though, we can't even get started on solving it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants