wasi threads mpsc channel hang on sched_yield only on debug rust build #117440

Open
cimacmillan opened this issue Oct 31, 2023 · 7 comments
Labels
C-bug Category: This is a bug. O-wasi Operating system: Wasi, Webassembly System Interface T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@cimacmillan

With Rust's WASI threads support, receiving on an mpsc channel can hang (deadlock), but only when the program is compiled in debug mode.

The following code reproduces it on the wasm32-wasi-preview1-threads target:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (send, recv) = mpsc::channel();
    let thread = thread::spawn(move || {
        let v = vec![1, 2, 3];
        println!("\tsending {:?}", v);
        send.send(v).unwrap();
        println!("\tsent");
    });


    println!("receiving");

    let v = recv.recv().unwrap();

    println!("received {:?}", v);

    thread.join().unwrap();
}

I've created this repository, which provides an easy way of reproducing the bug:

https://github.com/cimacmillan/WasiThreadsSchedYieldBug

I expected to see this happen:

receiving
        sending [1, 2, 3]
        sent
received [1, 2, 3]

Instead, this happened:

receiving
        sending [1, 2, 3]

I've reproduced this on wasmtime and wamr.

Notes

If vec is placed in a box, it succeeds on both debug and release

let v = Box::new(vec![1, 2, 3]);
println!("\tsending {:?}", v);
send.send(v).unwrap();

With a stack-allocated array instead, it succeeds in release but panics in debug:

thread 'main' panicked at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/thread/mod.rs:1439:40:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: failed to run main module `target/wasm32-wasi-preview1-threads/debug/wasi_channel.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x27b6d - panic_abort::__rust_start_panic::abort::hbcfb3c5651cde4fc
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/panic_abort/src/lib.rs:84:17              - __rust_start_panic
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/panic_abort/src/lib.rs:38:5
           1: 0x27622 - rust_panic
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:757:25
           2: 0x275a9 - std::panicking::rust_panic_with_hook::h6c1623ae744881b6
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:729:5
           3: 0x26536 - std::panicking::begin_panic_handler::{{closure}}::h8474ba3fc2a2f64f
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:597:13
           4: 0x2649b - std::sys_common::backtrace::__rust_end_short_backtrace::h2d8a3004b92235a1
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
           5: 0x26efc - rust_begin_unwind
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
           6: 0x2d42e - core::panicking::panic_fmt::h4d6ec91b4ec3c102
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
           7: 0x2d9b9 - core::panicking::panic::h034632322060c32c
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:117:5
           8: 0x17e9c - core::option::Option<T>::unwrap::h1863121d2cc0b0ed
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/option.rs:935:21              - std::thread::JoinInner<T>::join::hd31db5945ffb3f2c
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/thread/mod.rs:1439:40
           9: 0x18045 - std::thread::JoinHandle<T>::join::h4dbbb2a53ac5f6d8
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/thread/mod.rs:1571:9
          10: 0x24cb - wasi_channel::main::h820cc1b51aa05b11
                           at /home/ANT.AMAZON.COM/cmmacmil/Workplace/WasiThreadsSchedYieldBug/src/main.rs:20:5
          11: 0x13576 - core::ops::function::FnOnce::call_once::hb0a25fce8d6fee0e
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:250:5
          12: 0xbd86 - std::sys_common::backtrace::__rust_begin_short_backtrace::hcb7642304aef178a
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:154:18
          13: 0xfca9 - std::rt::lang_start::{{closure}}::hff3d73af1b02f1d1
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:166:18
          14: 0x2331a - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h419298144f8c584c
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:284:13              - std::panicking::try::do_call::h5a71f345767668ef
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40              - std::panicking::try::h5a7441a0b5ca82dd
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19              - std::panic::catch_unwind::h4144d543aad3e5b6
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14              - std::rt::lang_start_internal::{{closure}}::h8e9bcf681666a716
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:148:48              - std::panicking::try::do_call::h61bb14bafca96d71
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40              - std::panicking::try::h011f4feb1eca4f94
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19              - std::panic::catch_unwind::hf0cc29067d9aa254
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14              - std::rt::lang_start_internal::hee10ca7e79926ed8
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:148:20
          15: 0xfc46 - std::rt::lang_start::h261b20e17f46f2bb
                           at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:165:17
          16: 0x253c - <unknown>!__main_void
          17:  0x719 - <unknown>!_start
    2: wasm trap: wasm `unreachable` instruction executed

Tracing with the wasm-micro-runtime (WAMR) wasm debugger shows both threads paused on an exception.
Main thread

std::sys::wasi::alloc::_$LT$impl$u20$core..alloc..global..GlobalAlloc$u20$for$u20$std..alloc..System$GT$::alloc::haa63c39ee08f2074 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/alloc.rs:14)
__rdl_alloc (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/alloc.rs:380)
 (:0)
alloc::alloc::alloc::h42374113b40c16ed (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98)
alloc::alloc::Global::alloc_impl::h9090710806748dc2 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181)
_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$::allocate::h71d2a35315712f79 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241)
alloc::raw_vec::finish_grow::ha5efa382d4be5115 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:485)
alloc::raw_vec::RawVec$LT$T$C$A$GT$::grow_amortized::hf692d68114f3755b (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:404)
alloc::raw_vec::RawVec$LT$T$C$A$GT$::reserve_for_push::h2a968592231ee383 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:302)
alloc::vec::Vec$LT$T$C$A$GT$::push::ha56c219806196dc1 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1829)
std::sync::mpmc::waker::Waker::register_with_packet::h415bb41d0e6ccf14 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/waker.rs:50)
std::sync::mpmc::waker::Waker::register::hece34a0b16653117 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/waker.rs:44)
std::sync::mpmc::waker::SyncWaker::register::h7c5d2119ed1febeb (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/waker.rs:155)
std::sync::mpmc::list::Channel$LT$T$GT$::recv::_$u7b$$u7b$closure$u7d$$u7d$::hdc64ce948da1d89a (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/list.rs:436)
std::sync::mpmc::context::Context::with::_$u7b$$u7b$closure$u7d$$u7d$::ha81a21b45c2235c8 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:50)
std::sync::mpmc::context::Context::with::_$u7b$$u7b$closure$u7d$$u7d$::he57ca4f13f038b26 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:58)
std::thread::local::LocalKey$LT$T$GT$::try_with::h54045f9a24ff4e0f (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:270)
std::sync::mpmc::context::Context::with::h9f6c91eccd7e1618 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/context.rs:53)

Spawned thread

std::sys::wasi::alloc::_$LT$impl$u20$core..alloc..global..GlobalAlloc$u20$for$u20$std..alloc..System$GT$::alloc::haa63c39ee08f2074 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/alloc.rs:14)
__rdl_alloc (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/alloc.rs:380)
 (:0)
alloc::alloc::alloc::h42374113b40c16ed (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98)
alloc::alloc::Global::alloc_impl::h9090710806748dc2 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181)
_$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$::allocate::h71d2a35315712f79 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241)
alloc::alloc::exchange_malloc::h4e69b088f2a2057f (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:330)
alloc::boxed::Box$LT$T$GT$::new::h12d236dc4a77b137 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:217)
std::sync::mpmc::list::Channel$LT$T$GT$::start_send::h8f9ef938a1752225 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/list.rs:209)
std::sync::mpmc::list::Channel$LT$T$GT$::send::hd1588ed8e20b059b (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/list.rs:402)
std::sync::mpmc::Sender$LT$T$GT$::send::hdc2374ba0d3a3e44 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpmc/mod.rs:128)
std::sync::mpsc::Sender$LT$T$GT$::send::h4cdca901b00daa1f (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sync/mpsc/mod.rs:613)
wasm_hello_world::main::_$u7b$$u7b$closure$u7d$$u7d$::hce684d33494d1d0c (/home/ANT.AMAZON.COM/cmmacmil/Workplace/rustc/test/wasm_hello_world/src/main.rs:9)
std::sys_common::backtrace::__rust_begin_short_backtrace::h22749124b985e533 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:154)
std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h25bdfa1e0f7ce137 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:529)
_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h9940955c78c8f231 (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271)
std::panicking::try::do_call::h032c93401696362e (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:502)
std::panicking::try::h20a3486b63cbd1af (/home/ANT.AMAZON.COM/cmmacmil/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:466)

Meta

rustc --version --verbose:

rustc 1.73.0 (cc66ad468 2023-10-03)
binary: rustc
commit-hash: cc66ad468955717ab92600c770da8c1601a4ff33
commit-date: 2023-10-03
host: x86_64-unknown-linux-gnu
release: 1.73.0
LLVM version: 17.0.2
@cimacmillan cimacmillan added the C-bug Category: This is a bug. label Oct 31, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 31, 2023
@Jules-Bertholet
Contributor

@rustbot label T-libs O-wasi

@rustbot rustbot added O-wasi Operating system: Wasi, Webassembly System Interface T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 31, 2023
@cimacmillan
Author

I suspected that the child thread was panicking, that the panic was going unhandled, and that the main thread was then deadlocking in the channel's recv call. However, when I replace the program with the snippet below, it terminates in both debug and release, which is what I would have expected.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (send, recv) = mpsc::channel();
    let thread = thread::spawn(move || {
        panic!("Test");
    });

    println!("receiving");

    let v: Vec<i32> = recv.recv().unwrap();

    println!("received {:?}", v);

    thread.join().unwrap();
}

Output

thread '<unnamed>' panicked at src/main.rs:7:9:
Test
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: error while executing at wasm backtrace:
    0: 0x22c0f - <unknown>!__rust_start_panic
    1: 0x226c4 - <unknown>!rust_panic
    2: 0x2264b - <unknown>!std::panicking::rust_panic_with_hook::h6c1623ae744881b6
    3: 0x215d8 - <unknown>!std::panicking::begin_panic_handler::{{closure}}::h8474ba3fc2a2f64f
    4: 0x2153d - <unknown>!std::sys_common::backtrace::__rust_end_short_backtrace::h2d8a3004b92235a1
    5: 0x21f9e - <unknown>!rust_begin_unwind
    6: 0x284d0 - <unknown>!core::panicking::panic_fmt::h4d6ec91b4ec3c102
    7: 0x184e4 - <unknown>!wasi_channel::main::{{closure}}::h1f870b2c7467241a
    8: 0x9499 - <unknown>!std::sys_common::backtrace::__rust_begin_short_backtrace::h7abdc32d1316f53c
    9: 0x1603d - <unknown>!std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hb1e093dccf3ac99c
   10: 0x16e25 - <unknown>!<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h31a22dd1c8113e01
   11: 0x16335 - <unknown>!std::panicking::try::do_call::h304a47943491d6f1
   12: 0x1627e - <unknown>!std::panicking::try::h6afed2d90df2a9cf
   13: 0x15f3c - <unknown>!std::thread::Builder::spawn_unchecked_::{{closure}}::h595109464b33a0af
   14: 0x1019b - <unknown>!core::ops::function::FnOnce::call_once{{vtable.shim}}::hcd7cecffb6a6cde8
   15: 0x22b95 - <unknown>!std::sys::wasi::thread::Thread::new::thread_start::h3c4c4269952012d9
   16: 0x27048 - <unknown>!__wasi_thread_start_C
   17: 0x27b9a - <unknown>!wasi_thread_start
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information

Caused by:
    wasm trap: wasm `unreachable` instruction executed

However, running the same snippet natively on x86 deadlocks the main thread in both debug and release:

> ./target/debug/wasi_channel
receiving
thread '<unnamed>' panicked at src/main.rs:7:9:
Test
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Control is never returned to the shell.
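One possible explanation for the native hang, offered as an assumption rather than something confirmed in this thread: the closure in the snippet above never mentions `send`, so the `move` closure does not capture it and the sender stays alive in `main`. On native targets the panic only unwinds the spawned thread, so `recv()` keeps waiting on a channel that is never disconnected. In the wasm build, the backtrace shows `panic_abort`, which ends the whole program. The sketch below illustrates the capture behaviour without panicking; the function names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// The sender is moved into the thread and dropped there without sending,
/// so the channel disconnects and `recv` returns an error instead of blocking.
fn recv_when_sender_moved() -> Result<Vec<i32>, mpsc::RecvError> {
    let (send, recv) = mpsc::channel::<Vec<i32>>();
    let handle = thread::spawn(move || {
        // Mentioning `send` makes the `move` closure capture it.
        let _owned = send;
    });
    let result = recv.recv();
    handle.join().unwrap();
    result
}

/// The closure never mentions `send`, so the sender stays with the caller.
/// The channel is still connected after the thread exits; a plain `recv()`
/// would block forever here, so `recv_timeout` is used to observe that.
fn recv_when_sender_kept() -> Result<Vec<i32>, mpsc::RecvTimeoutError> {
    let (send, recv) = mpsc::channel::<Vec<i32>>();
    let handle = thread::spawn(move || {
        // `send` is not used here, so it is not captured.
    });
    handle.join().unwrap();
    let result = recv.recv_timeout(Duration::from_millis(100));
    drop(send); // the sender was alive in this function the whole time
    result
}

fn main() {
    println!("sender moved into thread: {:?}", recv_when_sender_moved());
    println!("sender kept by caller:    {:?}", recv_when_sender_kept());
}
```

If this is what happens in the panic test, the native hang would be unrelated to the WASI debug-build hang reported in this issue.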

@cimacmillan
Author

If I comment out thread.join().unwrap(); in the original example and instead sleep in the main thread long enough for the spawned thread to complete, both debug and release builds succeed on wasmtime and on native x86.

use std::sync::mpsc;
use std::thread;
use std::time;

fn main() {
    let (send, recv) = mpsc::channel();
    let thread = thread::spawn(move || {
        let v = vec![1, 2, 3];
        println!("\tsending {:?}", v);
        send.send(v).unwrap();
        println!("\tsent");
    });

    println!("receiving");

    let v = recv.recv().unwrap();

    println!("received {:?}", v);

    // thread.join().unwrap();

    // Give the spawned thread time to finish instead of joining it.
    thread::sleep(time::Duration::from_millis(10));
}

Output

receiving
        sending [1, 2, 3]
        sent
received [1, 2, 3]

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 31, 2023
@cimacmillan
Author

I gathered some logs from WAMR during execution, after adding backtraces to all the atomic wait and notify calls.

The main thread is stuck in an atomic wait on address 1073480:

[140303800735552][NATIVE]:: WAIT32 0x7f9ac6105158 addr (1073480) expect 4294967295 
[140303800735552][NATIVE]:: dumping call waiting stack ---
...
[140303800735552] #00 _ZN3std3sys4wasi5futex10futex_wait17h63210e8a59785ea1E

There is no corresponding notify for that address, which is the reason for the deadlock. I've built the standard library with some extra logging, but printing changes the behaviour of the program. For example, running this example without any of the prints causes both threads to get stuck on the futex wait.

@cimacmillan
Author

I integrated the change WebAssembly/wasi-libc@41da013 into wasi-libc locally and added a check to WAMR that prints when the stack limit is exceeded.

In the debug build only, I can see that the spawned thread overflows its stack:

        sending [1, 2, 3]
[139947305289280][NATIVE]:: NOTIFY 0x7f47e1304ad4 addr (1071812) for notify count: 1 
[139947305289280][NATIVE] notifying native address 0x7f47e1304ad4
[139947305289280][NATIVE] no waiters on 0x7f47e1304ad4 
[139947305289280] global stack pointer is less than limit
[139947305289280] global stack pointer is 1071920 and limit is 1072960 
[139947305289280] stack size would then be 4294966256

[139947305289280] #00 _ZN3std4sync4mpmc4list16Channel$LT$T$GT$10start_send17h56be03f6a6aff50bE
[139947305289280] #01 _ZN3std4sync4mpmc4list16Channel$LT$T$GT$4send17h0fad186987baff15E
[139947305289280] #02 _ZN3std4sync4mpmc15Sender$LT$T$GT$4send17h82c526a45245a044E
[139947305289280] #03 _ZN3std4sync4mpsc15Sender$LT$T$GT$4send17h0520177199918e07E
[139947305289280] #04 _ZN12wasi_channel4main28_$u7b$$u7b$closure$u7d$$u7d$17h63edfb57dc34f107E
[139947305289280] #05 _ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h7e366921d5ace574E
[139947305289280] #06 _ZN3std6thread7Builder16spawn_unchecked_28_$u7b$$u7b$closure$u7d$$u7d$28_$u7b$$u7b$closure$u7d$$u7d$17h7e1d121769764acaE
[139947305289280] #07 _ZN115_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$9call_once17he92498e3a64f36ddE
[139947305289280] #08 _ZN3std9panicking3try7do_call17h418517eb737574e3E
[139947305289280] #09 _ZN3std9panicking3try17ha9e1eeeae677da17E
[139947305289280] #10 _ZN3std6thread7Builder16spawn_unchecked_28_$u7b$$u7b$closure$u7d$$u7d$17h63e52ee5eb0faf7eE
[139947305289280] #11 _ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17he63f502be4f0ead0E
[139947305289280] #12 _ZN3std3sys4wasi6thread6Thread3new12thread_start17h61d3e1b4a59f5450E
[139947305289280] #13 __wasi_thread_start_C
[139947305289280] #14 wasi_thread_start

In the release build this doesn't happen, and the program exits without the overflow error. After increasing DEFAULT_MIN_STACK_SIZE in library/std/src/sys/wasi/thread.rs, I no longer get the error.
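The `stack size would then be 4294966256` line in the log above is consistent with an unsigned 32-bit subtraction wrapping around once the stack pointer drops below the limit. How WAMR actually computes that value is an assumption here; the sketch below only reproduces the arithmetic with the two numbers from the log, and `remaining_stack` is a hypothetical helper.

```rust
/// Distance between the stack pointer and the stack limit, computed the way
/// an unsigned 32-bit subtraction would behave on wasm32 (wrapping on underflow).
fn remaining_stack(stack_pointer: u32, stack_limit: u32) -> u32 {
    stack_pointer.wrapping_sub(stack_limit)
}

fn main() {
    // Values taken from the log: pointer 1071920, limit 1072960.
    let remaining = remaining_stack(1_071_920, 1_072_960);

    // Interpreted as unsigned, the result is a huge number.
    println!("as u32: {}", remaining);

    // Reinterpreted as signed, it shows how far past the limit the pointer is.
    println!("as i32: {}", remaining as i32);
}
```

Read as a signed value, the result is -1040, meaning the stack pointer is 1040 bytes below the limit at the point the check fires.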

@cimacmillan
Author

It's possible to increase the thread stack size with the RUST_MIN_STACK environment variable, which can also be set from within the Rust example:

std::env::set_var("RUST_MIN_STACK", "8196");

Setting it to a larger value for the debug build resolves the issue. I will check with the WASI threads maintainers what the process is for setting the minimum stack size. The Rust compiler could also perform a static check that estimates the required thread stack size.
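As an alternative to the environment variable, the stack size can be requested per thread through `std::thread::Builder::stack_size`, which takes priority over RUST_MIN_STACK. The sketch below is a minimal variant of the original example. The 256 KiB figure is an arbitrary assumption rather than a measured requirement, `send_with_stack_size` is a hypothetical helper, and this has not been verified on the wasm32-wasi-preview1-threads target.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stack size for the sending thread, in bytes.
const SENDER_STACK_BYTES: usize = 256 * 1024;

/// Spawns the sending thread with an explicit stack size and returns
/// the value received on the channel.
fn send_with_stack_size(stack_bytes: usize) -> Vec<i32> {
    let (send, recv) = mpsc::channel();

    let handle = thread::Builder::new()
        .stack_size(stack_bytes) // applies to this thread only
        .spawn(move || {
            send.send(vec![1, 2, 3]).unwrap();
        })
        .expect("failed to spawn thread");

    let v = recv.recv().unwrap();
    handle.join().unwrap();
    v
}

fn main() {
    println!("received {:?}", send_with_stack_size(SENDER_STACK_BYTES));
}
```

Unlike the environment variable, this affects only the threads that are spawned through the builder.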

For handling overflows at runtime, I discussed with @loganek that this could be done by adding an LLVM pass that inserts runtime checks comparing the thread stack size to the thread stack pointer in WASM. See WebAssembly/wasi-threads#12

@g0djan
Contributor

g0djan commented Nov 12, 2023

@cimacmillan thanks for finding the root cause and a solution. I think it would make sense to increase the default value at least for wasm32-wasi-preview1-threads target as it overflows on such a simple example. Could you open a PR for that?
