Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo test with RUST_BACKTRACE causes SIGBUS #45731

Closed
behnam opened this issue Nov 3, 2017 · 18 comments
Closed

cargo test with RUST_BACKTRACE causes SIGBUS #45731

behnam opened this issue Nov 3, 2017 · 18 comments
Assignees
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS

Comments

@behnam
Copy link
Contributor

behnam commented Nov 3, 2017

On nightly channel, having RUST_BACKTRACE=1 or RUST_BACKTRACE=full when running cargo test throws this error on any panic:

error: process didn't exit successfully: `/.../target/debug/deps/test_a-...`
(signal: 10, SIGBUS: access to undefined memory)

Repro

#[test]
fn test_1() {
    assert!(false);
}

Setup

  • Rust version: rustc 1.23.0-nightly (2be4cc0 2017-11-01)
  • OS and version: Darwin Marv.local 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
@euclio
Copy link
Contributor

euclio commented Nov 3, 2017

I have also been able to reproduce this. The same code got a SIGBUS on Mac OS, but the backtrace was printed as expected on Linux. Perhaps related to #44251?

@kennytm kennytm self-assigned this Nov 3, 2017
@kennytm kennytm added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS labels Nov 3, 2017
@kennytm
Copy link
Member

kennytm commented Nov 3, 2017

Stack trace of the SIGBUS:

* thread #2, name = 'test_1', stop reason = EXC_BAD_ACCESS (code=10, address=0x1001ef000)
  * frame #0: 0x0000000100055201 a-cfbe649daab6d4af`macho_get_commands + 145
    frame #1: 0x00000001000564cc a-cfbe649daab6d4af`macho_try_dwarf + 396
    frame #2: 0x0000000100056f86 a-cfbe649daab6d4af`macho_try_dsym + 726
    frame #3: 0x0000000100057519 a-cfbe649daab6d4af`macho_add + 1225
    frame #4: 0x0000000100057707 a-cfbe649daab6d4af`backtrace_initialize + 263
    frame #5: 0x0000000100054a9e a-cfbe649daab6d4af`fileline_initialize + 558
    frame #6: 0x0000000100054b7d a-cfbe649daab6d4af`backtrace_syminfo + 45
    frame #7: 0x000000010003fd52 a-cfbe649daab6d4af`core::slice::{{impl}}::position::{{closure}}<std::sys_common::backtrace::Frame,closure> [inlined] std::sys_common::gnu::libbacktrace::resolve_symname<closure> at libbacktrace.rs:79 [opt]
    frame #8: 0x000000010003fd1f a-cfbe649daab6d4af`core::slice::{{impl}}::position::{{closure}}<std::sys_common::backtrace::Frame,closure> [inlined] std::sys::imp::backtrace::printing::resolve_symname<closure> at mod.rs:36 [opt]
    frame #9: 0x000000010003fd1f a-cfbe649daab6d4af`core::slice::{{impl}}::position::{{closure}}<std::sys_common::backtrace::Frame,closure> [inlined] std::sys_common::backtrace::filter_frames::{{closure}} at backtrace.rs:108 [opt]
    frame #10: 0x000000010003fd18 a-cfbe649daab6d4af`core::slice::{{impl}}::position::{{closure}}<std::sys_common::backtrace::Frame,closure> at mod.rs:1204 [opt]
    frame #11: 0x0000000100047df1 a-cfbe649daab6d4af`std::sys_common::backtrace::_print [inlined] core::slice::{{impl}}::search_while<std::sys_common::backtrace::Frame,core::option::Option<usize>,closure> at mod.rs:1271 [opt]
    frame #12: 0x0000000100047daf a-cfbe649daab6d4af`std::sys_common::backtrace::_print [inlined] core::slice::{{impl}}::position<std::sys_common::backtrace::Frame,closure> at mod.rs:1203 [opt]
    frame #13: 0x0000000100047da3 a-cfbe649daab6d4af`std::sys_common::backtrace::_print [inlined] std::sys_common::backtrace::filter_frames at backtrace.rs:106 [opt]
    frame #14: 0x0000000100047d7f a-cfbe649daab6d4af`std::sys_common::backtrace::_print at backtrace.rs:71 [opt]
    frame #15: 0x000000010004d224 a-cfbe649daab6d4af`std::panicking::default_hook::{{closure}} [inlined] std::sys_common::backtrace::print at backtrace.rs:58 [opt]
    frame #16: 0x000000010004d203 a-cfbe649daab6d4af`std::panicking::default_hook::{{closure}} at panicking.rs:381 [opt]
    frame #17: 0x000000010004cd73 a-cfbe649daab6d4af`std::panicking::default_hook at panicking.rs:391 [opt]
    frame #18: 0x000000010004d703 a-cfbe649daab6d4af`std::panicking::rust_panic_with_hook at panicking.rs:577 [opt]
    frame #19: 0x0000000100000ffe a-cfbe649daab6d4af`std::panicking::begin_panic::hcb69775319227ce6 + 110
    frame #20: 0x0000000100000f26 a-cfbe649daab6d4af`a::test_1::h499e1b269bece65b + 38
    frame #21: 0x0000000100014a62 a-cfbe649daab6d4af`test::{{impl}}::call_box<(),closure> [inlined] test::run_test::{{closure}} at lib.rs:1480 [opt]
    frame #22: 0x0000000100014a5d a-cfbe649daab6d4af`test::{{impl}}::call_box<(),closure> [inlined] core::ops::function::FnOnce::call_once<closure,(())> at function.rs:223 [opt]
    frame #23: 0x0000000100014a5d a-cfbe649daab6d4af`test::{{impl}}::call_box<(),closure> at lib.rs:141 [opt]
    frame #24: 0x000000010005876f a-cfbe649daab6d4af`panic_unwind::__rust_maybe_catch_panic at lib.rs:99 [opt]
    frame #25: 0x0000000100006ce1 a-cfbe649daab6d4af`std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()> [inlined] std::panicking::try<(),std::panic::AssertUnwindSafe<closure>> at panicking.rs:459 [opt]
    frame #26: 0x0000000100006c9c a-cfbe649daab6d4af`std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()> [inlined] std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure>,()> at panic.rs:361 [opt]
    frame #27: 0x0000000100006c9c a-cfbe649daab6d4af`std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()> [inlined] test::run_test::run_test_inner::{{closure}} at lib.rs:1419 [opt]
    frame #28: 0x0000000100006b49 a-cfbe649daab6d4af`std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()> at backtrace.rs:134 [opt]
    frame #29: 0x000000010000f508 a-cfbe649daab6d4af`std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()> [inlined] std::thread::{{impl}}::spawn::{{closure}}::{{closure}}<closure,()> at mod.rs:402 [opt]
    frame #30: 0x000000010000f4f2 a-cfbe649daab6d4af`std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()> [inlined] std::panic::{{impl}}::call_once<(),closure> at panic.rs:296 [opt]
    frame #31: 0x000000010000f4f2 a-cfbe649daab6d4af`std::panicking::try::do_call<std::panic::AssertUnwindSafe<closure>,()> at panicking.rs:480 [opt]
    frame #32: 0x000000010005876f a-cfbe649daab6d4af`panic_unwind::__rust_maybe_catch_panic at lib.rs:99 [opt]
    frame #33: 0x00000001000209c2 a-cfbe649daab6d4af`alloc::boxed::{{impl}}::call_box<(),closure> [inlined] std::panicking::try<(),std::panic::AssertUnwindSafe<closure>> at panicking.rs:459 [opt]
    frame #34: 0x0000000100020989 a-cfbe649daab6d4af`alloc::boxed::{{impl}}::call_box<(),closure> [inlined] std::panic::catch_unwind<std::panic::AssertUnwindSafe<closure>,()> at panic.rs:361 [opt]
    frame #35: 0x0000000100020989 a-cfbe649daab6d4af`alloc::boxed::{{impl}}::call_box<(),closure> [inlined] std::thread::{{impl}}::spawn::{{closure}}<closure,()> at mod.rs:401 [opt]
    frame #36: 0x000000010002093c a-cfbe649daab6d4af`alloc::boxed::{{impl}}::call_box<(),closure> at boxed.rs:762 [opt]
    frame #37: 0x000000010004c83c a-cfbe649daab6d4af`std::sys::imp::thread::{{impl}}::new::thread_start [inlined] alloc::boxed::{{impl}}::call_once<(),()> at boxed.rs:772 [opt]
    frame #38: 0x000000010004c839 a-cfbe649daab6d4af`std::sys::imp::thread::{{impl}}::new::thread_start [inlined] std::sys_common::thread::start_thread at thread.rs:24 [opt]
    frame #39: 0x000000010004c7be a-cfbe649daab6d4af`std::sys::imp::thread::{{impl}}::new::thread_start at thread.rs:90 [opt]
    frame #40: 0x00007fff6d23e6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #41: 0x00007fff6d23e56d libsystem_pthread.dylib`_pthread_start + 377
    frame #42: 0x00007fff6d23dc5d libsystem_pthread.dylib`thread_start + 13

No repro when using a custom stage1 instead of nightly 😕

@kennytm
Copy link
Member

kennytm commented Nov 3, 2017

@JohnColanduoni Any idea?

Repro steps:

  1. Create the program as shown above
  2. rustc --test -g 1.rs
  3. RUST_BACKTRACE=1 ./1

I suspect it has something to do with threading.

This bug is no-repro on beta, so #44251 is not the sole cause of the SIGBUS. Maybe we could first rust-bisect it.

@alexcrichton
Copy link
Member

Regression was introduced in this range -- 4279e2b...8493813

Between nightly-2017-10-22 and nightly-2017-10-23

@alexcrichton
Copy link
Member

So this implies to me two things:

  1. ThinLTO broke debug info, this is probably ThinLTO generates bad DWARF for libtest #45511
  2. The backtrace implementation can segfault on malformed debug info which means that OSX is likely susceptible to libbacktrace crash and os::self_exe_name race condition #21889.

I'm going to send a PR to disable libbacktrace on OSX due to the latter point.

@JohnColanduoni
Copy link
Contributor

@kennytm What makes you suspect threading? IIRC libbacktrace functions are protected on the Rust side with a global lock.

@kennytm
Copy link
Member

kennytm commented Nov 4, 2017

@JohnColanduoni because I can't repro it if I used #[bench] (everything runs in the main thread) instead of #[test] (the code is run in worker thread).

That said, the following isn't causing SIGBUS either:

fn main() {
    std::thread::spawn(|| panic!()).join().unwrap();
}

@JohnColanduoni
Copy link
Contributor

JohnColanduoni commented Nov 4, 2017

This is definitely related to loading libtest in some way, I'm able to reproduce with the following:

extern crate libc;

use std::{env};
use std::ffi::CString;

fn main() {
    env::set_var("RUST_BACKTRACE", "1");

    let libtest_path = CString::new("/Users/john/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/libtest-bbaa058d367874e2.dylib").unwrap();
    let handle = unsafe {
        libc::dlopen(libtest_path.as_ptr(), libc::RTLD_NOW | libc::RTLD_GLOBAL)
    };

    if handle.is_null() {
        panic!("failed to open libtest");
    }

    assert!(false);
}

It causes the appropriate crash on both nightly and beta (as long as you dlopen the nightly libtest).

@JohnColanduoni
Copy link
Contributor

The offending line is

switch (*(uint32_t *) file_header_view.data)

Which means backtrace_get_view (which mmaps the file) is silently returning an invalid address. Now I'm very confused, because it's a bog-standard call to mmap that doesn't touch any non-constant parameters except the file descriptor (which if invalid would just cause mmap to fail). Am I missing something?

@JohnColanduoni
Copy link
Contributor

JohnColanduoni commented Nov 4, 2017

Something really hinky is definitely going on here. The memory access failure isn't the usual KERN_INVALID_ADDRESS or KERN_PROTECTION_FAILURE, it's KERN_MEMORY_ERROR (code 10) which is documented as:

/* During a page fault, the memory object indicated
		 * that the data could not be returned.  This failure
		 * may be temporary; future attempts to access this
		 * same data may succeed, as defined by the memory
		 * object.
		 */

Sure enough, according to vmmap, the given address is mapped correctly (with read access) to the contents of the dSYM for the main executable, exactly what the program is intending. It turns out that file has size 0 and apparently OS X will happily entrap you by letting you mmap past the end of a file. Sure enough, the man page does not list that case as an error condition (it is on e.g. Linux).

I have no idea why (a) the dSYM is empty (b) loading libtest has anything to do with it (it's probably something more general) but it appears this isn't actually a memory safety issue (since the access will always either succeed or cause SIGBUS). The most immediate solutions are to either:

  • Change backtrace_get_view to check the file length before mapping the file. This is inherently racy, but ultimately not that dangerous since it'll just result in a clean (though confusing) SIGBUS.
  • Disable the mmap based backtrace_get_view on macOS, since libbacktrace has a simple fallback that just reads from the file, though I half expect to get a SIGPIPE or something due to some obscure behavior.

In any case I don't think #45760 is necessary, since this is a clean crash.

@sfackler
Copy link
Member

sfackler commented Nov 4, 2017

Nice! It does seem weird that the dSYM is empty.

The SIGBUS doesn't happen upon mapping the file, but upon access, right? It seems that even if we add a length check in backtrace_get_view there's still an issue where something is performing an out of bounds read on the mapped region.

@JohnColanduoni
Copy link
Contributor

@sfackler Yes. The read is "in bounds" in the sense that (a) it is the right address and (b) there is a virtual memory mapping at that location that will prevent something else from getting mapped there. There's no danger that e.g. a malloc will use that memory for something else until munmap is called. It's essentially a guard page, with a SIGBUS instead of SIGSEGV. That said, I think the second option is better since producing panic backtraces is not a performance-sensitive area.

@sfackler
Copy link
Member

sfackler commented Nov 4, 2017

Ah, sure - I was thinking more of a "semantically" out of bounds read. The underlying file is N bytes, and libbacktrace is trying to ready byte N + 10 or whatever. It's either going to fault or you'll be reading bogus zeroed data.

I agree that turning off the mmap implementation is a reasonable approach (maybe everywhere?).

@JohnColanduoni
Copy link
Contributor

I've figured out (broadly) what produces the empty dSYM: any (AFAICT) compilation with rustc --test -g. The example I posted above was a fluke; if you run cargo test your target/debug directory now has an empty dSYM with the name of the test executable in it, which will cause this crash whenever any Rust executable in the same directory tries to produce a panic backtrace.

I opened up a separate issue for that problem here.

@SimonSapin
Copy link
Contributor

This also affects Servo (in a #[should_panic] unit test, on OS X). I can reproduce with nightly-2017-10-23 but not nightly-2017-10-22. Corresponding range: 4279e2b...8493813

Strangely, after seeing a failure if I downgrade the compiler to a normally not-affected version I’ll still see the same failure until I run cargo clean.

@kennytm
Copy link
Member

kennytm commented Nov 7, 2017

@SimonSapin may be that was rust-lang/cargo#4699

bors pushed a commit that referenced this issue Nov 8, 2017
Fixes #45731

libbacktrace uses mmap if available to map ranges of the files containing debug information. On macOS `mmap` will succeed even if the mapped range does not exist, and a SIGBUS (with an unusual EXC_BAD_ACCESS code 10) will occur when the program attempts to page in the memory. To combat this we force `libbacktrace` to be built with the simple `read` based fallback on Apple platforms.
bors added a commit that referenced this issue Nov 8, 2017
Disable `mmap`  in `libbacktrace` on Apple platforms

Fixes #45731

libbacktrace uses mmap if available to map ranges of the files containing debug information. On macOS `mmap` will succeed even if the mapped range does not exist, and a SIGBUS (with an unusual EXC_BAD_ACCESS code 10) will occur when the program attempts to page in the memory. To combat this we force `libbacktrace` to be built with the simple `read` based fallback on Apple platforms.
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Nov 12, 2017
Fixes rust-lang#45731

libbacktrace uses mmap if available to map ranges of the files containing debug information. On macOS `mmap` will succeed even if the mapped range does not exist, and a SIGBUS (with an unusual EXC_BAD_ACCESS code 10) will occur when the program attempts to page in the memory. To combat this we force `libbacktrace` to be built with the simple `read` based fallback on Apple platforms.
@orivej
Copy link

orivej commented Nov 27, 2017

The test run-pass/issue-45731.rs consistently fails on macOS 10.13 with Rust 1.22.1. (I have not checked 1.22.0, stable, or master. Please tell if I should.) Here is the output: run-pass-issue-45731.log. Discovered in NixOS/nixpkgs#31982 (comment).

@kennytm
Copy link
Member

kennytm commented Nov 27, 2017

@orivej Hi could you file a new issue for this? The log doesn't indicate a SIGBUS happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-macos Operating system: macOS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants