Add failing example with regex #56


ian-h-chamberlain
Member

ian-h-chamberlain commented Apr 12, 2022

Hey @AzureMarker and @Meziu !

I found a strange example with which I can reliably reproduce a segfault using seemingly safe code, and I'm wondering if you have any ideas to help debug it, as I've gotten a bit stuck.

Running gdb attached to a device I get this as the location of the crash: https://github.com/rust-lang/regex/blob/master/src/exec.rs#L299

GDB backtrace
(gdb) bt full 
#0  0x00148980 in regex::exec::ExecBuilder::build (self=...)
    at /Users/ianchamberlain/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.5.5/src/exec.rs:299
No locals.
#1  0x0013dc98 in regex::re_builder::unicode::RegexBuilder::build (self=0x8007bc8)
    at /Users/ianchamberlain/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.5.5/src/re_builder.rs:70
No locals.
#2  0x001948c0 in regex::re_unicode::Regex::new (re=...)
    at /Users/ianchamberlain/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.5.5/src/re_unicode.rs:175
No locals.
#3  0x00100728 in regex::main () at ctru-rs/examples/regex.rs:13
        _console = ctru::console::Console {context: 0x800c8b0, _screen: core::cell::RefMut<dyn ctru::gfx::Screen> {value: &mut dyn ctru::gfx::Screen {pointer: 0x8007c5c, vtable: 0x3a7068}, borrow: core::cell::BorrowRefMut {borrow: 0x8007c58}}}
        gfx = ctru::gfx::Gfx {top_screen: core::cell::RefCell<ctru::gfx::TopScreen> {borrow: core::cell::Cell<isize> {value: core::cell::UnsafeCell<isize> {value: 0}}, value: core::cell::UnsafeCell<ctru::gfx::TopScreen> {value: ctru::gfx::TopScreen}}, bottom_screen: core::cell::RefCell<ctru::gfx::BottomScreen> {borrow: core::cell::Cell<isize> {value: core::cell::UnsafeCell<isize> {value: -1}}, value: core::cell::UnsafeCell<ctru::gfx::BottomScreen> {value: ctru::gfx::BottomScreen}}, _service_handler: ctru::services::reference::ServiceReference {counter: 0x442018 <ctru::gfx::GFX_ACTIVE+4>, close: alloc::boxed::Box<(dyn core::ops::function::Fn<(), Output=()> + core::marker::Send + core::marker::Sync), alloc::alloc::Global> {pointer: 0x1, vtable: 0x3fc06c}}}

The self pointer seems to be optimized out at the call site, so I'm wondering what kind of address violation might be happening here... perhaps UB introduced by some call before the regex builder?

Let me know if you have any ideas or suggestions for tracking down this kind of issue. I couldn't find any references to similar issues upstream, so I'm a bit stumped.


Edit: Note that I'm using a version of feature/horizon-threads rebased onto the latest upstream master, but it seemed to reproduce on a different toolchain version I had as well. I'm going to try rebuilding my toolchain to see if it makes any difference, in case this is some kind of miscompilation...

@ian-h-chamberlain
Member Author

One other data point I found: when running in Citra, the log gets spammed with a ton of these:

[  10.232196] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x08007AB8 @ 0x07FFF8DC at PC 0x0014896C
[  10.232204] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x08006568 @ 0x07FFF8E0 at PC 0x0014896C
[  10.232205] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x0800553C @ 0x07FFF8E4 at PC 0x0014896C
[  10.232206] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x08003A5C @ 0x07FFF8E8 at PC 0x0014896C
[  10.232207] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x080022FF @ 0x07FFF8EC at PC 0x0014896C
[  10.232208] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x08000BA4 @ 0x07FFF8F0 at PC 0x0014896C
[  10.232210] HW.Memory <Error> core/memory.cpp:Write:358: unmapped Write32 0x08007B14 @ 0x07FFF8F4 at PC 0x0014896C
[  10.232211] HW.Memory <Error> core/memory.cpp:Read:323: unmapped Read32 @ 0x07FFF8F4 at PC 0x0014896C

The calls that are failing:

at PC 0x00000000
at PC 0x0014896C
at PC 0x00199D4C
at PC 0x00199D5C
at PC 0x00199D64
at PC 0x002E5994

These correspond to regex::exec::ExecBuilder::build and various parts of alloc::vec::Vec<T,A>::is_empty / alloc::vec::Vec<T,A>::len, which I suspect come from the first couple of calls within ExecBuilder::build. So maybe something is freeing this struct before its method is called?

@AzureMarker
Member

I haven't looked closely yet, but stack overflow maybe?

@ian-h-chamberlain
Member Author

I haven't looked closely yet, but stack overflow maybe?

Huh, good hunch! I rebuilt with -C opt-level=1 and no longer see the crash. That might also explain why the Luma crash dump had no stack dump, only a code dump... I also recall seeing something on Zulip about a stack size regression in the compiler for debug mode, so perhaps that's part of the issue here too.

So I guess the question is – what's the best way to document this or avoid it in the future? I could see this being an easy stumbling block for users to hit – should we just recommend opt-level=1 as a minimum across the board? Or is there any way for us to increase the stack size, or something else?
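On the "increase the stack size" question, one option to explore is running the stack-hungry work (such as building a Regex) on a spawned thread with an explicitly requested stack size via std::thread::Builder::stack_size. This is only a sketch: whether the horizon-threads std actually honors the requested size on the 3DS is an assumption we haven't verified, and build_regex_like_work is a hypothetical stand-in for the real workload.

```rust
use std::thread;

/// Hypothetical stand-in for stack-hungry work such as `Regex::new`:
/// each recursive call keeps a 256-byte buffer alive on the stack.
fn build_regex_like_work(depth: u32) -> u64 {
    let buf = [depth as u8; 256];
    if depth == 0 {
        buf.iter().map(|&b| b as u64).sum()
    } else {
        buf[0] as u64 + build_regex_like_work(depth - 1)
    }
}

/// Run `work` on a new thread with an explicit stack size in bytes.
/// Assumption: the target's std applies the requested size.
fn run_with_stack<F, T>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 1 MiB is an arbitrary example value, not a recommended 3DS setting.
    let result = run_with_stack(1024 * 1024, || build_regex_like_work(100));
    println!("result = {result}");
}
```

This only moves the problem to a thread we control; it doesn't change the main thread's stack, which is what the regex example currently runs on.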

@Meziu
Member

Meziu commented Apr 12, 2022

I believe there are ways to actually lock the opt-level to a minimum (or at least a default value). My question is whether we actually want to do that. We will definitely need a Wiki at some point, so this could just be something to add to it. The stack size isn't variable, at least not by ctru's standards.

@AzureMarker
Member

I'm still interested in debugging the crash to get a real root cause, since the stack overflow thing was just the first thing that comes to mind when I hear of memory issues :).

@Meziu
Member

Meziu commented Apr 13, 2022

Hmm, big problems. I tried running a very simple test with the rapier2d physics engine, and it seems to have the same issue. The app closes (no crash, it just goes straight back to the hbmenu) in debug mode, but works fine in release...

Edit: even weirder, the error isn't raised in main from an overflow, but in the thread_info::set function during the rt::init setup. This is very peculiar, and reminds me of the time we looked into problems in this same area caused by faulty Mutexes. Have there been any changes there since we last looked?

Edit 2: I don't even know if what I'm doing makes sense, but the app closes at thread_info.rs:44, where the rtassert! fails and automatically exits the app. I'll look into why thread_info could be something other than None.

Edit 3: found absolutely nothing. I don't have the expertise to dig into the built executable, but I can tell the problem only arises when a specific function (which has nothing to do with system calls) is linked. It may be an issue with something behind the scenes, but I don't know. I'll put this off for now; tell me if you find anything related to your issue.

@Meziu
Member

Meziu commented Apr 14, 2022

Related to my issue: the THREAD_INFO key is already Some before initialization:

(gdb) p *thread_local
$10 = RefCell(borrow=0) = { value = core::option::Option<std::sys_common::thread_info::ThreadInfo>::Some(std::sys_common::thread_info::ThreadInfo {stack_guard: core::option::Option<core::ops::range::Range<usize>>::None, thread: std::thread::Thread {inner: <error reading variable: Cannot access memory at address 0x10>alloc::sync::Arc<std::thread::Inner> {ptr: core::ptr::non_null::NonNull<alloc::sync::ArcInner<std::thread::Inner>> {pointer: 0x0}, phantom: core::marker::PhantomData<alloc::sync::ArcInner<std::thread::Inner>>}}}), borrow = 0}

In this output, the only possible cause I can see for the mistake is <error reading variable: Cannot access memory at address 0x10> in the thread.inner field. I don't know exactly how enums with empty variants (like None) are represented by the compiler, but maybe the invalid pointer is to blame for the type confusion? In any case, it shouldn't have any value at all at this stage, since it hasn't been set yet. This only happens in debug mode.

Edit: @AzureMarker @ian-h-chamberlain I pushed my simple example to the test/rapier-physics branch. Could you please test it in both release and debug mode with a rebased feature/horizon-threads standard library? I want to know whether this is a real problem or due to my toolchain, as it could become a blocker if it doesn't get fixed.

@AzureMarker
Member

AzureMarker commented Apr 15, 2022

I see all the same issues with rapier-physics on feature/horizon-threads. It looks like THREAD_INFO gets initialized with garbage data, though I see the same values you see.

thread_info as *const () in thread_info::set gives 0x2d7e24, which points to an empty chunk of memory:
[screenshot]

Of course, after borrowing, the borrow state gets changed, but the value itself stays zero (second green marker):
[screenshot]

I did check the thread-locals example and it still ran correctly. I even tried modifying it to be like THREAD_INFO (using RefCell<Option<usize>> set to const { RefCell::new(None) }) but didn't see any issues there.
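For reference, here is a minimal host-side sketch of that kind of check: a thread-local RefCell<Option<usize>> const-initialized to None, plus a set function that, like std's thread_info::set, refuses to overwrite an existing value. The set/get helpers and the Result return (instead of rtassert!) are hypothetical simplifications, not std's actual code.

```rust
use std::cell::RefCell;

thread_local! {
    // Const-initialized like the modified example: starts out as None.
    static INFO: RefCell<Option<usize>> = const { RefCell::new(None) };
}

/// Store `value` for the current thread, refusing to overwrite an existing
/// value. (std's `thread_info::set` uses `rtassert!` here and aborts.)
fn set(value: usize) -> Result<(), &'static str> {
    INFO.with(|cell| {
        let mut slot = cell.borrow_mut();
        if slot.is_some() {
            return Err("thread info already set");
        }
        *slot = Some(value);
        Ok(())
    })
}

/// Read the current thread's value.
fn get() -> Option<usize> {
    INFO.with(|cell| *cell.borrow())
}

fn main() {
    assert_eq!(get(), None);
    set(42).unwrap();
    assert_eq!(get(), Some(42));
    assert!(set(7).is_err());
    // Every new thread gets its own freshly initialized slot.
    std::thread::spawn(|| assert_eq!(get(), None)).join().unwrap();
    println!("thread-local behaves as expected");
}
```

On a correct build, the very first get() on any thread returns None; the rapier-physics failure corresponds to that first check seeing Some.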

Also, do we need this commit in feature/horizon-std?
AzureMarker/rust-horizon@59aacda
Otherwise, I see this error when I try to compile the example:

error[E0432]: unresolved import `crate::sys::thread_local_dtor::register_dtor`
   --> /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/thread/local.rs:906:9
    |
906 |     use crate::sys::thread_local_dtor::register_dtor;
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no `register_dtor` in `sys::unix::thread_local_dtor`

@AzureMarker
Member

AzureMarker commented Apr 15, 2022

By the way, when I started testing the regex example I saw the thread info RefCell now has a 02 00 00 00 value instead of all zeros, and is properly noted as being None:
[screenshot]

Ignore the /3ds/, I think that's because I switched to feature/horizon-std while testing the physics example. I don't think it affects anything, since I saw the same results before and after the switch.

Memory after line 45:
[screenshot]

@AzureMarker
Member

I took a look at the assembly for the regex example, but it looks like it's just doing normal stack things. Maybe you see something interesting here?
[screenshot]

@AzureMarker
Member

Actually, it looks like it might be trying to write above/before the stack limit? See how some of the registers start with 0x800.
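To make the "below the stack limit" idea concrete, here is a small sketch that checks faulting addresses against an assumed stack region. The base (0x0800_0000) and the 32 KiB size are assumptions inferred from the addresses in this thread (the stack-resident-looking self=0x8007bc8 in the backtrace and the Citra writes to 0x07FFF8xx); they are not values confirmed from libctru's configuration. ARM stacks grow downward, so an address just below the base is what an overflow would look like.

```rust
/// Hypothetical stack region [base, base + size). The stack grows downward
/// from the top, so running past `base` means an overflow.
struct StackRegion {
    base: u32,
    size: u32,
}

impl StackRegion {
    fn top(&self) -> u32 {
        self.base + self.size
    }

    fn contains(&self, addr: u32) -> bool {
        addr >= self.base && addr < self.top()
    }

    /// How many bytes below the bottom of the stack `addr` is, if any.
    fn overflow_by(&self, addr: u32) -> Option<u32> {
        if addr < self.base {
            Some(self.base - addr)
        } else {
            None
        }
    }
}

fn main() {
    // Assumed values, inferred from the debugger and emulator output.
    let stack = StackRegion { base: 0x0800_0000, size: 32 * 1024 };

    // `self` pointer of RegexBuilder::build from the gdb backtrace.
    assert!(stack.contains(0x0800_7bc8));

    // Target addresses of two unmapped Write32 entries in the Citra log.
    for addr in [0x07FF_F8DCu32, 0x07FF_F8F4] {
        println!("{addr:#010x}: {:?} bytes below the stack", stack.overflow_by(addr));
    }
}
```

Under these assumptions, the Citra writes land only about 1.8 KB below the bottom of the stack, which fits a stack overflow better than a wild pointer.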

@Meziu
Member

Meziu commented Apr 15, 2022

Also, do we need this commit in feature/horizon-std? AzureMarker/rust-horizon@59aacda I see this error otherwise, when I try to compile the example:

error[E0432]: unresolved import `crate::sys::thread_local_dtor::register_dtor`
   --> /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/thread/local.rs:906:9
    |
906 |     use crate::sys::thread_local_dtor::register_dtor;
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no `register_dtor` in `sys::unix::thread_local_dtor`

Yes we do, as it enables std's thread-local implementation. I don't have your compilation problems; are you sure your toolchain is fine?

@ian-h-chamberlain
Member Author

Actually, it looks like it might be trying to write above/before the stack limit? See how some of the registers start with 0x800.

Yeah, I noticed this as well in my testing last night, and I think it's a good thread to pull on. I found these two tools that might help us track it down, but they might also need a little extra work to build for our target:

stack-sizes
cargo-call-stack

Re: AzureMarker/rust-horizon@59aacda – I was under the impression it was only needed in the branches where we have has_thread_local: true set for the target.

@ian-h-chamberlain
Member Author

Ok, I was able to get stack-sizes working (unfortunately, it looks like cargo-call-stack is broken in its current state).

Here's what it's emitting for debug mode as the largest stack sizes (I just took the top several):

$ stack-sizes ../target/armv6k-nintendo-3ds/debug/examples/regex.elf | sort -rhk 2
0x00100b84      464     regex::main::h4ec025b9c4c07000
0x00101c68      128     core::ptr::read_unaligned::h06b307bd2134608a
0x00101d3c      116     core::ptr::read::h40a0bb50fca12507
0x00101820      104     core::array::iter::<impl core::iter::traits::collect::IntoIterator for [T; N]>::into_iter::h050cdbb11bd2be52
0x001009a8      96      <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::get_unchecked_mut::hb8947fdd604471eb
0x00101f8c      88      core::fmt::Arguments::new_v1::h8159bce875fcdaa4
0x001007e8      88      core::mem::forget::h2c6012ffc22f159c
0x00101630      64      core::result::Result<T,E>::expect::ha8b0307823cdaee5
0x001014d0      64      core::result::Result<T,E>::expect::h7b8c4899aa1b7bd8
0x00101428      64      core::result::Result<T,E>::expect::h33070bba6568947b
0x00101e8c      56      core::cell::RefCell<T>::try_borrow_mut::h91102446d0f89e5a

No regex::Regex in there, but it is a pretty large stack for main, so I wonder if the compiler is just inlining all the builder code aggressively even in debug mode?
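The sort -rhk 2 step can also be done in Rust, for example to filter or post-process the results further. This is a minimal sketch that assumes each line of stack-sizes output has the whitespace-separated form "address size symbol", as in the listing above.

```rust
/// One row of `stack-sizes` output: address, stack usage in bytes, symbol.
#[derive(Debug)]
struct Entry {
    address: u32,
    size: u64,
    symbol: String,
}

/// Parse one `address size symbol` line; returns None if it doesn't match.
fn parse_line(line: &str) -> Option<Entry> {
    let mut parts = line.split_whitespace();
    let address = u32::from_str_radix(parts.next()?.strip_prefix("0x")?, 16).ok()?;
    let size: u64 = parts.next()?.parse().ok()?;
    // Demangled symbols may contain spaces (e.g. `<impl ... for [T; N]>`),
    // so re-join the remaining fields.
    let symbol = parts.collect::<Vec<_>>().join(" ");
    if symbol.is_empty() {
        return None;
    }
    Some(Entry { address, size, symbol })
}

/// Rough equivalent of `sort -rhk 2 | head -n top`: largest frames first.
fn largest_frames(output: &str, top: usize) -> Vec<Entry> {
    let mut entries: Vec<Entry> = output.lines().filter_map(parse_line).collect();
    entries.sort_by(|a, b| b.size.cmp(&a.size));
    entries.truncate(top);
    entries
}

fn main() {
    // A few lines copied from the debug-mode output above.
    let output = "\
0x00101d3c      116     core::ptr::read::h40a0bb50fca12507
0x00100b84      464     regex::main::h4ec025b9c4c07000
0x00101c68      128     core::ptr::read_unaligned::h06b307bd2134608a";
    for e in largest_frames(output, 2) {
        println!("{:#010x} {:>5} {}", e.address, e.size, e.symbol);
    }
}
```

Note that stack-sizes reports per-function frame sizes, not call-chain totals, so a large main frame plus deep callees can still overflow even when no single entry looks alarming.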

Release mode for comparison is different, but doesn't point to much for regex::Regex either:

$ stack-sizes ../target/armv6k-nintendo-3ds/release/examples/regex.elf | sort -rhk 2
0x00105da0      456     regex::main::hb81208fa7f12a3ce
0x00103a20      120     core::ptr::read_unaligned::h6080a76fb361ad17
0x001040e4      108     core::ptr::read::he33a6018295191fa
0x00105b58      104     core::array::iter::<impl core::iter::traits::collect::IntoIterator for [T; N]>::into_iter::hc0354f1079d62757
0x00106ee0      96      hashbrown::raw::RawTable<T,A>::drop_elements::h6a67b3a9dc2dcab6
0x00106dec      96      hashbrown::raw::RawTable<T,A>::drop_elements::h4bd587b2874cb657
0x00106694      96      hashbrown::raw::TableLayout::calculate_layout_for::h8e29b46c93e3cd82
0x0010720c      88      hashbrown::raw::RawTableInner<A>::free_buckets::h54c346a81d333764
0x00105994      88      core::mem::forget::h18df62b3c1555c91
0x0010b6f4      80      alloc::alloc::box_free::he219f9cfd791c76e
0x0010b1c4      80      alloc::alloc::box_free::h2af0f7fe7fac992a
0x0010aff8      80      alloc::alloc::box_free::h043a15a07652b969
0x0010c598      72      core::fmt::Arguments::new_v1::h784dfd6d275d372f
0x0010b610      72      alloc::alloc::box_free::hd8f1d3461ad9b4ba
0x0010b2b0      72      alloc::alloc::box_free::h589057dd38770327
0x0010b0e4      72      alloc::alloc::box_free::h12198c424597ffcc
0x0010b7e0      64      alloc::alloc::box_free::hfd908456505d4fef

Not much to go off, but an extra couple of data points at least...
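Comparing the two listings directly is awkward because the ::h… hash suffix differs between builds (e.g. regex::main::h4ec025b9c4c07000 vs regex::main::hb81208fa7f12a3ce). Here is a sketch that strips the suffix before matching. It assumes the suffix is always "::h" followed by 16 hex digits (the legacy mangling seen above) and that the input is (symbol, size) pairs already parsed from stack-sizes output.

```rust
use std::collections::BTreeMap;

/// Strip a trailing legacy-mangling hash such as `::h4ec025b9c4c07000`.
fn strip_hash(symbol: &str) -> &str {
    match symbol.rsplit_once("::h") {
        Some((base, hash))
            if hash.len() == 16 && hash.chars().all(|c| c.is_ascii_hexdigit()) =>
        {
            base
        }
        _ => symbol,
    }
}

/// Map de-hashed names to their largest frame size. Generic functions show
/// up several times with different hashes (e.g. `Result<T,E>::expect`).
fn by_name(entries: &[(&str, u64)]) -> BTreeMap<String, u64> {
    let mut map = BTreeMap::new();
    for &(symbol, size) in entries {
        let slot = map.entry(strip_hash(symbol).to_string()).or_insert(0);
        *slot = (*slot).max(size);
    }
    map
}

fn main() {
    // (symbol, size) pairs taken from the two listings above.
    let debug: [(&str, u64); 2] = [
        ("regex::main::h4ec025b9c4c07000", 464),
        ("core::ptr::read::h40a0bb50fca12507", 116),
    ];
    let release: [(&str, u64); 2] = [
        ("regex::main::hb81208fa7f12a3ce", 456),
        ("core::ptr::read::he33a6018295191fa", 108),
    ];
    let (d, r) = (by_name(&debug), by_name(&release));
    for (name, debug_size) in &d {
        if let Some(release_size) = r.get(name) {
            println!("{name}: debug {debug_size} B, release {release_size} B");
        }
    }
}
```

Taking the maximum per name is a choice that keeps the worst case for each generic function; summing instead would overstate the stack usage of any single call chain.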

@Meziu
Member

Meziu commented Apr 16, 2022

By the way, when I started testing the regex example I saw the thread info RefCell now has a 02 00 00 00 value instead of all zeros

I've tried editing the memory to look like that before the read, but the result is the same. The only thing these two issues have in common is the difference between release and debug builds, with debug being the one that fails in both cases. We've had similar issues in the past linked to differences between the two builds (like integer overflow checks or debug-only macros).

What are other differences, in a fully clean and working environment, between these two build modes?
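Two of the differences mentioned above are easy to show in a few lines. With Cargo's default dev profile, debug assertions and overflow checks are enabled, so debug_assert! runs and runtime arithmetic overflow panics; with the default release profile, both are off. They are controlled by the debug-assertions and overflow-checks profile settings, independently of opt-level. This sketch (with hypothetical helper names) shows arithmetic that behaves the same in both profiles.

```rust
/// Sum with explicit overflow handling, so the result does not depend on
/// whether overflow checks are enabled in the current profile.
fn checked_total(values: &[u8]) -> Option<u8> {
    values.iter().try_fold(0u8, |acc, &v| acc.checked_add(v))
}

/// Sum that deliberately wraps on overflow, in every profile.
fn wrapping_total(values: &[u8]) -> u8 {
    values.iter().fold(0u8, |acc, &v| acc.wrapping_add(v))
}

fn main() {
    // True in the default dev profile, false in the default release profile.
    println!("debug_assertions: {}", cfg!(debug_assertions));

    // Only evaluated when debug assertions are enabled.
    debug_assert!(wrapping_total(&[1, 2]) == 3);

    // Summing these with plain `+` at runtime would panic with overflow
    // checks on and silently wrap with them off.
    let values = [200u8, 100];
    println!("checked:  {:?}", checked_total(&values)); // None
    println!("wrapping: {}", wrapping_total(&values)); // 44
}
```

Since the issue here reproduces with only opt-level changed, these settings are probably not the trigger, but they're worth ruling out when a bug is debug-only.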

In my specific case, the issue seems to appear as soon as a specific function call is present in the main function (even though it is never actually called). The stack can't be affected by my code yet (main doesn't even run in my case), and for the regex test there seems to be barely any difference in stack usage between the builds.

@ian-h-chamberlain could you tell me the size of the final executable sent to 3dslink?
The size of the executable differs notably between builds, and my “bad” build is way bigger than builds without the breaking function call… This is a wacky shot in the dark though…

@AzureMarker
Member

@Meziu what's the function that triggers the issue for you when it gets linked in?

@Meziu
Member

Meziu commented Apr 16, 2022

b417e32#diff-a02ab88c764a3d8abd0605777eb0ae6bde90d3c1c8d81ff972f8e4c9447fc6eaR19

This line breaks my program, yet it never runs, since the abort happens during rt::init. This means the mere presence of this specific call breaks something in the build process.

@ian-h-chamberlain
Member Author

@ian-h-chamberlain could you tell me the size of the final executable sent to 3dslink?
The size of the executable has a notable difference between builds, and my “bad” build is way bigger than builds without the breaking function call… This is a wacky shot in the dark though…

In my case, the "bad" debug build is also a bit larger, but that is to be expected for debug vs release builds, I would think?

$ stat -x target/armv6k-nintendo-3ds/debug/examples/regex.elf
  File: "target/armv6k-nintendo-3ds/debug/examples/regex.elf"
  Size: 31504108     FileType: Regular File
  Mode: (0775/-rwxrwxr-x)         Uid: (  501/ianchamberlain)  Gid: (   20/   staff)
Device: 1,7   Inode: 12951274290    Links: 1
Access: Fri Apr 15 20:51:19 2022
Modify: Fri Apr 15 20:49:25 2022
Change: Fri Apr 15 20:49:25 2022

$ stat -x target/armv6k-nintendo-3ds/release/examples/regex.elf 
  File: "target/armv6k-nintendo-3ds/release/examples/regex.elf"
  Size: 3025148      FileType: Regular File
  Mode: (0775/-rwxrwxr-x)         Uid: (  501/ianchamberlain)  Gid: (   20/   staff)
Device: 1,7   Inode: 12951276389    Links: 1
Access: Fri Apr 15 20:55:30 2022
Modify: Fri Apr 15 20:54:16 2022
Change: Fri Apr 15 20:54:16 2022

@ian-h-chamberlain
Member Author

Interesting... my example in https://github.com/ian-h-chamberlain/console_3ds that I mentioned in #58 seems to have this issue as well. I was running it fine (albeit slowly) in debug mode before, but now I get a crash in debug mode. I haven't changed toolchains or anything.

One thing I did try was changing the opt-level of the dependency that has the crash, but it doesn't seem to matter unless I change the opt-level of the whole executable.

Interestingly, the backtrace has a null pointer for self, which seems to be the root of the issue:

(gdb) p &self.linebreaker
$7 = (*mut fontdue::unicode::Linebreaker) 0x8007d32
(gdb) down
#0  0x0012b668 in fontdue::unicode::Linebreaker::next (self=0x0, codepoint=21) at /Users/ianchamberlain/.cargo/registry/src/github.com-1ecc6299db9ec823/fontdue-0.7.2/src/unicode/mod.rs:124
124         pub fn next(&mut self, codepoint: char) -> LinebreakData {
(gdb) p self 
$8 = (*mut fontdue::unicode::Linebreaker) 0x0

This seems a bit strange to me; does it match anything you've seen in your crashes? I have to wonder if we're introducing undefined behavior somewhere in init() or similar code that allows the compiler to optimize out some of these calls.

In this case, the executables differ much more in size:

$ ls target/armv6k-nintendo-3ds/*/examples/basic.elf
-rwxrwxr-x  1 ianchamberlain  staff    22M Apr 16 17:54 target/armv6k-nintendo-3ds/debug/examples/basic.elf*
-rwxrwxr-x  1 ianchamberlain  staff   1.7M Apr 16 17:51 target/armv6k-nintendo-3ds/release/examples/basic.elf*

I will still continue to investigate...

@Meziu
Member

Meziu commented Apr 22, 2022

I've been looking at the disassembly (objdump) of builds (rapier-physics) with/without that pesky line.
The result (about the THREAD_INFO related parts) is in this gist:
https://gist.github.com/Meziu/80276423f72c3ba364543354bc6d971e

The issue is immediately visible:
The instruction andeq r0, r0, ip, lsr ip is replaced by ; <UNDEFINED> instruction: 0x00000db8. So the problem is in the compilation, not the runtime.

@AzureMarker @ian-h-chamberlain Any ideas at all as to why this could be happening? This may be the same issue as the regex test…

Since this seems to be related to the compiler/LLVM, we could try opening a Zulip thread.

@AzureMarker
Member

AzureMarker commented Apr 24, 2022

My first thought is that it got replaced with an abort call (which usually uses an undefined/illegal instruction). Either way it would be good to confirm that it's not a miscompilation.

(was busy this past week - sorry for the silence)

@AzureMarker
Member

It would be interesting to look at the MIR for that code. I'll try to work on that soon.

@Meziu
Member

Meziu commented Apr 24, 2022

Great. Still, I don't think it is an abort, as the code never aborts in my example, and the compiler wouldn't be able to know at compile time whether it needs to panic.

@ian-h-chamberlain
Member Author

Yeah, I was thinking it might be useful to look at the generated IR as well (cargo 3ds rustc -- --emit=llvm-ir seems to mostly work?), but I haven't had a chance to dig in yet.

Regarding the abort call, I would expect a UDF to get generated rather than some unknown undefined instruction, right? It seems like that would be more stable in the case where the compiler knows it wants to generate an abort?

I also wondered whether this might simply be inline data rather than instructions, since from a bit of googling it seems like <UNDEFINED> instruction is actually fairly common in objdump output. But I have very little experience with ARM, so I'm not too sure what's expected here.
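One quick way to probe the inline-data theory is to look at the raw words. Assuming the standard A32 encoding for a data-processing instruction whose second operand is a register shifted by a register, andeq r0, r0, ip, lsr ip assembles to 0x00000C3C (the small integer 3132), and the replacement word 0x00000db8 is 3512. Small constants like these are typical of a literal pool, which objdump will decode as instructions if the ELF mapping symbols ($a/$d) don't mark that region as data. This sketch only checks the arithmetic; it doesn't prove the words are data.

```rust
/// Fields of an A32 data-processing instruction whose second operand is a
/// register shifted by another register (e.g. `and rd, rn, rm, lsr rs`).
struct DataProcRegShift {
    cond: u32,       // 0b0000 = EQ, 0b1110 = AL
    opcode: u32,     // 0b0000 = AND
    s: bool,         // update condition flags?
    rn: u32,         // first operand register
    rd: u32,         // destination register
    rs: u32,         // register holding the shift amount
    shift_type: u32, // 0b00 LSL, 0b01 LSR, 0b10 ASR, 0b11 ROR
    rm: u32,         // register being shifted
}

impl DataProcRegShift {
    fn encode(&self) -> u32 {
        (self.cond << 28)
            | (self.opcode << 21)
            | ((self.s as u32) << 20)
            | (self.rn << 16)
            | (self.rd << 12)
            | (self.rs << 8)
            | (self.shift_type << 5)
            | (1 << 4) // bit 4 set: shift amount comes from a register
            | self.rm
    }
}

fn main() {
    const IP: u32 = 12; // `ip` is an alias for r12
    let andeq = DataProcRegShift {
        cond: 0b0000, opcode: 0b0000, s: false,
        rn: 0, rd: 0, rs: IP, shift_type: 0b01, rm: IP,
    };
    let word = andeq.encode();
    println!("andeq r0, r0, ip, lsr ip = {word:#010x} ({word})");
    println!("replacement word         = {:#010x} ({})", 0xdb8u32, 0xdb8u32);
}
```

If both words turn out to be data, the more interesting question is what those constants are (e.g. offsets) and why their values differ between the good and bad builds.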

@Meziu
Member

Meziu commented Apr 25, 2022

I also wondered if this might simply be inline data rather than instructions? as it seems like <UNDEFINED> instruction is actually fairly common in objdump output, from a bit of googling, but I have very little experience with ARM so I'm not too sure what's expected here.

I don't know what objdump actually means by UNDEFINED instruction, but the simple fact that it is only present in the bad build means it is at the very least connected to the original issue, even if it isn't the cause by itself.

@Meziu
Member

Meziu commented Apr 26, 2022

Info I forgot to mention: after trying out different opt-levels for the release profile, I can say the issue only arises with opt-level 0. The rest of the release build configuration has no impact on the issue, then; only the optimization level causes problems. This also reminds me that the libctru examples always use -O2, but that might be unrelated, as there isn't a direct way to build them in debug mode.

@AzureMarker
Member

I looked back on what I wrote for the rapier-physics problem (and did some experimentation with MIR and LLVM-IR). I think it is correctly zero-initializing the thread info, but for some reason it thinks it's a Some instead of a None.

The MIR and LLVM-IR both show the thread info getting zero-initialized. I checked the RefCell struct's const new function and this looks correct.

Here's the is_some code (which is_none calls and returns negated). See how r0 was 0 but it still went down the true/1 branch? It seems to be comparing it (the discriminant) against 2, but I thought the None discriminant is 0? (it's the first variant listed in the enum)
[screenshot]
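A possible explanation for the comparison against 2, offered as a hypothesis rather than a confirmed diagnosis: Option<ThreadInfo> may have no tag of its own. rustc's niche-filling layout can store the outer None in an unused value of a field inside ThreadInfo, here apparently the tag of stack_guard: Option<Range<usize>>, whose valid values are 0 (None) and 1 (Some), leaving 2 free to mean "outer None". Under that layout, all-zero memory reads as Some(ThreadInfo { stack_guard: None, .. }), which matches the earlier gdb output and the 02 00 00 00 value that showed up as None in the regex example. This sketch reproduces the effect on the host with a hypothetical Info struct whose fields are all valid when zeroed. Enum layout is unspecified, so this shows what current rustc does, not a language guarantee.

```rust
use std::mem;
use std::ops::Range;

/// Hypothetical stand-in for std's `ThreadInfo`. The real one also holds a
/// `Thread` (an `Arc`, which must not be null), so zeroing it would be UB;
/// a plain `usize` keeps every field valid when all bytes are zero.
#[allow(dead_code)]
struct Info {
    stack_guard: Option<Range<usize>>,
    id: usize,
}

fn main() {
    // Niche filling: the outer Option needs no extra space for its tag.
    println!(
        "size_of Info = {}, Option<Info> = {}",
        mem::size_of::<Info>(),
        mem::size_of::<Option<Info>>()
    );

    // SAFETY: all-zero bytes are a valid `Option<Info>` whichever layout is
    // chosen: either the outer tag (None) or an `Info` whose inner tag is 0
    // (`stack_guard: None`) and whose `id` is 0.
    let zeroed: Option<Info> = unsafe { mem::zeroed() };

    // With current rustc, zeroed memory decodes as Some(Info { stack_guard: None, .. }).
    match &zeroed {
        Some(info) => println!("Some, stack_guard is_none = {}", info.stack_guard.is_none()),
        None => println!("None"),
    }
}
```

If this is what's happening, the real question becomes why the debug build zero-fills THREAD_INFO instead of writing the const-initialized None (tag 2), rather than why is_none misbehaves.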

@AzureMarker
Member

AzureMarker commented Apr 27, 2022

Here's the MIR for is_some:

fn option::<impl at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:533:1: 1684:2>::is_some(_1: &Option<T>) -> bool {
    debug self => _1;                    // in scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:553:26: 553:31
    let mut _0: bool;                    // return place in scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:553:36: 553:40
    let mut _2: isize;                   // in scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:554:25: 554:32

    bb0: {
        _2 = discriminant((*_1));        // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:554:18: 554:23
        switchInt(move _2) -> [1_isize: bb2, otherwise: bb1]; // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/macros/mod.rs:344:9: 344:9
    }

    bb1: {
        _0 = const false;                // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/macros/mod.rs:346:18: 346:23
        goto -> bb3;                     // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/macros/mod.rs:346:18: 346:23
    }

    bb2: {
        _0 = const true;                 // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/macros/mod.rs:345:48: 345:52
        goto -> bb3;                     // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/macros/mod.rs:345:48: 345:52
    }

    bb3: {
        return;                          // scope 0 at /home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/option.rs:555:6: 555:6
    }
}

And the LLVM-IR for good measure:

; core::option::Option<T>::is_some
; Function Attrs: inlinehint uwtable
define dso_local zeroext i1 @"_ZN4core6option15Option$LT$T$GT$7is_some17h8bc3b46d96e72dd1E"(%"core::option::Option<sys_common::thread_info::ThreadInfo>"* align 4 %self) unnamed_addr #0 !dbg !72789 {
start:
  %self.dbg.spill = alloca %"core::option::Option<sys_common::thread_info::ThreadInfo>"*, align 4
  %0 = alloca i8, align 1
  store %"core::option::Option<sys_common::thread_info::ThreadInfo>"* %self, %"core::option::Option<sys_common::thread_info::ThreadInfo>"** %self.dbg.spill, align 4
  call void @llvm.dbg.declare(metadata %"core::option::Option<sys_common::thread_info::ThreadInfo>"** %self.dbg.spill, metadata !72791, metadata !DIExpression()), !dbg !72792
  %1 = bitcast %"core::option::Option<sys_common::thread_info::ThreadInfo>"* %self to i32*, !dbg !72793
  %2 = load i32, i32* %1, align 4, !dbg !72793, !range !11363, !noundef !44
  %3 = sub i32 %2, 2, !dbg !72793
  %4 = icmp eq i32 %3, 0, !dbg !72793
  %_2 = select i1 %4, i32 0, i32 1, !dbg !72793
  %5 = icmp eq i32 %_2, 1, !dbg !72794
  br i1 %5, label %bb2, label %bb1, !dbg !72794

bb2:                                              ; preds = %start
  store i8 1, i8* %0, align 1, !dbg !72794
  br label %bb3, !dbg !72794

bb1:                                              ; preds = %start
  store i8 0, i8* %0, align 1, !dbg !72794
  br label %bb3, !dbg !72794

bb3:                                              ; preds = %bb2, %bb1
  %6 = load i8, i8* %0, align 1, !dbg !72795, !range !6228, !noundef !44
  %7 = trunc i8 %6 to i1, !dbg !72795
  ret i1 %7, !dbg !72795
}

Edit: and the assembly from Rust:

_ZN4core6option15Option$LT$T$GT$7is_some17h8bc3b46d96e72dd1E:
.Lfunc_begin3920:
	.loc	100 553 0
	.fnstart
	.cfi_startproc
	.pad	#8
	sub	sp, sp, #8
	.cfi_def_cfa_offset 8
	str	r0, [sp, #4]
.Ltmp15817:
	.loc	100 554 18 prologue_end
	ldr	r0, [r0]
	.loc	100 554 9 is_stmt 0
	cmp	r0, #2
	beq	.LBB3920_2
	b	.LBB3920_1
.LBB3920_1:
	.loc	100 0 9
	mov	r0, #1
	.loc	100 554 9
	strb	r0, [sp, #3]
	b	.LBB3920_3
.LBB3920_2:
	.loc	100 0 9
	mov	r0, #0
	.loc	100 554 9
	strb	r0, [sp, #3]
	b	.LBB3920_3
.LBB3920_3:
	.loc	100 555 6 is_stmt 1
	ldrb	r0, [sp, #3]
	add	sp, sp, #8
	bx	lr
.Ltmp15818:
.Lfunc_end3920:
	.size	_ZN4core6option15Option$LT$T$GT$7is_some17h8bc3b46d96e72dd1E, .Lfunc_end3920-_ZN4core6option15Option$LT$T$GT$7is_some17h8bc3b46d96e72dd1E
	.cfi_endproc
	.fnend

@AzureMarker
Member

AzureMarker commented Apr 27, 2022

Here's another interesting screenshot from debugging the release build (with debug = true for the release profile in Cargo.toml). The option's discriminant is set to 2, whereas in debug mode it was set to 0:
[screenshot]

A second view, from my IDE:
[screenshot]

And here's the relevant LLVM-IR (for thread_info::set):

thread_info::set
; std::sys_common::thread_info::set
; Function Attrs: uwtable
define dso_local void @_ZN3std10sys_common11thread_info3set17hd2f3cc918c8dadbfE(%"core::option::Option<core::ops::range::Range<usize>>"* noalias nocapture noundef readonly dereferenceable(12) %stack_guard, i64* noundef nonnull %thread) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
start:
  %e.i.i = alloca %"thread::local::AccessError", align 1
  %_27.i.i.i = alloca %"core::fmt::Arguments", align 4
  %_23.i.i.i = alloca [1 x { i8*, i32* }], align 4
  %_16.i.i.i = alloca %"core::fmt::Arguments", align 4
  %_13.i.i.i = alloca %"core::result::Result<(), io::error::Error>", align 4
  %_15.i.i = alloca %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]", align 4
  %_5.i = alloca %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]", align 4
  %_5.sroa.0.0..sroa_cast5 = bitcast %"core::option::Option<core::ops::range::Range<usize>>"* %stack_guard to i8*
  %0 = bitcast %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]"* %_5.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %0), !noalias !22640
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 4 dereferenceable(12) %0, i8* noundef nonnull align 4 dereferenceable(12) %_5.sroa.0.0..sroa_cast5, i32 12, i1 false)
  %_5.sroa.6.0..sroa_idx9 = getelementptr inbounds %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]", %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]"* %_5.i, i32 0, i32 1
  store i64* %thread, i64** %_5.sroa.6.0..sroa_idx9, align 4
  tail call void @llvm.experimental.noalias.scope.decl(metadata !22643)
  %1 = load i8, i8* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5STATE17h6afa64b14c91fc0fE.0, align 1, !noalias !22646
  switch i8 %1, label %bb7.i.i [
    i8 0, label %bb5.i.i.i
    i8 1, label %bb4.i.i
  ]

bb5.i.i.i:                                        ; preds = %start
; invoke std::sys::unix::thread_local_dtor::register_dtor
  invoke void @_ZN3std3sys4unix17thread_local_dtor13register_dtor17heedc3195ae8052f8E(i8* getelementptr inbounds (<{ [8 x i8], [12 x i8] }>, <{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE, i32 0, i32 0, i32 0), void (i8*)* noundef nonnull @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit7destroy17h9dcf8fbb41c9d82aE)
          to label %.noexc.i.i unwind label %bb11.i.i, !noalias !22646

.noexc.i.i:                                       ; preds = %bb5.i.i.i
  store i8 1, i8* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5STATE17h6afa64b14c91fc0fE.0, align 1, !noalias !22646
  br label %bb4.i.i

cleanup.body.i.i:                                 ; preds = %cleanup2.i.i.i, %bb23.thread.i.i.i
  %.pn.pn.pn35.i.i.i = phi { i8*, i32 } [ %5, %bb23.thread.i.i.i ], [ %6, %cleanup2.i.i.i ]
; call core::ptr::drop_in_place<std::thread::Thread>
  call void @"_ZN4core3ptr40drop_in_place$LT$std..thread..Thread$GT$17hb1fe3db54645647cE"(i64** nonnull %_5.sroa.6.0..sroa_idx10) #51, !noalias !22646
  br label %bb10.i.i

bb4.i.i:                                          ; preds = %.noexc.i.i, %start
  %2 = bitcast %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]"* %_15.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %2), !noalias !22646
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 4 dereferenceable(12) %2, i8* noundef nonnull align 4 dereferenceable(12) %_5.sroa.0.0..sroa_cast5, i32 12, i1 false)
  %_5.sroa.6.0..sroa_idx10 = getelementptr inbounds %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]", %"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]"* %_15.i.i, i32 0, i32 1
  store i64* %thread, i64** %_5.sroa.6.0..sroa_idx10, align 4
  %out.i.i.i = bitcast %"thread::local::AccessError"* %e.i.i to %"sys::unix::stdio::Stderr"*
  %borrow.val.i.i.i.i.i.i = load i32, i32* bitcast (<{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE to i32*), align 4, !noalias !22647
  %3 = icmp eq i32 %borrow.val.i.i.i.i.i.i, 0
  br i1 %3, label %bb3.i.i.i, label %bb1.i.i.i.i.i

bb1.i.i.i.i.i:                                    ; preds = %bb4.i.i
  %4 = bitcast %"thread::local::AccessError"* %e.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 0, i8* nonnull %4), !noalias !22647
  %_6.0.i.i.i.i.i = bitcast %"thread::local::AccessError"* %e.i.i to {}*
; invoke core::result::unwrap_failed
  invoke void @_ZN4core6result13unwrap_failed17h87ebc7c661748118E([0 x i8]* noalias noundef nonnull readonly align 1 bitcast (<{ [16 x i8] }>* @alloc18182 to [0 x i8]*), i32 16, {}* noundef nonnull align 1 %_6.0.i.i.i.i.i, [3 x i32]* noalias noundef readonly align 4 dereferenceable(12) bitcast (<{ i8*, [8 x i8], i8* }>* @vtable.r to [3 x i32]*), %"core::panic::location::Location"* noalias noundef nonnull readonly align 4 dereferenceable(16) bitcast (<{ i8*, [12 x i8] }>* @alloc18905 to %"core::panic::location::Location"*)) #50
          to label %.noexc.i.i.i unwind label %bb23.thread.i.i.i, !noalias !22652

.noexc.i.i.i:                                     ; preds = %bb1.i.i.i.i.i
  unreachable

bb23.thread.i.i.i:                                ; preds = %bb1.i.i.i.i.i
  %5 = landingpad { i8*, i32 }
          cleanup
  br label %cleanup.body.i.i

cleanup2.i.i.i:                                   ; preds = %bb10.i.i.i, %bb9.i.i.i
  %6 = landingpad { i8*, i32 }
          cleanup
; call core::ptr::drop_in_place<core::cell::RefMut<core::option::Option<std::sys_common::thread_info::ThreadInfo>>>
  call fastcc void @"_ZN4core3ptr115drop_in_place$LT$core..cell..RefMut$LT$core..option..Option$LT$std..sys_common..thread_info..ThreadInfo$GT$$GT$$GT$17h1e9efa271063f7a2E"(i32* nonnull bitcast (<{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE to i32*)) #51, !noalias !22652
  br label %cleanup.body.i.i

bb3.i.i.i:                                        ; preds = %bb4.i.i
  store i32 -1, i32* bitcast (<{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE to i32*), align 4, !alias.scope !22653, !noalias !22647
  %_8.idx.val.i.i.i = load i32, i32* bitcast (i8* getelementptr inbounds (<{ [8 x i8], [12 x i8] }>, <{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE, i32 0, i32 0, i32 4) to i32*), align 4, !noalias !22652
  %.not.i.not.i.i.i.i = icmp eq i32 %_8.idx.val.i.i.i, 2
  br i1 %.not.i.not.i.i.i.i, label %"_ZN3std6thread5local17LocalKey$LT$T$GT$4with17h90ccdff8c97fd3deE.exit", label %bb9.i.i.i

bb9.i.i.i:                                        ; preds = %bb3.i.i.i
  %7 = getelementptr inbounds %"core::result::Result<(), io::error::Error>", %"core::result::Result<(), io::error::Error>"* %_13.i.i.i, i32 0, i32 0
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %7), !noalias !22652
  %8 = bitcast %"core::fmt::Arguments"* %_16.i.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %8), !noalias !22652
  %9 = bitcast [1 x { i8*, i32* }]* %_23.i.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %9), !noalias !22652
  %10 = bitcast %"core::fmt::Arguments"* %_27.i.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %10), !noalias !22652
  %11 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 0, i32 0
  store [0 x { [0 x i8]*, i32 }]* bitcast (<{ i8*, [4 x i8] }>* @alloc8873 to [0 x { [0 x i8]*, i32 }]*), [0 x { [0 x i8]*, i32 }]** %11, align 4, !alias.scope !22656, !noalias !22659
  %12 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 0, i32 1
  store i32 1, i32* %12, align 4, !alias.scope !22656, !noalias !22659
  %13 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 1, i32 0
  store i32* null, i32** %13, align 4, !alias.scope !22656, !noalias !22659
  %14 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 1, i32 1
  store i32 0, i32* %14, align 4, !alias.scope !22656, !noalias !22659
  %15 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 2, i32 0
  store [0 x { i8*, i32* }]* bitcast (<{}>* @alloc11409 to [0 x { i8*, i32* }]*), [0 x { i8*, i32* }]** %15, align 4, !alias.scope !22656, !noalias !22659
  %16 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_27.i.i.i, i32 0, i32 2, i32 1
  store i32 0, i32* %16, align 4, !alias.scope !22656, !noalias !22659
  %17 = bitcast [1 x { i8*, i32* }]* %_23.i.i.i to %"core::fmt::Arguments"**
  store %"core::fmt::Arguments"* %_27.i.i.i, %"core::fmt::Arguments"** %17, align 4, !noalias !22652
  %18 = getelementptr inbounds [1 x { i8*, i32* }], [1 x { i8*, i32* }]* %_23.i.i.i, i32 0, i32 0, i32 1
  store i32* bitcast (i1 (%"core::fmt::Arguments"*, %"core::fmt::Formatter"*)* @"_ZN59_$LT$core..fmt..Arguments$u20$as$u20$core..fmt..Display$GT$3fmt17hce6cd8b639ce6a3dE" to i32*), i32** %18, align 4, !noalias !22652
  %19 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 0, i32 0
  store [0 x { [0 x i8]*, i32 }]* bitcast (<{ i8*, [4 x i8], i8*, [4 x i8] }>* @alloc10020 to [0 x { [0 x i8]*, i32 }]*), [0 x { [0 x i8]*, i32 }]** %19, align 4, !alias.scope !22662, !noalias !22665
  %20 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 0, i32 1
  store i32 2, i32* %20, align 4, !alias.scope !22662, !noalias !22665
  %21 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 1, i32 0
  store i32* null, i32** %21, align 4, !alias.scope !22662, !noalias !22665
  %22 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 1, i32 1
  store i32 0, i32* %22, align 4, !alias.scope !22662, !noalias !22665
  %23 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 2, i32 0
  %24 = bitcast [0 x { i8*, i32* }]** %23 to [1 x { i8*, i32* }]**
  store [1 x { i8*, i32* }]* %_23.i.i.i, [1 x { i8*, i32* }]** %24, align 4, !alias.scope !22662, !noalias !22665
  %25 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_16.i.i.i, i32 0, i32 2, i32 1
  store i32 1, i32* %25, align 4, !alias.scope !22662, !noalias !22665
; invoke std::io::Write::write_fmt
  invoke void @_ZN3std2io5Write9write_fmt17h45a15172362812f0E(%"core::result::Result<(), io::error::Error>"* noalias nocapture noundef nonnull sret(%"core::result::Result<(), io::error::Error>") dereferenceable(8) %_13.i.i.i, %"sys::unix::stdio::Stderr"* noalias noundef nonnull align 1 %out.i.i.i, %"core::fmt::Arguments"* noalias nocapture noundef nonnull dereferenceable(24) %_16.i.i.i)
          to label %bb10.i.i.i unwind label %cleanup2.i.i.i, !noalias !22652

bb10.i.i.i:                                       ; preds = %bb9.i.i.i
  call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %8), !noalias !22652
; invoke core::ptr::drop_in_place<core::result::Result<(),std::io::error::Error>>
  invoke fastcc void @"_ZN4core3ptr81drop_in_place$LT$core..result..Result$LT$$LP$$RP$$C$std..io..error..Error$GT$$GT$17h431ad2a12340277fE"(%"core::result::Result<(), io::error::Error>"* nonnull %_13.i.i.i)
          to label %bb11.i.i.i unwind label %cleanup2.i.i.i, !noalias !22652

bb11.i.i.i:                                       ; preds = %bb10.i.i.i
  call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %10), !noalias !22652
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %9), !noalias !22652
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %7), !noalias !22652
; call std::sys::unix::abort_internal
  call void @_ZN3std3sys4unix14abort_internal17h4245ef31ee184d64E() #50, !noalias !22652
  unreachable

bb7.i.i:                                          ; preds = %start
  tail call void @llvm.experimental.noalias.scope.decl(metadata !22668) #49
  %_5.i.i.i.i.i.i.i = bitcast i64* %thread to i32*
  %26 = atomicrmw sub i32* %_5.i.i.i.i.i.i.i, i32 1 release, align 4, !noalias !22671
  %27 = icmp eq i32 %26, 1
  br i1 %27, label %bb4.i.i.i.i.i.i, label %bb1.i.i

bb4.i.i.i.i.i.i:                                  ; preds = %bb7.i.i
  fence acquire
  %self.val1.i.i.i.i.i.i = load i64*, i64** %_5.sroa.6.0..sroa_idx9, align 4, !alias.scope !22672, !noalias !22640
; call alloc::sync::Arc<T>::drop_slow
  tail call fastcc void @"_ZN5alloc4sync12Arc$LT$T$GT$9drop_slow17h81ef5329a1037c41E"(i64* %self.val1.i.i.i.i.i.i) #49, !noalias !22671
  br label %bb1.i.i

bb10.i.i:                                         ; preds = %bb11.i.i, %cleanup.body.i.i
  %eh.lpad-body3.i.i = phi { i8*, i32 } [ %28, %bb11.i.i ], [ %.pn.pn.pn35.i.i.i, %cleanup.body.i.i ]
  resume { i8*, i32 } %eh.lpad-body3.i.i

bb11.i.i:                                         ; preds = %bb5.i.i.i
  %28 = landingpad { i8*, i32 }
          cleanup
; call core::ptr::drop_in_place<std::sys_common::thread_info::set::{{closure}}>
  call fastcc void @"_ZN4core3ptr83drop_in_place$LT$std..sys_common..thread_info..set..$u7b$$u7b$closure$u7d$$u7d$$GT$17he77886effe6b212dE"(%"[closure@/home/mark/media/Projects/3DS/rust-horizon/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/std/src/sys_common/thread_info.rs:42:22: 46:6]"* nonnull %_5.i) #51, !noalias !22640
  br label %bb10.i.i

bb1.i.i:                                          ; preds = %bb4.i.i.i.i.i.i, %bb7.i.i
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %0), !noalias !22640
  %29 = bitcast %"thread::local::AccessError"* %e.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 0, i8* nonnull %29), !noalias !22640
  %_6.0.i.i = bitcast %"thread::local::AccessError"* %e.i.i to {}*
; call core::result::unwrap_failed
  call void @_ZN4core6result13unwrap_failed17h87ebc7c661748118E([0 x i8]* noalias noundef nonnull readonly align 1 bitcast (<{ [70 x i8] }>* @alloc18057 to [0 x i8]*), i32 70, {}* noundef nonnull align 1 %_6.0.i.i, [3 x i32]* noalias noundef readonly align 4 dereferenceable(12) bitcast (<{ i8*, [8 x i8], i8* }>* @vtable.q to [3 x i32]*), %"core::panic::location::Location"* noalias noundef readonly align 4 dereferenceable(16) bitcast (<{ i8*, [12 x i8] }>* @alloc18051 to %"core::panic::location::Location"*)) #50, !noalias !22640
  unreachable

"_ZN3std6thread5local17LocalKey$LT$T$GT$4with17h90ccdff8c97fd3deE.exit": ; preds = %bb3.i.i.i
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 4 dereferenceable(12) getelementptr inbounds (<{ [8 x i8], [12 x i8] }>, <{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE, i32 0, i32 0, i32 4), i8* noundef nonnull align 4 dereferenceable(12) %_5.sroa.0.0..sroa_cast5, i32 12, i1 false)
  store i64* %thread, i64** bitcast (i8* getelementptr inbounds (<{ [8 x i8], [12 x i8] }>, <{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE, i32 0, i32 1, i32 8) to i64**), align 4, !noalias !22652
  store i32 0, i32* bitcast (<{ [8 x i8], [12 x i8] }>* @_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE to i32*), align 4, !alias.scope !22673, !noalias !22652
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %2), !noalias !22646
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %0), !noalias !22640
  ret void
}
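The closure path in the IR is std/src/sys_common/thread_info.rs:42:22: 46:6. The two unwrap_failed messages are 16 and 70 bytes long, which matches "already borrowed" and "cannot access a Thread Local Storage value during or after destruction". Taken together, the inlined code appears to be std's thread_info::set. It mutably borrows the THREAD_INFO RefCell and checks that the stored Option is empty (tag 2). If the Option is not empty, it prints to stderr and calls abort_internal. Below is a hedged, host-runnable model of that control flow. It is not std's actual source: ThreadInfo is a simplified stand-in, and the model returns Err where std would panic or abort.

```rust
use std::cell::RefCell;

// Simplified, hypothetical stand-in for std's private ThreadInfo type.
#[allow(dead_code)]
struct ThreadInfo {
    name: &'static str,
}

thread_local! {
    // Same shape as std's THREAD_INFO: a RefCell<Option<_>> that starts as None.
    static THREAD_INFO: RefCell<Option<ThreadInfo>> = RefCell::new(None);
}

/// Models the inlined thread_info::set closure. Where std panics or aborts,
/// this returns an error string instead so each path can be exercised.
fn set(info: ThreadInfo) -> Result<(), &'static str> {
    THREAD_INFO
        .try_with(|cell| -> Result<(), &'static str> {
            // Borrow-flag check: the flag must be 0, otherwise "already borrowed".
            let mut slot = cell.try_borrow_mut().map_err(|_| "already borrowed")?;
            // Tag check: only an empty (None) slot may be written.
            if slot.is_some() {
                return Err("thread info was already set");
            }
            *slot = Some(info);
            Ok(())
        })
        // STATE "destroyed" branch: the thread local can no longer be accessed.
        .unwrap_or(Err(
            "cannot access a Thread Local Storage value during or after destruction",
        ))
}
```

If set is called twice on the same thread, the second call takes the branch that leads to write_fmt and abort_internal in the real code.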

Interestingly, it looks like the thread-local and Option logic got inlined. Here are a few more LLVM IR lines, which define the initial value of the thread local:

@_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h0518af5de06cb2ebE = internal thread_local global <{ [8 x i8], [12 x i8] }> <{ [8 x i8] c"\00\00\00\00\02\00\00\00", [12 x i8] undef }>, align 4
@_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5STATE17h6afa64b14c91fc0fE.0 = internal thread_local unnamed_addr global i8 0, align 1

For comparison, here are the same lines from the debug build:

@_ZN3std10sys_common11thread_info11THREAD_INFO7__getit3VAL17h7bb0bc4174acaf2cE = internal thread_local global <{ [8 x i8], [12 x i8] }> <{ [8 x i8] c"\00\00\00\00\02\00\00\00", [12 x i8] undef }>, align 4, !dbg !3680
@_ZN3std10sys_common11thread_info11THREAD_INFO7__getit5STATE17hf3ec8b00011c9df5E = internal thread_local global <{ [1 x i8] }> zeroinitializer, align 1, !dbg !3732

It looks like both the debug and the release LLVM IR initialize the thread local with a value of 2...
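The IR gives some hints about the initializer layout. It loads an i32 at offset 0, compares it to 0, and stores -1 there when borrowing. It loads a second i32 at offset 4 and compares it to 2 before writing Some(...). Assume from this that the 8 initialized bytes are the RefCell borrow flag (a 4-byte isize on this 32-bit target) followed by a 4-byte Option tag, and note that the ARM11 target is little-endian. The initializer then decodes to "not borrowed, tag 2". That suggests 2 is simply the empty (None) encoding here, not a corrupt value. The field interpretation is an inference from the IR, not something checked against rustc's layout code:

```rust
/// Splits the 8 initialized bytes of THREAD_INFO's initializer into
/// (RefCell borrow flag, Option tag), assuming a little-endian 32-bit layout.
fn decode_thread_info_init(bytes: [u8; 8]) -> (i32, u32) {
    let borrow_flag = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let option_tag = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    (borrow_flag, option_tag)
}

/// The c"\00\00\00\00\02\00\00\00" initializer shared by both builds.
const INIT: [u8; 8] = [0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00];
```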

@Meziu
Member

Meziu commented Apr 27, 2022

Hmm, I’ve looked through the rapier2d crate for anything related to our issue, but thread_local! never seems to be used, and neither does any other threading-related functionality. Also, this issue arises at compile time without optimisations enabled, so either LLVM or the linker is pulling something weird on us.

@AzureMarker Let’s open a Zulip thread about this. (I’m busy today, so you can do it if you want).

@AzureMarker
Member

AzureMarker commented Apr 29, 2022

I've been a bit busy as well, but there are also a few things I want to try before making a thread (we also need to write some sort of summary so they know what we're talking about). For example, maybe the 3DS has a really small thread-local storage area? I also want to inspect the binaries to see if the initial state of the thread locals is the same.

Edit: I see @ian-h-chamberlain has already started a thread for the regex issue (with no replies):
https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Debugging.20optimized-only.20crashes.3F

@Meziu
Member

Meziu commented Apr 29, 2022

For example, maybe the 3DS has a really small thread local store?

Even if that were true, it wouldn’t matter for this issue. The problem occurs before any actual LocalKey is created and only shows up at a specific optimisation level, which hints at a problem during compilation.

I understand wanting to look into other possibilities ourselves (and I appreciate any help you bring in), but this research has hit a wall for me, so I would find external help very useful.

@ian-h-chamberlain
Member Author

Edit: I see @ian-h-chamberlain has already started a thread for the regex issue (with no replies):
rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Debugging.20optimized-only.20crashes.3F

Yeah, if you two want to bump or add to that discussion, there's always a chance someone else will notice it and chime in (it has probably scrolled off the backlog for most people by now).


Sorry for the radio silence from my end recently, but in the meantime I got my example down to this reproduction (almost minimal; there might be a little more I could do):

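use regex::RegexBuilder;

// Two named capture groups; building this regex is enough to hit the crash.
const RE: &str = r"(?P<key>.+)=(?P<value>.+)";

fn main() {
    // Force the 3DS support crates to be linked in.
    pthread_3ds::init();
    linker_fix_3ds::init();

    let builder = RegexBuilder::new(RE);
    // The crash happens inside build(), before any matching is done.
    let _regex = builder.build();
}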

I've been trying to debug the LLVM IR by following the rustc dev guide, and I have at least reached a point where running some number N of LLVM optimization passes produces a segfault but MAX_N does not (for rustc -C opt-level=1). I'm hoping to pin down the specific pass that causes it but haven't had time yet (bisecting is hard to automate in this case, lol!).

I'm not sure it has anything to do with thread-locals in this case, but I'll see where the investigation leads. It seems unusual for a miscompilation to be fixed by an optimization pass rather than caused by one, but that's currently my best guess for what's happening. I'm hoping to nail it down this weekend and will post back here once I've tried.
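With -opt-bisect-limit=N, only the first N optimization-pass invocations run and the rest are skipped. Suppose the outcome flips exactly once as N grows: it crashes below some threshold and not above it (an assumption). Then finding the threshold is an ordinary binary search that needs about log2(range) rebuilds. In this sketch, crashes is a hypothetical callback. It stands in for the manual rebuild, deploy, and run cycle on the device, which is the part that is hard to automate:

```rust
/// Returns the smallest limit in (lo, hi] for which `crashes(limit)` is false.
/// Preconditions (assumed, not checked): lo < hi, crashes(lo) is true,
/// crashes(hi) is false, and the result flips only once in between.
fn first_non_crashing_limit(mut lo: u32, mut hi: u32, mut crashes: impl FnMut(u32) -> bool) -> u32 {
    // Invariant: crashes(lo) == true and crashes(hi) == false.
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if crashes(mid) {
            lo = mid; // still crashing: the boundary is above mid
        } else {
            hi = mid; // no crash: the boundary is at or below mid
        }
    }
    hi
}
```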

@Meziu
Member

Meziu commented Apr 30, 2022

Yeah, it looks to be more or less the same issue, differing only in where the problem generates from. Quite interesting indeed.

@ian-h-chamberlain
Member Author

ian-h-chamberlain commented May 1, 2022

Ok, I've made some progress!

I was able to find an optimization pass after which -Z verify-llvm-ir=yes no longer fails. I'm still not 100% sure that's important, but it does indicate a change in behavior.

FLAGS=(
    -C opt-level=1
    -C debuginfo=0
    -Z verify-llvm-ir=yes
)

Final bisect was -opt-bisect-limit=294912 (which may be meaningless for other build environments / flags, but it's something).

Rustc invocation that failed with SIGSEGV:
rustc \
  --crate-name regex \
  --edition=2018 /Users/ianchamberlain/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.5.5/src/lib.rs \
  --error-format=json \
  --json=diagnostic-rendered-ansi,future-incompat \
  --crate-type lib \
  --emit=dep-info,metadata,link \
  -C embed-bitcode=no \
  -C debuginfo=2 \
  --cfg 'feature="aho-corasick"' \
  --cfg 'feature="default"' \
  --cfg 'feature="memchr"' \
  --cfg 'feature="perf"' \
  --cfg 'feature="perf-cache"' \
  --cfg 'feature="perf-dfa"' \
  --cfg 'feature="perf-inline"' \
  --cfg 'feature="perf-literal"' \
  --cfg 'feature="std"' \
  --cfg 'feature="unicode"' \
  --cfg 'feature="unicode-age"' \
  --cfg 'feature="unicode-bool"' \
  --cfg 'feature="unicode-case"' \
  --cfg 'feature="unicode-gencat"' \
  --cfg 'feature="unicode-perl"' \
  --cfg 'feature="unicode-script"' \
  --cfg 'feature="unicode-segment"' \
  -C metadata=d43bcfb0bfe156b4 \
  -C extra-filename=-d43bcfb0bfe156b4 \
  --out-dir /Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/armv6k-nintendo-3ds/debug/deps \
  --target armv6k-nintendo-3ds \
  -L dependency=/Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/armv6k-nintendo-3ds/debug/deps \
  -L dependency=/Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/debug/deps \
  --extern aho_corasick=/Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/armv6k-nintendo-3ds/debug/deps/libaho_corasick-32b97d304112fb85.rmeta \
  --extern memchr=/Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/armv6k-nintendo-3ds/debug/deps/libmemchr-43ddaf7574a56279.rmeta \
  --extern regex_syntax=/Users/ianchamberlain/Documents/Development/3ds/crash-repro/target/armv6k-nintendo-3ds/debug/deps/libregex_syntax-7ba7dfda0b6f2707.rmeta \
  --cap-lints allow \
  -C opt-level=1 \
  -C debuginfo=0 \
  -Z verify-llvm-ir=yes \
  -C llvm-args=-opt-bisect-limit=294912

I've been trying to follow some of the steps in https://gist.github.com/luqmana/be1af5b64d2cda5a533e3e23a7830b44 to debug further, but haven't had too much luck with the emitted llvm-ir (--emit=llvm-ir seems to prevent the issue, lol).

However, I also found something else interesting: the issue does not reproduce if I set -C lto=off or -C codegen-units=1. That makes me think the issue really happens during LTO but is hidden when enough prior optimization passes have run. Since we are linking in the standard library and also the devkitPro libraries (built with gcc, not clang), I wondered whether there is some issue with cross-language LTO.

I found this resource which references cross-language LTO in Firefox (which I think is built with LLVM). In particular this paragraph seems relevant:

You can picture the proverbial lightbulb appearing over our heads when we figured out that Rust's pre-compiled standard library would still have ThinLTO enabled, no matter the compiler settings we were using for our tests. The standard library, including its LLVM bitcode representation, is compiled as part of Rust's binary distribution so it is always compiled with the settings from Rust's build servers. Our local full LTO pass within rustc would then pull this troublesome bitcode into the output module which in turn would make the linker plugin crash again. Since then ThinLTO is turned off for libstd by default.

In the case of cargo-3ds, we are using -Zbuild-std, which I assume just uses the default number of codegen units (256 for incremental builds, 16 for non-incremental) and thin-local LTO to build core and std. In my case, I am reproducing with a custom-built toolchain, but it was built with incremental = true, so it probably also used thin-local LTO.

I suspect, and hope to prove, that building std using codegen-units=1 prevents this problem. There are a few issues related to this, which makes me think we should set the number of units to 1 if at all possible:


I'll keep trying to prove my theory more precisely. Meanwhile, if @Meziu or @AzureMarker could try building with either RUSTFLAGS="-C codegen-units=1" or RUSTFLAGS="-C lto=off" to see whether that resolves the rapier2d issue, that might help corroborate it.

@AzureMarker
Member

Thanks for the update. I just tried both of those RUSTFLAGS values in debug mode, and neither fixed the rapier-physics issue for me :/.

@Meziu
Member

Meziu commented May 1, 2022

Looks like it doesn’t. I haven’t looked into the inner workings for any differences, but just running the program yields the same behaviour.

@Meziu Meziu closed this May 1, 2022
@Meziu Meziu reopened this May 1, 2022
@ian-h-chamberlain
Member Author

After a bit more testing, I think the codegen units / LTO may be a red herring after all. I tried rebuilding std with the same flags and it didn't seem to matter, so I might go back to the drawing board and try to bisect the hard way. It is interesting that I was able to bisect a segfault in rustc, but it doesn't seem to be directly related to the codegen issue we are seeing.

@ian-h-chamberlain
Member Author

Ok, got some more details! I think this might be the real cause of the difference between debug + release mode builds.

After bisecting, I found that the segfault occurred only when the LLVM stack-coloring pass did not run. Looking at the machine IR dumped with -C llvm-args=--print-before=stack-coloring -C llvm-args=--print-after=stack-coloring, the difference is quite apparent:

Before stack-coloring
# Machine code for function _ZN12regex_syntax3ast5parse16ParserI$LT$P$GT$19parse_with_comments17hb84b2d14834d7bfbE: IsSSA, TracksLiveness
Frame Objects:
  fi#0: size=1, align=4, at location [SP]
  fi#1: size=60, align=8, at location [SP]
  fi#2: size=60, align=8, at location [SP]
  fi#3: size=60, align=8, at location [SP]
  fi#4: size=60, align=8, at location [SP]
  fi#5: size=8, align=4, at location [SP]
  fi#6: size=8, align=4, at location [SP]
  fi#7: size=132, align=8, at location [SP]
  fi#8: size=144, align=8, at location [SP]
  fi#9: size=64, align=4, at location [SP]
  fi#10: size=60, align=8, at location [SP]
  fi#11: size=36, align=8, at location [SP]
  fi#12: size=136, align=4, at location [SP]
  fi#13: size=132, align=8, at location [SP]
  fi#14: size=132, align=8, at location [SP]
  fi#15: size=68, align=8, at location [SP]
  fi#16: size=56, align=8, at location [SP]
  fi#17: size=132, align=8, at location [SP]
  fi#18: size=36, align=8, at location [SP]
  fi#19: size=68, align=4, at location [SP]
  fi#20: size=64, align=8, at location [SP]
  fi#21: size=12, align=8, at location [SP]
  fi#22: size=36, align=8, at location [SP]
  fi#23: size=68, align=4, at location [SP]
  fi#24: size=64, align=8, at location [SP]
  fi#25: size=12, align=8, at location [SP]
  fi#26: size=36, align=8, at location [SP]
  fi#27: size=68, align=4, at location [SP]
  fi#28: size=64, align=8, at location [SP]
  fi#29: size=12, align=8, at location [SP]
  fi#30: size=36, align=8, at location [SP]
  fi#31: size=68, align=4, at location [SP]
  fi#32: size=64, align=8, at location [SP]
  fi#33: size=132, align=8, at location [SP]
  fi#34: size=132, align=4, at location [SP]
  fi#35: size=128, align=8, at location [SP]
  fi#36: size=128, align=8, at location [SP]
  fi#37: size=36, align=8, at location [SP]
  fi#38: size=68, align=4, at location [SP]
  fi#39: size=64, align=8, at location [SP]
  fi#40: size=36, align=8, at location [SP]
  fi#41: size=68, align=4, at location [SP]
  fi#42: size=64, align=8, at location [SP]
  fi#43: size=36, align=8, at location [SP]
  fi#44: size=68, align=4, at location [SP]
  fi#45: size=64, align=8, at location [SP]
  fi#46: size=24, align=8, at location [SP]
  fi#47: size=36, align=8, at location [SP]
  fi#48: size=24, align=4, at location [SP]
  fi#49: size=4, align=4, at location [SP]
After stack-coloring
# Machine code for function _ZN12regex_syntax3ast5parse16ParserI$LT$P$GT$19parse_with_comments17hb84b2d14834d7bfbE: IsSSA, TracksLiveness
Frame Objects:
  fi#0: dead
  fi#1: dead
  fi#2: dead
  fi#3: dead
  fi#4: dead
  fi#5: dead
  fi#6: dead
  fi#7: size=132, align=8, at location [SP]
  fi#8: size=144, align=8, at location [SP]
  fi#9: dead
  fi#10: dead
  fi#11: dead
  fi#12: dead
  fi#13: dead
  fi#14: size=132, align=8, at location [SP]
  fi#15: dead
  fi#16: size=56, align=8, at location [SP]
  fi#17: dead
  fi#18: dead
  fi#19: dead
  fi#20: dead
  fi#21: dead
  fi#22: dead
  fi#23: dead
  fi#24: dead
  fi#25: dead
  fi#26: dead
  fi#27: dead
  fi#28: dead
  fi#29: dead
  fi#30: dead
  fi#31: dead
  fi#32: dead
  fi#33: dead
  fi#34: dead
  fi#35: dead
  fi#36: dead
  fi#37: dead
  fi#38: dead
  fi#39: dead
  fi#40: dead
  fi#41: dead
  fi#42: dead
  fi#43: dead
  fi#44: dead
  fi#45: dead
  fi#46: dead
  fi#47: size=36, align=8, at location [SP]
  fi#48: dead
  fi#49: dead

This shrinks the stack frame of this particular function from 3169 (!) to 500 bytes. Combined with the other stack frames, the unshrunk version seems like it must be enough to blow the stack. As far as I can tell, nothing is actually being miscompiled here. The debug build options (-C opt-level=0) just don't optimize well for stack space, so we are left with a lot of unnecessary locals, etc.
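You can check the 3169 and 500 figures by summing the size= fields from the two dumps, ignoring alignment padding. The following sketch does that. It also includes a deliberately simplified greedy model of what stack coloring does: slots whose lifetimes never overlap can share storage. LLVM derives those lifetimes from the llvm.lifetime.start / llvm.lifetime.end markers visible in the IR earlier in this thread. The Slot type, the interval numbering, and the first-fit strategy are illustrative assumptions, not LLVM's actual implementation:

```rust
/// size= values of fi#0..fi#49 before stack-coloring, copied from the dump.
const BEFORE: [u32; 50] = [
    1, 60, 60, 60, 60, 8, 8, 132, 144, 64, // fi#0-9
    60, 36, 136, 132, 132, 68, 56, 132, 36, 68, // fi#10-19
    64, 12, 36, 68, 64, 12, 36, 68, 64, 12, // fi#20-29
    36, 68, 64, 132, 132, 128, 128, 36, 68, 64, // fi#30-39
    36, 68, 64, 36, 68, 64, 24, 36, 24, 4, // fi#40-49
];

/// Objects still live after stack-coloring: fi#7, fi#8, fi#14, fi#16, fi#47.
const AFTER_LIVE: [u32; 5] = [132, 144, 132, 56, 36];

/// Hypothetical stack slot: a size in bytes plus a half-open live range
/// [start, end) in some abstract instruction numbering.
#[derive(Clone, Copy)]
struct Slot {
    size: u32,
    start: u32,
    end: u32,
}

/// Frame size if every slot gets its own storage (no coloring, no padding).
fn naive_frame_size(slots: &[Slot]) -> u32 {
    slots.iter().map(|s| s.size).sum()
}

/// Greedy first-fit coloring: visit slots in order of start point and reuse
/// any storage area ("color") whose previous occupant is already dead. Each
/// color must be as large as the biggest slot placed in it.
fn colored_frame_size(slots: &[Slot]) -> u32 {
    let mut order = slots.to_vec();
    order.sort_by_key(|s| s.start);
    // Each entry is (bytes reserved for this color, point where it becomes free).
    let mut colors: Vec<(u32, u32)> = Vec::new();
    for s in order {
        if let Some(i) = colors.iter().position(|c| c.1 <= s.start) {
            colors[i].0 = colors[i].0.max(s.size);
            colors[i].1 = s.end;
        } else {
            colors.push((s.size, s.end));
        }
    }
    colors.iter().map(|c| c.0).sum()
}
```

Consider three slots of 100, 60 and 40 bytes. The 60-byte slot becomes live only after the 100-byte one has died, and the 40-byte slot overlaps both. Without coloring they need 200 bytes; with it they need 140, because the first two share one area.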

The good news is, we have a way to change the stack size at compile time! libctru exposes a static variable which gets used to set the stack size on startup:

#[no_mangle]
static __stacksize__: usize = 64 * 1024; // default is 32k

In my case, this was enough to prevent the crash!

We may want a more conservative default like 2MB (the pthread default on linux, I think) so that more programs work correctly, but I'm also not sure it makes sense to set this in ctru at all... We'd probably want a way to #[cfg] it out in case the user wants to set a custom stack size, or maybe there's a better way?
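One possible alternative avoids raising the main stack for every program. Instead, run only the stack-hungry work on a spawned thread with an explicitly sized stack, using std::thread::Builder::stack_size. This is an untested sketch and has not been tried on hardware. It assumes std threads work on the target (e.g. through pthread-3ds), and run_with_stack is a hypothetical helper name. The main thread's own stack would still be governed by __stacksize__:

```rust
use std::thread;

/// Runs `f` on a new thread with a `stack_bytes`-sized stack and returns its
/// result. If `f` panics, the panic payload is returned as `Err`.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
}
```

In the regex repro, this could look like run_with_stack(2 * 1024 * 1024, || RegexBuilder::new(RE).build().is_ok()), which keeps the larger stack local to the code that needs it.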

Let me know if someone is able to try this with rapier2d – hopefully, this is a simple fix to test and see what happens, and then we can discuss options for resolving it in the general case?

@Meziu
Member

Meziu commented May 3, 2022

Well, having it at 2MB makes sense since it is also the stack size a new thread would have. I’ll try running rapier2d later.

Edit: nope, nothing. The result is exactly the same... We should try debugging rapier-physics the same way you debugged regex.

@ian-h-chamberlain
Member Author

Hmm, I tried the rapier2d-physics example as well, but it seems like the symptoms are different – the app closes immediately, but does not cause an ARM exception, unlike the regex example I've been testing.

Should we open a separate issue / discussion to address that issue, since the change in #59 doesn't seem to resolve it? This PR can probably be closed, at least.

@Meziu Meziu closed this May 4, 2022
@ian-h-chamberlain ian-h-chamberlain deleted the example/lazy_static branch May 4, 2022 04:44