Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-C force-frame-pointers=yes not respected by -Z build-std or opt-level = "z" #136198

Open
rslawson opened this issue Jan 28, 2025 · 13 comments
Open
Labels
A-backtrace Area: Backtraces A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@rslawson
Copy link

rslawson commented Jan 28, 2025

So I've got a project I'm trying to debug, but I'm running into something of an issue. The setup is that I have RISC-V CPUs (riscv32imc-unknown-none-elf) debuggable over JTAG using OpenOCD + GDB. I would like to be able to get a backtrace out of GDB in the case that a program panics, but previously I was never able to guarantee that such a trace would be there since the stack would collapse as soon as execution entered into core::panicking::panic or the like. The way I was attempting to get backtraces from GDB was by doing something to the effect of

file target/riscv32imc-unknown-none-elf/release/crate
set remotetimeout unlimited
break crate::panic_handler

target extended-remote :3333
load

define hook-stop
printf "!!!!! hit panic, execution has stopped !!!!!\n"
backtrace
disconnect
quit 1
end

continue

I figured that if I:

  • Added panic = "abort" to the release profile inCargo.toml
  • Switched to nightly
  • Added the following to .cargo/config.toml:
[build]
# ...
# added:
rustflags = [
    # ...
    "-C", "force-frame-pointers=yes",
]
# ...

[unstable]
build-std = ["core", "panic_abort"]

Then in theory, I should be guaranteed to get backtraces. So I wrote a simple program:

#![no_std]
#![no_main]

// The particular impl here doesn't matter.
use some_hal_crate::uart::Uart;
use riscv_rt::entry;

const UART_ADDR: *const () = (0b11 << 30) as *const ();

// And neither does the implementation here - I'm just using it for some extra debugging output.
#[panic_handler]
fn panic_handler(info: &core::panic::PanicInfo) -> ! {
    let mut uart = Uart::new(UART_ADDR);
    writeln!(uart, "{info:?}").unwrap();
}

#[entry]
fn main() -> ! {
    skooks();
    loop {}
}

misadventures! {
    am i glad hes frozen in_ there and that were out here and_ that_ hes_ the
    sheriff and__ that__ were_ frozen_ out_ here_ and___ that___ were__ in__
    there__ and____ i_ just remembered were___ out__ here__ what i__ want to know
    is wheres the_ caveman
}

macro_rules! misadventures {
    (@rev [] [$($rev:ident)*]) => {
        misadventures! {
            @defs [skooks $($rev)*]
        }
    };
    (@rev [$first:ident $($rest:ident)*] [$($rev:ident)*]) => {
        misadventures! {
            @rev [$($rest)*] [$first $($rev)*]
        }
    };
    (@defs [$last:ident]) => {
        #[inline(never)]
        fn $last() {
            panic!();
        }
    };
    (@defs [$n0:ident $n1:ident $($rest:ident)*]) => {
        #[inline(never)]
        fn $n0() {
            $n1();
        }

        misadventures! {
            @defs [$n1 $($rest)*]
        }
    };
    ($($words:ident)+) => {
        misadventures! {
            @rev [$($words)+] []
        }
    }
}

pub(crate) use misadventures;

However, I have two problem cases:

  1. In debug builds, the stack trace disappears as soon as execution enters the panicking code (crate::am::panic_cold_explicit I believe). Until then, I get a full trace (very nice!).
  2. In release builds (with debug = true, split-debuginfo = "unpacked", and opt-level = "z"), the stack trace is never deeper than #0 and #1 in GDB. It reports Backtrace stopped: frame did not save the PC.

panic = "abort" was used for all of my testing.

Meta

This was done on nightly-2024-11-18.

@rslawson rslawson added the C-bug Category: This is a bug. label Jan 28, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 28, 2025
@rslawson rslawson changed the title -C force-frame-pointers=yes not respected -C force-frame-pointers=yes not respected by -Z build-std or opt-level = "z" Jan 28, 2025
@workingjubilee
Copy link
Member

workingjubilee commented Jan 28, 2025

opt-level=z breaking this does not surprise me, honestly.

Have you tried this with a compiler built sometime this year?

I believe these are separate issues, can you describe the working vs. broken case in full for opt-level="z"?

@rslawson
Copy link
Author

I'll try with a newer compiler tomorrow morning. As for a working case for opt-level = "z", I have yet to see one. I haven't seen a working case for "s" or 3 either, but I'm guessing that's not too surprising given "z" didn't work.

Also, I'm sorry, I forgot to attach images of output I had observed. Note that for the following, I named the panic handler gdb_panic. Here is the problem on debug builds:
Image
As can be seen here, there is a complete backtrace when the am function is entered, but once it actually hits the panic the stack trace is collapsed to be just what happens after core::panicking::panic_explicit. As for release with opt-level = "z", I see the following:
Image
The stack trace is always just a single function deep (in this case - I've seen it be slightly deeper on other binaries but for no discernible reason), and again gets cut off when it gets into the panic handler.

In both cases I would expect the backtrace to be complete going into the final function, and once it hits the panic handler. I should be able to see all the way back from gdb_panic to am. The idea here is that I would like just to be able to break on the panic handler, set hook-halt, and then get debug output on where any failures occur without having to put a break on every function that could panic.

@workingjubilee
Copy link
Member

workingjubilee commented Jan 28, 2025

By "working" I mean "what is the working state that setting -Copt-level=z breaks?" The control, if you will. You have described two problems with subtle differences involving the reproduction steps:

In release builds (with debug = true, split-debuginfo = "unpacked", and opt-level = "z"), the stack trace is never deeper than # 0 and # 1 in GDB. It reports Backtrace stopped: frame did not save the PC.

Does it require setting ALL of these to break? Surely not? But can any of these flags be set without breaking it? These are separate issues, as I described.

I believe you can set rust_panic as a breakpoint.

@rslawson
Copy link
Author

Ah, sorry, I misunderstood. There is no properly working state, as-is. It doesn't work as expected on debug, and the problem is made worse on release - the stack traces are forever collapsed to a maximum of 2, instead of just when the program enters the panic functions.

As for using rust_panic as a breakpoint, that doesn't currently exist as a symbol in my binaries apparently. The closest thing GDB gives me a tab completion for is rust_begin_unwind.

@workingjubilee
Copy link
Member

workingjubilee commented Jan 28, 2025

Oh, rust_panic is in std, apparently. My mistake.

@workingjubilee
Copy link
Member

Ah, sorry, I misunderstood. There is no properly working state, as-is. It doesn't work as expected on debug, and the problem is made worse on release - the stack traces are forever collapsed to a maximum of 2, instead of just when the program enters the panic functions.

Then, my apologies as I must inventively paraphrase, but how is problem 2 a problem instead of just "it doesn't work as expected, and when I do other things that will probably make it harder for it to work, it still doesn't work as expected"?

In general, optimization eliminates most of the information that can be used to trace anything. That's not a bug, that's just kinda what you're asking for. Asking for debug = "full" cannot change that. I don't think we will be able to do anything about that part, as it probably is destroyed in LLVM, unless it's specifically something about MIR opt passes.

@workingjubilee
Copy link
Member

I probably seem overly perplexed for such a simple matter? Mostly I'm just trying to figure out if this needs to be split into 2 issues or not and at first it seemed there was indeed 2 root problems, but on review I suppose not.

@rslawson
Copy link
Author

rslawson commented Jan 29, 2025

I suppose it is two issues?

  1. Even with -Z build-std="core,panic_abort" (supplied through .cargo/config.toml), -C force-frame-pointers=yes is not respected by panic functions
  2. -C force-frame-pointers=yes is (almost?) never respected by at the very least opt levels z, s, and 3.

These are both necessary for what I was attempting to do, so I didn't think to separate them. Should I?

@workingjubilee
Copy link
Member

workingjubilee commented Jan 29, 2025

Right, I suppose your concern is

Backtrace stopped: frame did not save the PC.

But I'm... not entirely sure how gdb discerns this, so I'm not entirely sure if gdb is telling the truth there.

Er, to elaborate, I'm one of the most recent people who touched the code that actually does the forcing of frame pointers. It does get respected quite vigorously in many cases when we pass the code on to LLVM. In fact some cases which some people expect -Cforce-frame-pointers=no to skip frame pointers do actually get frame pointers anyways, or at least we tell LLVM such. So for the latter, it shouldn't matter much unless either

  • optimization destroyed all the information
  • LLVM miscompiles it

Hmm. Does this code happen to have a more... architecturally neutral reproducer? Or at least less "and also, JTAG"? I suppose I can just rip out the #[entry] macro and start again with fn main() and -Zbuild-std there. Mostly, I need something that still disassembles the same when built for RISCV.

@rslawson
Copy link
Author

Thankfully this work is open source, so I can share it. I was more hoping that it would be general enough so that I wouldn't need to get someone else familiar enough with the workflows of the project to start hacking on it. It's currently very late for me so I'll do it when I get up later this morning, but I will write up a comment here with a link to the repo, some setup help, and intro to hacking steps relevant to this issue.

@bjorn3
Copy link
Member

bjorn3 commented Jan 29, 2025

What likely happens here is that LLVM does tail call optimization. Rather than creating a new stack frame, LLVM will replace the stack frame of the caller with the stack frame of the callee if the call is the last thing the caller will do and a couple other conditions hold. Even with frame pointers there is no way to recover the old stack frame.

@rslawson
Copy link
Author

Alright, so the repository in question is for the Bittide project. The branch I'm currently working on is rs/fixRustflags.

As for getting set up with this repo, it has a shell.nix that should pull in everything needed. If you've got Nix installed on your system, just a simple nix-shell should be the only setup step necessary.

The relevant source files for what I'm working on are:

  • All of the source files in firmware-binaries/test-cases/panic-checks/src.
  • The testbench file at bittide-instances/tests/Tests/PanicBacktraces.hs, and more specifically, lines 154 through 198.

In order to run the test case(s), first you have to build the necessary binaries. You can absolutely navigate to them and build them individually, but you can also just get everything in the repo done at once with ./cargo.sh build --locked and ./cargo.sh build --locked --release from the base of the repository. Then once you've done that, you can run one of the test cases with something like cabal run bittide-instances:unittests -- -p 'test with simple binary'. With the testbench written as it is now, this will pull in the release binary. To change that to the debug binary, you can change the build type relevant to the binary you're testing (line 53, 56, or 59) to Debug.

@rslawson
Copy link
Author

Also I forgot to say, but I did bring things up to date (nightly 2025-01-28) and the issues were still present.

@jieyouxu jieyouxu added the A-backtrace Area: Backtraces label Jan 29, 2025
@saethlin saethlin added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backtrace Area: Backtraces A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants