Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coredump: ARM: Ensure sp in dump is set as gdb expects #73189

Merged
merged 1 commit into from
Oct 9, 2024

Conversation

mrkhldn
Copy link
Contributor

@mrkhldn mrkhldn commented May 23, 2024

Gdb is typically able to reconstruct the first two frames of the failing stack using the "pc" and "lr" registers. After that, (if the frame pointer is omitted) it appears to need the stack pointer (sp register) to point to the top of the stack before a fatal error occurred.

The ARM Cortex-M7 processor pushes registers r0-r3, r12, LR, {possibly FPU registers}, PC, SPSR onto the stack before entering the exception handler. We adjust the stack pointer back to the point before these registers were pushed for preservation in the dump.

During k_oops/k_panic, the sp wasn't stored in the core dump at all. Apply similar logic to store it when failures occur in that path.

@kartben
Copy link
Collaborator

kartben commented Jun 12, 2024

@mrkhldn can you rebase to solve conflict?

arch/arm/core/cortex_m/fault.c Outdated Show resolved Hide resolved
arch/arm/core/fatal.c Outdated Show resolved Hide resolved
@mrkhldn
Copy link
Contributor Author

mrkhldn commented Jun 17, 2024

For reference, using the coredump_gdbserver.py script and connection from gdb to get the backtrace, before this change when using the output of the test at tests/subsys/debug/coredump_backends/debug.coredump.backends.logging:

(Wrong stack)

(gdb) bt
#0  dump_entry (p1=<optimized out>, p2=0x0 <cbvprintf_package>, p3=0x0 <cbvprintf_package>) at ~/zephyrproject/zephyr/tests/subsys/debug/coredump_backends/src/main.c:34
#1  0x00000e84 in z_arm_usage_fault ()
#2  0x00000e70 in z_arm_fault_init () at ~/zephyrproject/zephyr/arch/arm/core/cortex_m/fault.c:1196
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And after this change:
(Expected stack)

(gdb) bt
#0  dump_entry (p1=<optimized out>, p2=0x0 <cbvprintf_package>, p3=0x0 <cbvprintf_package>) at ~/zephyrproject/zephyr/tests/subsys/debug/coredump_backends/src/main.c:34
#1  0x000008b0 in z_thread_entry (entry=0x3ad <dump_entry>, p1=0x0 <cbvprintf_package>, p2=0x0 <cbvprintf_package>, p3=0x0 <cbvprintf_package>)
    at ~/zephyrproject/zephyr/lib/os/thread_entry.c:48
#2  0x00000000 in ?? ()

ithinuel
ithinuel previously approved these changes Jun 17, 2024
Comment on lines 1067 to 1085
#ifdef CONFIG_DEBUG_COREDUMP
z_arm_coredump_fault_sp = POINTER_TO_UINT(esf);
#endif
#if defined(CONFIG_ARMV7_M_ARMV8_M_MAINLINE) || defined(CONFIG_ARMV6_M_ARMV8_M_BASELINE)
/* Gdb expects a stack pointer that does not include the exception stack frame in order to
* unwind. So adjust the stack pointer accordingly.
*/
z_arm_coredump_fault_sp += sizeof(esf->basic);

#if defined(CONFIG_FPU) && defined(CONFIG_FPU_SHARING)
/* Assess whether thread had been using the FP registers and add size of additional
* registers if necessary
*/
if ((exc_return & EXC_RETURN_STACK_FRAME_TYPE_STANDARD) ==
EXC_RETURN_STACK_FRAME_TYPE_EXTENDED) {
z_arm_coredump_fault_sp += sizeof(esf->fpu);
}
#endif /* CONFIG_FPU && CONFIG_FPU_SHARING */
#endif /* CONFIG_ARMV7_M_ARMV8_M_MAINLINE || CONFIG_ARMV6_M_ARMV8_M_BASELINE */
#endif /* CONFIG_DEBUG_COREDUMP */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: how about extracting all this into a static inline in exception.h so the whole ifdef mess doesn't have to be implemented in two places?

*/
void z_do_kernel_oops(const struct arch_esf *esf, _callee_saved_t *callee_regs)
void z_do_kernel_oops(const struct arch_esf *esf, _callee_saved_t *callee_regs, uint32_t exc_return)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #76221 I made sure Cortex-A/Cortex-R don't pass garbage here, but I don't know if that's really necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

Comment on lines +42 to +88
/* Exception Return (EXC_RETURN) is provided in LR upon exception entry.
* It is used to perform an exception return and to detect possible state
* transition upon exception.
*/

/* Prefix. Indicates that this is an EXC_RETURN value.
* This field reads as 0b11111111.
*/
#define EXC_RETURN_INDICATOR_PREFIX (0xFF << 24)
/* bit[0]: Exception Secure. The security domain the exception was taken to. */
#define EXC_RETURN_EXCEPTION_SECURE_Pos 0
#define EXC_RETURN_EXCEPTION_SECURE_Msk \
BIT(EXC_RETURN_EXCEPTION_SECURE_Pos)
#define EXC_RETURN_EXCEPTION_SECURE_Non_Secure 0
#define EXC_RETURN_EXCEPTION_SECURE_Secure EXC_RETURN_EXCEPTION_SECURE_Msk
/* bit[2]: Stack Pointer selection. */
#define EXC_RETURN_SPSEL_Pos 2
#define EXC_RETURN_SPSEL_Msk BIT(EXC_RETURN_SPSEL_Pos)
#define EXC_RETURN_SPSEL_MAIN 0
#define EXC_RETURN_SPSEL_PROCESS EXC_RETURN_SPSEL_Msk
/* bit[3]: Mode. Indicates the Mode that was stacked from. */
#define EXC_RETURN_MODE_Pos 3
#define EXC_RETURN_MODE_Msk BIT(EXC_RETURN_MODE_Pos)
#define EXC_RETURN_MODE_HANDLER 0
#define EXC_RETURN_MODE_THREAD EXC_RETURN_MODE_Msk
/* bit[4]: Stack frame type. Indicates whether the stack frame is a standard
* integer only stack frame or an extended floating-point stack frame.
*/
#define EXC_RETURN_STACK_FRAME_TYPE_Pos 4
#define EXC_RETURN_STACK_FRAME_TYPE_Msk BIT(EXC_RETURN_STACK_FRAME_TYPE_Pos)
#define EXC_RETURN_STACK_FRAME_TYPE_EXTENDED 0
#define EXC_RETURN_STACK_FRAME_TYPE_STANDARD EXC_RETURN_STACK_FRAME_TYPE_Msk
/* bit[5]: Default callee register stacking. Indicates whether the default
* stacking rules apply, or whether the callee registers are already on the
* stack.
*/
#define EXC_RETURN_CALLEE_STACK_Pos 5
#define EXC_RETURN_CALLEE_STACK_Msk BIT(EXC_RETURN_CALLEE_STACK_Pos)
#define EXC_RETURN_CALLEE_STACK_SKIPPED 0
#define EXC_RETURN_CALLEE_STACK_DEFAULT EXC_RETURN_CALLEE_STACK_Msk
/* bit[6]: Secure or Non-secure stack. Indicates whether a Secure or
* Non-secure stack is used to restore stack frame on exception return.
*/
#define EXC_RETURN_RETURN_STACK_Pos 6
#define EXC_RETURN_RETURN_STACK_Msk BIT(EXC_RETURN_RETURN_STACK_Pos)
#define EXC_RETURN_RETURN_STACK_Non_Secure 0
#define EXC_RETURN_RETURN_STACK_Secure EXC_RETURN_RETURN_STACK_Msk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only relevant for Cortex-M, maybe it should be put behind an #if defined(CONFIG_ARMV7_M_ARMV8_M_MAINLINE) || defined(CONFIG_ARMV6_M_ARMV8_M_BASELINE) to avoid accidental misuse in Cortex-A/Cortex-R code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path to this include file is arch/arm/include/cortex_m/exception.h, so hopefully that's enough to make it clear that it's intended for Cortex-M

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that, that's fine then.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to check xPSR bit 9 to see if the stack is padded an additional 4 bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to check xPSR bit 9 to see if the stack is padded an additional 4 bytes.

Added, thanks

ithinuel
ithinuel previously approved these changes Aug 12, 2024
Gdb is typically able to reconstruct the first two frames of the
failing stack using the "pc" and "lr" registers. After that, (if
the frame pointer is omitted) it appears to need the stack pointer
(sp register) to point to the top of the stack before a fatal
error occurred.

The ARM Cortex-M processors push registers r0-r3, r12, LR,
{possibly FPU registers}, PC, SPSR onto the stack before entering the
exception handler. We adjust the stack pointer back to the point
before these registers were pushed for preservation in the dump.

During k_oops/k_panic, the sp wasn't stored in the core dump at all.
Apply similar logic to store it when failures occur in that path.

Signed-off-by: Mark Holden <mholden@meta.com>
@ithinuel
Copy link
Collaborator

@vbrzeski would you mind checking if this resolved your issue and if so approve this PR please ?

@vbrzeski
Copy link

vbrzeski commented Oct 1, 2024

Thanks for adding the xPSR check! LGTM after clang-formatting the source code.

@dawidwojtas-mudita
Copy link

dawidwojtas-mudita commented Oct 8, 2024

@mrkhldn I checked your changes in my test code and it works for nRF52840.
Before:

Reading symbols from ../build/zephyr/zephyr.elf...
someFunc () at /home/dawid/Documents/test/src/main.cpp:55
55	    __asm__ volatile("udf #0" : : :);
#0  someFunc () at /home/dawid/Documents/test/src/main.cpp:55
Backtrace stopped: Cannot access memory at address 0x200022c0

After:

Reading symbols from ../build/zephyr/zephyr.elf...
someFunc () at /home/dawid/Documents/test/src/main.cpp:55
55	    __asm__ volatile("udf #0" : : :);
#0  someFunc () at /home/dawid/Documents/test/src/main.cpp:55
#1  0x0001dcac in main () at /home/dawid/Documents/test/src/main.cpp:69

Great job!

@aescolar aescolar merged commit ec7943b into zephyrproject-rtos:main Oct 9, 2024
22 checks passed
@aescolar
Copy link
Member

aescolar commented Oct 9, 2024

@aescolar
Copy link
Member

aescolar commented Oct 9, 2024

@mrkhldn unfortunately this PR was reverted due to the introduced regression :( .
You are more than welcome to resubmit the PR with a fix so it also builds for M33 cores.

@mrkhldn
Copy link
Contributor Author

mrkhldn commented Oct 9, 2024

Fixed the regression and re-submitted here #79622

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception Stack Frames are not correct in coredumps
9 participants