Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(signal): Fix stacktrace resolution for native crashes on ARM32 #378

Merged
merged 2 commits into from
Oct 24, 2018

Conversation

kattrali
Copy link
Contributor

@kattrali kattrali commented Oct 17, 2018

Goal

When a signal is raised on 32-bit ARM, the current context is the signal
stack rather than the crash stack. To work around this, set the register
state before unwinding the first frame (using the program counter as the
first frame). Then the stack can be unwound normally.

Changeset

  • Add signal_context to bsg_libunwind_state
  • Set the signal_context to the unwind function signal context (which may be null, as it is in the event of using notify()
  • Use state.signal_context in the unwind callback to set the initial register state

Tests

Manually tested against:

  • Nexus 4 (ARM32) running API 21
  • Nexus 5P (ARM32) on API 24
  • Pixel 2 (ARM64) on API 28
  • Tested against various x86/x64 emulators to make sure the architecture macros are correct

Automated test verifying the correct stacktrace (file & method) for a given crash

Review

The following items are needed:

  • Reproducing the initial problem - Signal-based crashes on ARM32 using v4.8.2 (not using abort()) should produce an uninformative stacktrace without the original frames
  • Confirm consistency between the changeset and the goal stated above.
  • Explore and discover any crash cases where this change results in a worse stacktrace (or no stacktrace 😱 )

@kattrali kattrali added wip and removed wip labels Oct 17, 2018
* True if first and second are the same length and contain the same characters
* in the same order.
*/
bool bsg_strequal(char *first, char *second) __asyncsafe;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any usage of this function in the project. We should remove it if it's not used here, or add it in a separate PR if it's needed more generally.

String metadata = getEventMetaData();
if (metadata != null && metadata.equals("non-crashy")) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~30 files in the scenario package contain the same 4 lines as above, and nearly every scenario contains config.setAutoCaptureSessions(false);.

I think it'd be good to think about whether there are any abstractions/sensible defaults we could use to reduce this duplication. This could be addressed as a separate ticket and shouldn't hold up the work in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its still helpful to know for every test case exactly what is going on, and then not fight it if for some reason its needed in a more complex case down the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some additional context, say we refactor this into a method, what then would it save? Maybe a line at most?

if (shouldNotCrash()) {
  return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think a method like shouldNotCrash would improve readability. The reader doesn't have to know what event metadata is, why it needs a null-check, or where 'non-crashy' has been set - they only need to know that the author didn't want the scenario to crash at this point.

Additionally, we wouldn't have to worry about updating 30 separate source files if we need to make a change to this feature, and could also add some brief JavaDoc to explain the convention of setting 'non-crashy' as an Environment Variable within a feature.

This doesn't need to be addressed as part of this PR but I do feel it's worth creating a ticket to address this at some future point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken a look at this and I think that my main problem with it is the name of the variable its stored in. It's too easily confused with event metadata. If we made this called something obviously environment variable like then this code seems fine to me.

@@ -1,4 +1,5 @@
#include "../utils/build.h"
#include "../utils/string.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this include doesn't appear to be used?

@fractalwrench
Copy link
Contributor

I've left a few comments inline. A couple of other items that may need addressing include:

  • CI is not passing for this PR
  • The new mazerunner scenario does not pass on an x86, API 27 emulator

I've tested that this works by running the mazerunner scenario on an ARMv7, API 22 emulator. With the changes the scenario passes and the frames are as expected; without the changes, the stackframe is equal to the signal stack (and does contain a frame referencing libmonochrome, around 4 frames down).

@kattrali kattrali force-pushed the kattrali/fix-arm32-stack-resolver branch 2 times, most recently from 90edc1a to bd2016d Compare October 19, 2018 01:35
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@bugsnag bugsnag deleted a comment from coveralls Oct 19, 2018
@kattrali kattrali force-pushed the kattrali/fix-arm32-stack-resolver branch from bd2016d to a9509c5 Compare October 19, 2018 05:15
When a signal is raised on 32-bit ARM, the current context is the signal
stack rather than the crash stack. To work around this, set the register
state before unwinding the first frame (using the program counter as the
first frame). Then the stack can be unwound normally.

In cases where the signal code is TKILL (6), libunwind hangs / crashes
immediately after setting the register values. For these cases, the
first frame is recorded and the stack is terminated early to prevent
losing the report altogether. Alternative approaches would be to grab
every word from the stack and inspect for symbols or to bundle a
different unwinder altogether (libstackunwind maybe?).

For stack buffer overflows (and potentially other variants of
SEGV_MAPERR), libunwind crashes immediately after setting the register
values as well.
Add new "third-party" library fixture to validate stacktraces from
separate libraries.
Add new step to validate stacktraces skipping insignificant frames.
@kattrali kattrali force-pushed the kattrali/fix-arm32-stack-resolver branch from a9509c5 to e624568 Compare October 19, 2018 19:02
@kattrali kattrali merged commit 5ba5d22 into master Oct 24, 2018
@kattrali kattrali deleted the kattrali/fix-arm32-stack-resolver branch October 24, 2018 00:09
kattrali added a commit that referenced this pull request Nov 2, 2018
While the stack enhancements added in #378 are intended only to be run
on 32-bit devices and gated to the __arm__ build, when 64-bit devices
are running the 32-bit library, the stacktraces for those devices were
unintentionally affected. This is the case when an app is configured to
only run 32-bit binaries (Android cannot run both 64-bit and 32-bit
binaries at once), and many popular libraries (e.g. React Native) only
distribute 32-bit libraries.

Fixes #382
kattrali added a commit that referenced this pull request Nov 7, 2018
While the stack enhancements added in #378 are intended only to be run
on 32-bit devices and gated to the __arm__ build, when 64-bit devices
are running the 32-bit library, the stacktraces for those devices were
unintentionally affected. This is the case when an app is configured to
only run 32-bit binaries (Android cannot run both 64-bit and 32-bit
binaries at once), and many popular libraries (e.g. React Native) only
distribute 32-bit libraries.

To further improve the stack quality on 32-bit, switches the underlying
unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution
and does not have the crash in unwind when setting registers.

Fixes #382
kattrali added a commit that referenced this pull request Nov 7, 2018
While the stack enhancements added in #378 are intended only to be run
on 32-bit devices and gated to the __arm__ build, when 64-bit devices
are running the 32-bit library, the stacktraces for those devices were
unintentionally affected. This is the case when an app is configured to
only run 32-bit binaries (Android cannot run both 64-bit and 32-bit
binaries at once), and many popular libraries (e.g. React Native) only
distribute 32-bit libraries.

To further improve the stack quality on 32-bit, switches the underlying
unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution
and does not have the crash in unwind when setting registers.

Fixes #382
kattrali added a commit that referenced this pull request Nov 7, 2018
Uses the ABI list to conditionally switch unwinders on ARM.

While the stack enhancements added in #378 are intended only to be run
on 32-bit devices and gated to the __arm__ build, when 64-bit devices
are running the 32-bit library, the stacktraces for those devices were
unintentionally affected. This is the case when an app is configured to
only run 32-bit binaries (Android cannot run both 64-bit and 32-bit
binaries at once), and many popular libraries (e.g. React Native) only
distribute 32-bit libraries.

To further improve the stack quality on 32-bit, switches the underlying
unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution
and does not have the crash in unwind when setting registers.

Fixes #382
kattrali added a commit that referenced this pull request Nov 8, 2018
Small refactor of the API 21+ stack unwinding logic to restrict 32-bit
unwinding style from 64-bit devices.

While the stack enhancements added in #378 are intended only to be run
on 32-bit devices and gated to the arm build, when 64-bit devices
are running the 32-bit library, the stacktraces for those devices were
unintentionally affected.

This is the case when an app is configured to only run 32-bit binaries
(Android cannot run both 64-bit and 32-bit binaries at once), and many
popular libraries (e.g. React Native) only distribute 32-bit libraries.

Added additional improvement to the 32-bit stacktrace resolution by
switching to nongnu libunwind only for those devices. This significantly
improves the number of frames which can be unwound without risk of a
segfault.
rich-bugsnag added a commit that referenced this pull request Sep 3, 2021
* task(context): make Context a writable property of the client
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants