-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(signal): Fix stacktrace resolution for native crashes on ARM32 #378
Conversation
ndk/src/main/jni/utils/string.h
Outdated
* True if first and second are the same length and contain the same characters | ||
* in the same order. | ||
*/ | ||
bool bsg_strequal(char *first, char *second) __asyncsafe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any usage of this function in the project. We should remove it if it's not used here, or add it in a separate PR if it's needed more generally.
String metadata = getEventMetaData(); | ||
if (metadata != null && metadata.equals("non-crashy")) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~30 files in the scenario
package contain the same 4 lines as above, and nearly every scenario contains config.setAutoCaptureSessions(false);
.
I think it'd be good to think about whether there are any abstractions/sensible defaults we could use to reduce this duplication. This could be addressed as a separate ticket and shouldn't hold up the work in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its still helpful to know for every test case exactly what is going on, and then not fight it if for some reason its needed in a more complex case down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add some additional context, say we refactor this into a method, what then would it save? Maybe a line at most?
if (shouldNotCrash()) {
return;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think a method like shouldNotCrash
would improve readability. The reader doesn't have to know what event metadata is, why it needs a null-check, or where 'non-crashy' has been set - they only need to know that the author didn't want the scenario to crash at this point.
Additionally, we wouldn't have to worry about updating 30 separate source files if we need to make a change to this feature, and could also add some brief JavaDoc to explain the convention of setting 'non-crashy' as an Environment Variable within a feature.
This doesn't need to be addressed as part of this PR but I do feel it's worth creating a ticket to address this at some future point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have taken a look at this and I think that my main problem with it is the name of the variable its stored in. It's too easily confused with event metadata. If we made this called something obviously environment variable like then this code seems fine to me.
@@ -1,4 +1,5 @@ | |||
#include "../utils/build.h" | |||
#include "../utils/string.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this include doesn't appear to be used?
I've left a few comments inline. A couple of other items that may need addressing include:
I've tested that this works by running the mazerunner scenario on an ARMv7, API 22 emulator. With the changes the scenario passes and the frames are as expected; without the changes, the stackframe is equal to the signal stack (and does contain a frame referencing libmonochrome, around 4 frames down). |
90edc1a
to
bd2016d
Compare
bd2016d
to
a9509c5
Compare
When a signal is raised on 32-bit ARM, the current context is the signal stack rather than the crash stack. To work around this, set the register state before unwinding the first frame (using the program counter as the first frame). Then the stack can be unwound normally. In cases where the signal code is TKILL (6), libunwind hangs / crashes immediately after setting the register values. For these cases, the first frame is recorded and the stack is terminated early to prevent losing the report altogether. Alternative approaches would be to grab every word from the stack and inspect for symbols or to bundle a different unwinder altogether (libstackunwind maybe?). For stack buffer overflows (and potentially other variants of SEGV_MAPERR), libunwind crashes immediately after setting the register values as well.
Add new "third-party" library fixture to validate stacktraces from separate libraries. Add new step to validate stacktraces skipping insignificant frames.
a9509c5
to
e624568
Compare
While the stack enhancements added in #378 are intended only to be run on 32-bit devices and gated to the __arm__ build, when 64-bit devices are running the 32-bit library, the stacktraces for those devices were unintentionally affected. This is the case when an app is configured to only run 32-bit binaries (Android cannot run both 64-bit and 32-bit binaries at once), and many popular libraries (e.g. React Native) only distribute 32-bit libraries. Fixes #382
While the stack enhancements added in #378 are intended only to be run on 32-bit devices and gated to the __arm__ build, when 64-bit devices are running the 32-bit library, the stacktraces for those devices were unintentionally affected. This is the case when an app is configured to only run 32-bit binaries (Android cannot run both 64-bit and 32-bit binaries at once), and many popular libraries (e.g. React Native) only distribute 32-bit libraries. To further improve the stack quality on 32-bit, switches the underlying unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution and does not have the crash in unwind when setting registers. Fixes #382
While the stack enhancements added in #378 are intended only to be run on 32-bit devices and gated to the __arm__ build, when 64-bit devices are running the 32-bit library, the stacktraces for those devices were unintentionally affected. This is the case when an app is configured to only run 32-bit binaries (Android cannot run both 64-bit and 32-bit binaries at once), and many popular libraries (e.g. React Native) only distribute 32-bit libraries. To further improve the stack quality on 32-bit, switches the underlying unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution and does not have the crash in unwind when setting registers. Fixes #382
Uses the ABI list to conditionally switch unwinders on ARM. While the stack enhancements added in #378 are intended only to be run on 32-bit devices and gated to the __arm__ build, when 64-bit devices are running the 32-bit library, the stacktraces for those devices were unintentionally affected. This is the case when an app is configured to only run 32-bit binaries (Android cannot run both 64-bit and 32-bit binaries at once), and many popular libraries (e.g. React Native) only distribute 32-bit libraries. To further improve the stack quality on 32-bit, switches the underlying unwinder from gnu unwind to non-gnu libunwind. Improves frame resolution and does not have the crash in unwind when setting registers. Fixes #382
Small refactor of the API 21+ stack unwinding logic to restrict 32-bit unwinding style from 64-bit devices. While the stack enhancements added in #378 are intended only to be run on 32-bit devices and gated to the arm build, when 64-bit devices are running the 32-bit library, the stacktraces for those devices were unintentionally affected. This is the case when an app is configured to only run 32-bit binaries (Android cannot run both 64-bit and 32-bit binaries at once), and many popular libraries (e.g. React Native) only distribute 32-bit libraries. Added additional improvement to the 32-bit stacktrace resolution by switching to nongnu libunwind only for those devices. This significantly improves the number of frames which can be unwound without risk of a segfault.
* task(context): make Context a writable property of the client
Goal
When a signal is raised on 32-bit ARM, the current context is the signal
stack rather than the crash stack. To work around this, set the register
state before unwinding the first frame (using the program counter as the
first frame). Then the stack can be unwound normally.
Changeset
signal_context
tobsg_libunwind_state
signal_context
to the unwind function signal context (which may be null, as it is in the event of usingnotify()
state.signal_context
in the unwind callback to set the initial register stateTests
Manually tested against:
Automated test verifying the correct stacktrace (file & method) for a given crash
Review
The following items are needed:
abort()
) should produce an uninformative stacktrace without the original frames