Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLAT-7848] Improve handling of concurrent crashes #1286

Merged
merged 1 commit into from
Jan 25, 2022

Conversation

nickdowell
Copy link
Contributor

@nickdowell nickdowell commented Jan 20, 2022

Goal

Successfully report crashes in the event that multiple threads crash at the same time.

In the current release, a secondary crash that occurs while the first crash handling thread is still working (and presumably has not yet suspended other threads) will be incorrectly identified as a "crash in the crash reporter" and trigger an internal error report (via a recrash report.)

There are also other race conditions that could cause corruption of the crash reporting process.

Changeset

Crash reporting is now explicitly one-shot (in practical terms it already was - a single crash report path is configured per process lifetime) so that only a single crash report will attempt to be written.

A recrash report is now only written for a secondary crash which occured in the original crash reporting thread.

Both of these checks are performed in bsg_kscrashsentry_beginHandlingCrash() which now takes the offending (crashed) thread as its argument. An atomic compare-exchange operation is used to ensure only a single thread can win the race to be reported.

Crashes in secondary threads are prevented from immediately killing the process by waiting in bsg_kscrashsentry_beginHandlingCrash() until the crash handling has finished.

Testing

An E2E scenario that reliably reproduced the problem has been added, and the fix verified in multiple runs. Attempts to include checks of the stackframe contents failed because C++ exception stacktraces are not currently recorded when Bugsnag is linked dynamically, which it is for the Mac fixture.

Existing test for recrash reports have also been run successfully.

@github-actions
Copy link

github-actions bot commented Jan 20, 2022

Infer: No issues found 🎉

OCLint: No issues found 🎉

Bugsnag.framework binary size increased by 984 bytes from 1,299,736 to 1,300,720

Generated by 🚫 Danger

@nickdowell nickdowell force-pushed the nickdowell/multiple-crashing-threads branch from f39da20 to 0605f83 Compare January 21, 2022 09:23
@nickdowell nickdowell force-pushed the nickdowell/multiple-crashing-threads branch from 0605f83 to a5800f4 Compare January 21, 2022 15:10
@nickdowell nickdowell merged commit 9f4f8f2 into next Jan 25, 2022
@nickdowell nickdowell deleted the nickdowell/multiple-crashing-threads branch January 25, 2022 13:17
@nickdowell nickdowell mentioned this pull request Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants