Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLAT-5035] Capture user-reported error threads to disk to avoid deadlock #833

Merged
merged 2 commits into from
Oct 13, 2020

Conversation

nickdowell
Copy link
Contributor

Goal

This change is necessary to fix a deadlock that (occasionally) occurs when capturing thread traces for user-reported errors.

The notifier suspends all other threads before capturing a thread trace, but these may be holding locks in malloc, the Objective-C runtime, or other subsystems. Therefore we must only call async-signal-safe functions while other threads are suspended.

bsg_kscrw_i_captureThreadTrace was calling malloc to create a buffer for the JSON output, and this was observed to deadlock when reporting multiple errors on several threads simultaneously.

Design

This fix works by creating a temporary file to capture the JSON output using KSJSONEncoder's "regular" file output mechanism.

Using a file minimises the amount of memory required to perform the serialisation, and avoids needing to guess at an amount of memory to pre-allocate.

Changeset

bsg_kscrw_i_captureThreadTrace now encodes JSON to file output, which is then parsed using NSJSONSerialization.

The mechanism to capture KSJSONEncoder's in memory has been removed, as it is no longer used and not async-signal-safe.

Testing

This has been tested by manually running the ManyConcurrentNotifyScenario scenario in the test app on a real device and stepping through the new code in a debugger to ensure it's working as expected.

I have verified that errors are still reported on the dashboard and display the expected information on the "threads" tab.

The code that has been modified is covered by unit tests and e2e tests.

Threads may have been suspended while holding locks in malloc, the
Objective-C runtime, or other subsystems, so we must only call
async-signal-safe functions while other threads are suspended.
@nickdowell nickdowell force-pushed the feature/fix-thread-capture-deadlock branch from 01a49df to 4425396 Compare October 9, 2020 08:47
@nickdowell
Copy link
Contributor Author

nickdowell commented Oct 9, 2020

Quick performance test:

NSLog(@"%s starting....", __PRETTY_FUNCTION__);
NSException *exception = [NSException exceptionWithName:NSGenericException reason:@"Testing" userInfo:nil];
CFAbsoluteTime startTime = CFAbsoluteTimeGetCurrent();
for (int i = 0; i < 100; i++) {
    [Bugsnag notify:exception block:^BOOL(BugsnagEvent *event) {
        return NO;
    }];
}
CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - startTime;
NSLog(@"%s completed in %f seconds", __PRETTY_FUNCTION__, elapsed);

Results:

4.8 seconds after change (writing to disk)
2020-10-09 10:43:47.305710+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] starting....
2020-10-09 10:43:52.043808+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] completed in 4.737836 seconds
2020-10-09 10:43:55.320507+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] starting....
2020-10-09 10:44:00.180394+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] completed in 4.859597 seconds
2020-10-09 10:44:02.570664+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] starting....
2020-10-09 10:44:07.482511+0100 Bugsnag Test App[1709:628901] -[ViewController nonFatalException:] completed in 4.911555 seconds
11.4 seconds before the change (in-memory)
2020-10-09 10:46:53.812575+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] starting....
2020-10-09 10:47:05.321102+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] completed in 11.508221 seconds
2020-10-09 10:47:12.212653+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] starting....
2020-10-09 10:47:23.570621+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] completed in 11.357671 seconds
2020-10-09 10:47:27.113114+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] starting....
2020-10-09 10:47:38.474201+0100 Bugsnag Test App[1713:630075] -[ViewController nonFatalException:] completed in 11.360798 seconds

(tests done on iPhone SE 2)

@kattrali kattrali removed the request for review from tomlongridge October 12, 2020 09:39
@nickdowell nickdowell merged commit 4b896a6 into next Oct 13, 2020
@nickdowell nickdowell deleted the feature/fix-thread-capture-deadlock branch October 13, 2020 07:59
nickdowell added a commit that referenced this pull request Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants