Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential deadlock when generating a crash report #577

Closed
wants to merge 2 commits into from

Conversation

robinmacharg
Copy link
Contributor

@robinmacharg robinmacharg commented May 1, 2020

Goal

An issue has been identified that can lead to an app deadlocking when reporting a crash.

The sequence of events is:

  1. Some program component accesses dyld and takes a lock out.
  2. A crash is reported.
  3. Bugsnag, as part of its normal crash reporting behaviour, suspends all but a few critical threads, including the thread that holds the dyld lock.
  4. Bugsnag attempts to read the Mach binary images loaded into the app. This also attempts to take out a lock but is unable to, leading to deadlock.

This PR fixes this issue.

Design

There are two main mechanisms by which a program may find out about dyld behaviour: by direct querying (e.g. _dyld_get_image_header() etc., used by the KSCrash component of Bugsnag prior to this issue) and by callback (_dyld_register_func_for_add_image() etc.). This change modifies the KSCrash component to use callbacks and caches information about loaded binary images for when a crash does occur. Thus, when threads are suspended and a crash report is generated no additional calls to dyld need to be made and deadlock is avoided.

Changeset

dyld binary image capture was moved from BSG_KSCrashReport.c to BSG_KSCrash.m. A static cache is used. An additional .m file contains functions to ease bridging between Objective C and C.

Tests

TBD

Review

Outstanding Questions

  • This pull request is ready for:
    • Initial review of the intended approach, not yet feature complete
    • Structural review of the classes, functions, and properties modified
    • Final review
    • Release
  • The correct target branch has been selected (master for fixes, next for
    features)
  • If this is intended for release have all of the pre-release checks been considered?
  • Consistency across platforms for structures or concepts added or modified
  • Consistency between the changeset and the goal stated above
  • Internal consistency with the rest of the library - is there any overlap between existing interfaces and any which have been added?
  • Usage friction - is the proposed change in usage cumbersome or complicated?
  • Performance and complexity - are there any cases of unexpected O(n^3) when iterating, recursing, flat mapping, etc?
  • Concurrency concerns - if components are accessed asynchronously, what issues will arise
  • Thoroughness of added tests and any missing edge cases
  • Idiomatic use of the language

@robinmacharg robinmacharg changed the title Fix dyld deadlock Fix potential deadlock when generating a crash report May 1, 2020
Copy link
Contributor

@fractalwrench fractalwrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems reasonable from an initial review - I've left a few inline comments going through it, and I haven't attempted to run the code yet. One thing to consider would be running Instruments to give a very basic benchmark on initialization performance, so we can quantify how many milliseconds this approach is adding.

- (void)listenForLoadedBinaries {
bsg_initialise_mach_binary_headers();
_dyld_register_func_for_add_image(&bsg_mach_binary_image_added);
_dyld_register_func_for_remove_image(&bsg_mach_binary_image_removed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially worth registering bsg_mach_binary_image_removed first. This avoids the case where some binary image is removed on another thread while the bsg_mach_binary_image_added callback is executed for the first time.

@@ -226,6 +228,9 @@ - (BOOL)install {
return false;
}

// Maintain a cache of info about dynamically loaded binary images
[self listenForLoadedBinaries];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially worth putting this before bsg_kscrash_install, or at least initialising an empty array of binary images to prevent against the case where a crash occurs during/at the same time as initialisation.

}

/**
* Compare two mach biunary image structures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling nit: biunary ->binary

const char* name;
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
} BSG_Mach_Binary_Image_Info;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encapsulating this info in a struct works well 👍

{
BSG_Mach_Binary_Image_Info info = { 0 };
if (populate_info(header, &info)) {
[bsg_mach_binary_images_info addObject:[NSValue valueWithBytes:&info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an Objective-C object makes me a bit nervous - as if dyld continues to execute this callback during a signal crash, then it could hang the process.

Would it be possible to initialise a C array to start with, then allocate more memory as required?

/**
* Called when a binary image is unloaded.
*/
void bsg_mach_binary_image_removed(const struct mach_header *header, intptr_t slide)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this method need any form of synchronization if images are loaded/unloaded on different threads? Or is that a non-issue?

/**
* Returns a C array of structs describing the loaded Mach binaries
*/
BSG_Mach_Binary_Image_Info* bsg_mach_header_array(size_t *count) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feel is that this function could be done away with if the structs are stored as a C array

@robinmacharg
Copy link
Contributor Author

This PR is superseded by #580. All noted issues have been addressed there.

@fractalwrench fractalwrench deleted the fix-dyld-deadlock branch June 23, 2020 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants