Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture trace of error reporting thread and identify with boolean flag #355

Merged
merged 17 commits into from
Sep 25, 2018

Conversation

fractalwrench
Copy link
Contributor

@fractalwrench fractalwrench commented Aug 2, 2018

Goal

Captures the trace of Thread.currentThread(), which was previously omitted, and to identify it with a boolean flag in the serialised JSON.

Changeset

  • Alter ThreadState to stop removing the current thread from serialisation
  • Conditionally write errorReportingThread boolean flag by checking thread ID

Tests

Added unit tests for general ThreadState serialisation, and for checking that the current thread is now captured. Ran mazerunner locally, tested with dashboard.

Review

For the submitter, initial self-review:

  • Commented on code changes inline explain the reasoning behind the approach
  • Reviewed the test cases added for completeness and possible points for discussion
  • A changelog entry was added for the goal of this pull request
  • Check the scope of the changeset - is everything in the diff required for the pull request?
  • This pull request is ready for:
    • Initial review of the intended approach, not yet feature complete
    • Structural review of the classes, functions, and properties modified
    • Final review

For the pull request reviewer(s), this changeset has been reviewed for:

  • Consistency across platforms for structures or concepts added or modified
  • Consistency between the changeset and the goal stated above
  • Internal consistency with the rest of the library - is there any overlap between existing interfaces and any which have been added?
  • Usage friction - is the proposed change in usage cumbersome or complicated?
  • Performance and complexity - are there any cases of unexpected O(n^3) when iterating, recursing, flat mapping, etc?
  • Concurrency concerns - if components are accessed asynchronously, what issues will arise
  • Thoroughness of added tests and any missing edge cases
  • Idiomatic use of the language

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 74.917% when pulling 60b5934 on thread-id into 0390356 on next.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 74.917% when pulling 60b5934 on thread-id into 0390356 on next.

@coveralls
Copy link

coveralls commented Aug 2, 2018

Coverage Status

Coverage decreased (-0.2%) to 74.731% when pulling 2f83a7c on thread-id into 0390356 on next.

private final Thread[] threads;
private final Map<Thread, StackTraceElement[]> stackTraces;
private final long currentThreadId;

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@fractalwrench fractalwrench requested a review from a team August 3, 2018 08:11
Copy link
Contributor

@kattrali kattrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with a few cases:

  • Spinning up background threads which periodically do work and then cause a handled or unhandled exception
  • Causing handled or unhandled exceptions from UI events
  • Causing handled or unhandled exceptions from the main thread immediately

In the case of unhandled exceptions, an incorrect thread is generally marked as the unhandledException thread is not the same as the error reporting thread. So a thread is marked but it has a different stacktrace from the Stacktrace tab.

For handled exceptions, I never see any thread marked as the reporting thread on the Threads tab.

private final Thread[] threads;
private final Map<Thread, StackTraceElement[]> stackTraces;
private final long currentThreadId;

This comment was marked as resolved.

*/
private Thread[] sanitiseThreads(long currentThreadId,
Map<Thread, StackTraceElement[]> liveThreads) {
private Thread[] sanitiseThreads(Map<Thread, StackTraceElement[]> liveThreads) {

This comment was marked as resolved.

@fractalwrench
Copy link
Contributor Author

Thanks for the review @kattrali - I've updated ThreadState so that it now requires a Thread parameter in its constructor, which is populated via the thread value from uncaughtException, and Thread.currentThread for notify calls. This should address the wrong thread being selected for unhandled exceptions.

For handled exceptions, I never see any thread marked as the reporting thread on the Threads tab.

I was not able to reproduce this behaviour, did you have a code example that will trigger this?

@fractalwrench
Copy link
Contributor Author

I've updated the thread serialisation to capture the threads at the point of report generation, rather than serialisation. I've also made sure that we pass in the current Thread to ThreadState rather than assuming Thread.currentThread, as this value will often be wrong.

@kattrali
Copy link
Contributor

kattrali commented Aug 8, 2018

I was not able to reproduce this behaviour, did you have a code example that will trigger this?

Nothing special, just fired up the example app with the handled exception buttons. It still happens sometimes.

Copy link
Contributor

@kattrali kattrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this further, made some suggestions for how to get a consistent result for both handled and unhandled exceptions. The error reporting thread for handled errors looks as I would expect (though sometimes no thread is marked?), though unhandled errors may end up with an unexpected stack if they are from a pool and should be set to the exception stack.

@@ -77,12 +77,12 @@ public void uncaughtException(@NonNull Thread thread, @NonNull Throwable throwab
StrictMode.setThreadPolicy(StrictMode.ThreadPolicy.LAX);

client.cacheAndNotify(throwable, Severity.ERROR,
metaData, severityReason, violationDesc);
metaData, severityReason, violationDesc, thread);

This comment was marked as resolved.

@fractalwrench
Copy link
Contributor Author

Nothing special, just fired up the example app with the handled exception buttons. It still happens sometimes.

I've tried pressing each button in the example app 10x on an API 27 emulator. For all of the 70 received events "Error reported from this thread" is present on one entry in the threads tab. Do I need to increase the frequency to reproduce this issue, or is there some sort of code change required?

One potential cause I can think of would be that Thread.getAllStackTraces() only returns alive threads, and that the thread parameter passed into uncaughtException might be terminated, and therefore not in the map.

If this is the case then we might be able to record the thread ID, name, and a zero-element array stacktrace by adding the terminated thread into the set of live threads. However, I'm still not sure how this could explain why this behaviour was observed for a blocking notify call - I suspect there's something subtly different in our environments and it would be good to get to the bottom of this.

However, all of the required elements are here to construct the expected result. For unhandled reports, the errorReportingThread should have: exception.getStackTrace()

I discussed this with Duncan at project kick-off and believe the requested behaviour was to capture the thread trace for both handled/unhandled errors, rather than the exception stacktrace. I'll run this past the necessary Slack channel before making the change

@fractalwrench
Copy link
Contributor Author

This behaviour is reproducible on API 25 not API 27, investigating further

… already a key

An issue exists in api 24/25 where not all threads are reported correctly from
Thread.getAllStacktraces(). Adding it in manually and calling Thread.getStacktrace() appears to
negate this behaviour. https://issuetracker.google.com/issues/64122757

private fun verifyCurrentThreadStructure(json: JSONArray,
currentThreadId: Long,
action: ((thread: JSONObject) -> Unit)? = null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows passing a function that will be invoked when the errorReportingThread json is found. This means assertions can be run with less boilerplate finding the necessary thread

@NonNull Thread thread,
boolean unhandled) {
Throwable exc = unhandled ? exception : null;
this.threadState = new ThreadState(config, thread, Thread.getAllStackTraces(), exc);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception is passed in (default for unhandled errors), the exception stacktrace will be serialised rather than the thread stacktrace

@fractalwrench
Copy link
Contributor Author

There appears to be a bug in API levels 24/25 where Thread.getAllStacktraces() does not actually capture all threads, which meant the errorReportingThread would not be serialised at all on those levels. This led to the observed behaviour with handled exceptions not showing the expected information in the dashboard.

I have applied a workaround for this by capturing the stacktrace of the errorReportingThread directly, via Thread.getStackTrace(). However, it's worth noting that the rest of the affected threads will not be serialised on Android 7 devices. I'm not aware of a suitable workaround for this at the present time.

I have also altered the thread serialisation so that the exception stacktrace is used for the errorReportingThread, for unhandled errors only.

Testing included:

  • Full Mazerunner suite on API 21
  • Addition of new unit tests and running on API 16, 19, 21, 25, 27
  • Manual tests on API 25/27 (inspected thread tab on dashboard):
    • Handled + unhandled exception button in app
    • Wrapping handled + unhandled exception button code with Thread {}.start() and confirming errorReportingThread differed
  • Self-review of changeset

@fractalwrench fractalwrench requested a review from a team August 17, 2018 14:21
@fractalwrench
Copy link
Contributor Author

Added an additional mazerunner scenario, which has passed locally (CI build is blocked again but I believe this should be ready for review.)

Feature: Error Reporting Thread

Scenario: Only 1 thread is flagged as the error reporting thread
When I run "HandledExceptionScenario" with the defaults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding a scenario for unhandled as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point.

@simonbowring
Copy link
Contributor

The current thread stacktrace for handled events looks like this:

VMStack.java:-2dalvik.system.VMStack.getThreadStackTrace	
Thread.java:580java.lang.Thread.getStackTrace	
Thread.java:522java.lang.Thread.getAllStackTraces	
Error.java:408com.bugsnag.android.Error$Builder.<init>	
Client.java:665com.bugsnag.android.Client.notify	
Bugsnag.java:380com.bugsnag.android.Bugsnag.notify	
ExampleActivity.java:83com.bugsnag.android.example.ExampleActivity.crashHandled
...

Should we trim off everything from Bugsnag.java upwards as it's potentially not relevant (our code), or is it important that we present the entire thread state?

@fractalwrench
Copy link
Contributor Author

@simonbowring this is the requested behaviour for handled events - unhandled events will show the exception stacktrace rather than the thread trace.

adds a scenario for an unhandled exception, that ensures the error reporting thread is serialised.
For handled exceptions, the thread trace should be reported. For unhandled exceptions, the exception
stacktrace should be used instead.
Copy link
Contributor

@simonbowring simonbowring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fractalwrench fractalwrench merged commit d91643e into next Sep 25, 2018
@fractalwrench fractalwrench deleted the thread-id branch September 25, 2018 13:45
lemnik pushed a commit that referenced this pull request Jun 2, 2021
Add react native maven repo earlier in plugin lifecycle
rich-bugsnag added a commit that referenced this pull request Sep 3, 2021
Bump bugsnag-android to v5.11.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants