Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ios] RNBGL randomly stops uploading location data points #3926

Closed
bengtan opened this issue Jul 18, 2019 · 9 comments
Closed

[ios] RNBGL randomly stops uploading location data points #3926

bengtan opened this issue Jul 18, 2019 · 9 comments
Assignees

Comments

@bengtan
Copy link
Contributor

bengtan commented Jul 18, 2019

This has been observed recently on Staging and Prod with 4.16 and 4.17 (and perhaps some other recent versions). There have been anecdotes for a while but wasn't ticketed earlier.

So far, this has only been observed and tested on iOS. It may also exist on Android but we haven't looked.

  1. The app (ie. RNBGL) randomly/unpredictably stops/pauses uploading location data points. I say 'pauses' because once the app is opened/bought-to-foreground, RNBGL starts working again.

There is another subclass of symptoms which may or may not be related.

  1. After swipe-killing the app, RNBGL stops uploading location data points until it is opened/bought-to-foreground again.

@thescurry thinks subclass 2 is fixed in 4.17.

(Background: Recent versions before 4.17, including 4.16, use RNBGL 3.0.0-rc5. Version 4.17 uses RNBGL 3.0.9.)

Anyway, both symptoms have been observed by QA over the last few days but the characteristics/preconditions/reproducing-steps/etc haven't been isolated yet.

@bengtan bengtan self-assigned this Jul 18, 2019
@bengtan
Copy link
Contributor Author

bengtan commented Jul 18, 2019

I finally caught one of these today. First, I'll just describe what happened, and then discuss thoughts/opinions/suspicions afterwards.

(These are a bit verbose because they also double as notes for future reference. See the short version later if you like.)

Long story

I spent a couple of hours ish today driving around trying to reproduce this. I shared live location from a 'sharer' account (using iOS Staging 4.17) to another 'sharedWith' account (using android Staging 4.15). The version under test was the sharer (iOS Staging 4.17). The android account was just to observe in a convenient way.

I drove around with both phones. The sharedWith phone was open all the time so I could observe the live location points being generated by the sharer phone.

I must have made 10+ attempts. Sometimes I would stop and wait for 1+ minute, sometimes I would stop for 3-5 minutes, sometimes I would stop for about 20-40 seconds and then start driving again. Didn't reproduce. The sharer phone was screen-locked during this time.

I was about to give up testing for the day when, on a whim, I swipe-killed the app on the sharer phone and continued testing. At the first attempt, it didn't reproduce. I decided to head home. Then, magically, the sharer phone got 'stuck' (ie. uploading of location data points stopped/paused) (I guess this is the second attempt after the swipe-kill). I drove around for another 10+ minutes just to confirm that new location data points weren't being generated/uploaded.

(Hmmm ... I wonder if a recent swipe-kill is a precondition. Maybe it happens the second car drive after a swipe-kill, not the first?)

Got home and took some notes.

The sharer phone got 'stuck' at 2019-07-18 15:32 pm local time (GMT +8) (2019-07-18 07:32 UTC). I reached phone and inspected the server-side logs and confirmed the last location data point was at 15:32 GMT+8.

I deliberately didn't touch the sharer phone for a long time. Finally, at 16:07 GMT+8 (08:07 UTC), I opened the (app on the) sharer phone and emailed myself the RNBGL location debug log.

I confirmed that, upon open the app, it had uploaded new location data points at 16:07 GMT+8. So RNBGL had paused/stopped for 35 minutes.

Looking in the debug log, I noticed this excerpt:

...

2019-07-18 15:26:46.279 🔴-[TSLocationManager stopUpdatingLocation] 

2019-07-18 16:07:16.790 ℹ️-[TSLocationManager init] {

...

No log messages between 15:26 and 16:07. 15:26 was the last log message before being 'stuck'. 16:07 would have been because I had opened the app (to retrieve the debug log). So RNBGL didn't do anything during the 'stuck' 35 minutes.

Looking in loggly, I observed that there were data points at these times:

2019-07-18 15:26:37.000 (GMT+8) 0785C9C2-9C66-43F2-A98D-011F14ED47F3
2019-07-18 15:31:58.000 F4CFBBA2-83DA-49D1-9325-6FB43382215C
2019-07-18 15:31:58.000 B444BA5E-6CFA-47E1-99B6-8B09BF83C6B4
2019-07-18 15:32:00.000 6B5916BE-4ECB-4317-9836-6B40A1D62D24
2019-07-18 16:07:22.000 F907DB4D-B59A-44F9-BFA8-C61E7FCB07F3

Wow, the three data points at 15:31/32, including the last 'stuck' point, didn't appear in the RNBGL log. Points 0785C9C2-9C66-43F2-A98D-011F14ED47F3 and F907DB4D-B59A-44F9-BFA8-C61E7FCB07F3 were logged. The others in the middle didn't get logged ... Almost as if RNBGL had uploaded them, but afterwards, it didn't get a chance to write to the log.

At this point, I had hard evidence of RNBGL misbehaving. I had enough to post a ticket to the RNBGL developer.

But I sat on it a bit.


Then, on a whim, I had a look at bugsnag.

Well, that's interesting. There are three 'Out of Memory' bugsnags at 16:07 GMT+8, and they're from my sharer phone. (No one else could possibly be running Staging 4.17 and iOS 12.2 on an iPhone 6.)

Hmmm ... 16:07 is when I opened the app to retrieve the debug log. But the app didn't crash that I could see.

So I'm thinking ... maybe 16:07 is when the bugsnag was posted, not when the crash happened. That would make sense. I wouldn't expect the app to be able to post a bugsnag if it had just been OOM-killed (OOM == 'out of memory').

Looking at the bugsnag some more (https://app.bugsnag.com/hippware/tinyrobot-1/errors/5d27d25a4dbe55001aaa94b0) there are more instances of it over the last 24 hours, and they're frequent enough that I think they correspond to instances when QA were testing and then ran into this issue.

Short story and discussion

My current hypothesis is this:

Something is causing the app to get OOM-killed. This prevents RNBGL from uploading any further location data points until the app is restarted.

Normally, if the app is swipe-killed, RNBGL restarts (in the background) again when the phone is moving again. It seems like an OOM-kill is a bit more drastic and the app/RNBGL is unable/dis-allowed to restart (in the background).

So ... a few things to think about:

  • I wonder whether a swipe-kill is necessary for this issue to happen. A swipe-kill is how I reproduced it, but this could just be coincidence. In my case, this issue doesn't happen straight after a swipe-kill. It happened the second time the phone started moving after the swipe-kill.

  • I suspect that a OOM bugsnag gets posted when the app is restarted AFTER the OOM-kill, not at the time of the OOM-kill itself. But I'm not sure yet. I've only seen the OOM kill once and don't have enough of a sample size.

So ... things to test:

  1. Open the app and background it (without a swipe-kill), then drive-and-stop a few times to see if this issue occurs.

  2. Background the app and swipe-kill it, then drive-and-stop a few times to see if this issue occurs. Keep in mind that this issue may not happen at the first attempt after the swipe-kill.

I'd like to find out whether this issue occurs only in 1, or only in 2, or occurs in both.

And:

  1. When the app gets 'stuck', see if there is a new OOM bugsnag posted at the same time as when the app is restarted (ie. and becomes 'unstuck') AFTERWARDS. Hopefully we discover an almost perfect correlation.

(The OOM bugsnag is https://app.bugsnag.com/hippware/tinyrobot-1/errors/5d27d25a4dbe55001aaa94b0)

@bengtan
Copy link
Contributor Author

bengtan commented Jul 19, 2019

Another round of testing today, and I got similar results to yesterday.

  • Backgrounded the app (without swipe-killing) and did about five short trips to try to reproduce this issue. Did not reproduce.

  • Swipe-killed the app and did one trip (ie. start driving, observe fresh new data points, and then stop somewhere). Did not reproduce.

  • Started another trip. Issue occurred and the location data points got 'stuck'. Drove around for a few minutes just to make sure.

  • Went home and waited for almost one hour.

  • Opened the app to retrieve the location debug log. App uploaded a few fresh data points (and was no longer 'stuck').

  • Checked the OOM bugsnag. Yep, there were a few OOM bugsnags for my phone, all posted at the same time as the previous step.

  • Checked the location debug log. Yep, there's a one hour gap where RNBGL didn't produce any log messages.

So ... that's two days in a row where:

I couldn't reproduce after a normal background-ing of the app, but I easily reproduced after a swipe-kill ... BUT it happens at the start of the second trip after the swipe-kill, not the first.

And, yes, there's an OOM bugsnag.

In fact, I found out today that bugsnags have 'breadcrumbs' which is a timeline of events. In one of the bugsnags today, it clearly shows that the app/RNBGL was (about to be) terminated by the OS.

Screenshot from 2019-07-19 20-01-05

So ... for me, I have a way to reproduce this issue, and it's worked 2 times out of 2. However, my instances are a bit different from those of @mstidham and @thescurry and Don. Their bugsnags don't have the 'App Will Terminate' breadcrumb. I dunno if this is significant or not.

Certainly, the OOM kill (#3928) is one cause of this issue. Is it the main cause, or only one of several causes? I dunno.

@bengtan
Copy link
Contributor Author

bengtan commented Jul 22, 2019

Today, I upgraded my iPhone from iOS 12.2 to 12.3.1.

The behaviour changed. A lot.

Now, the Staging (4.17) app doesn't upload location whilst in the background most of the time, unless I press the Home button (in order to light up the screen for a few seconds) or (presumably) unlock the phone.

Whereas with 12.2, this problem didn't reproduce most of the time, with 12.3.1, this problem does reproduce most of the time.

So it seems like the slightly different behaviour I was seeing was due to differences in iOS version. Apologies for misdirecting people.

So what I'm seeing now is ... when the app is in the background, there's no uploading at all unless the phone is 'active'/unlock-ed for some other reason. ie Feels like iOS is batching up background execution of our app.

But anyway, I did make progress on another front.

I reproduced my (new) instances of this problem with the RNBGL app, and posted a ticket: transistorsoft/react-native-background-geolocation#778

Let's see what the RNBGL developer says.

@bengtan
Copy link
Contributor Author

bengtan commented Jul 23, 2019

Updated my iPhone to iOS 12.4 today and re-tested. Same results. See transistorsoft/react-native-background-geolocation#778 (comment)


More comments:

This problem seems to be related to the stationary->moving transition.

If I'm in the middle of driving (after having driven for 200+ metres) and I press Home on the iPhone, RNBGL wakes up and starts tracking ... and keeps tracking whilst I am still driving (I was driving for about an additional 5 ish minutes).

@bengtan
Copy link
Contributor Author

bengtan commented Jul 24, 2019

RNBGL developer suggested rebooting the phone. He said that the iPhone should be rebooted after each iOS update.

So I rebooted and did some testing.

Then I realised that my testing over the last few days wasn't representative. I had a mobile hotspot with me and my phone was probably connecting to the mobile hotspot's wifi when it should be using cellular.

So I redid my testing, making sure the phone wasn't connecting to mobile wifi.

In the first test run (non-swipe-kill backgrounding), both tinyrobotStaging and the RNBGL demo app worked fine. (Albeit I needed to travel ~1000m instead of 200m before RNBGL would wake up)

In the second test run, the RNBGL demo app crashed and stayed crashed. tinyrobotStaging worked fine.

For the non-swipe-kill case, it seems to be working better. Not sure if this is due to the reboot or due to staying on cellular (instead of mobile hotspot wifi).

Next, I tested swipe-kill but got unsure results. I'll continue with more testing tomorrow.

@bengtan
Copy link
Contributor Author

bengtan commented Jul 25, 2019

Interesting.

After reading about how iOS apps wake up to receive background location updates, I re-tested (Staging 4.17, iOS 12.4) without a mobile hotspot in the car.

Wow. The performance and responsiveness is similar to my prior testing with iOS 12.2. I even feel that iOS 12.4 'feels' a bit more responsive, but this is just a feeling.

Anyway, here is the summary:

  • When the app is backgrounded without a swipe-kill, it works (for the 3-4 times I tried it).
  • After a swipe-kill, the app stopped tracking about 3-4 minutes afterwards. Opening it indicated an app restart (ie. the app splash screen could be seen).

So it seems I'm back to having a consistent way to reproduce one type of 'stuck'-iness.

Next: Try to reproduce the 'stuck'-iness whilst the app is connected (and logging) to XCode. (However, I have to install both macOS mojave and XCode 10.3. This might burn an entire day.)


Notes for future reference

App connected to mobile hotspot wifi (iOS 12.3.1 and 12.4): Requires about 1500m of travelling for RNBGL to wake up and start tracking. Sometimes even 1500m was not even enough and needed a 'Press Home button'.

App in car using cellular, mobile hotspot present (iOS 12.4): Requires about 1000m of travelling.

App in car using cellular, no mobile hotspot present (iOS 12.4): Requires 50-500m of travelling.

Tested in an urban setting: Bull Creek, Bateman, Western Australia.

@bengtan
Copy link
Contributor Author

bengtan commented Jul 25, 2019

Breakthrough

Turns out iPhone's keep crash logs on the phone. Goto Settings » Privacy » Analytics » Analytics Data (on iOS 12).

For the instance of this issue that was reproduced in my testing as documented in my first comment, I found this crash report (iOS 12.2):

{"app_name":"tinyrobotStaging","app_cohort":"2|date=1563424200000&sf=143473&tid=48f75ccb5e77098c540853918fc7a4bdae2f991e0b41f65127d883c7b799be58&ttype=i","app_version":"4.17.0","timestamp":"2019-07-18 15:32:02.58 +0800","slice_uuid":"e556be47-e9a1-38a0-a018-b2daacbeb9cb","adam_id":1198202376,"build_version":"2494","bundleID":"com.hippware.ios.ChatStaging","share_with_app_devs":false,"is_first_party":false,"bug_type":"109","os_version":"iPhone OS 12.2 (16E227)","incident_id":"01B911FD-5548-4617-A8EF-6DC24FF4CC9A","is_beta":true,"name":"tinyrobotStaging"}
Incident Identifier: 01B911FD-5548-4617-A8EF-6DC24FF4CC9A
Beta Identifier:     5C2C9498-52C8-461A-8DDA-71AA8338D191
Hardware Model:      iPhone7,2
Process:             tinyrobotStaging [6473]
Path:                /private/var/containers/Bundle/Application/2A31084D-CDD9-4D3B-BC5E-A6FB276F11F4/tinyrobotStaging.app/tinyrobotStaging
Identifier:          com.hippware.ios.ChatStaging
Version:             2494 (4.17.0)
AppStoreTools:       10G3
AppVariant:          1:iPhone7,2:12.2
Beta:                YES
Code Type:           ARM-64 (Native)
Role:                Non UI
Parent Process:      launchd [1]
Coalition:           com.hippware.ios.ChatStaging [1052]


Date/Time:           2019-07-18 15:32:01.3853 +0800
Launch Time:         2019-07-18 15:31:53.1768 +0800
OS Version:          iPhone OS 12.2 (16E227)
Baseband Version:    7.55.01
Report Version:      104

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace SPRINGBOARD, Code 0x8badf00d
Termination Description: SPRINGBOARD, scene-create watchdog transgression: com.hippware.ios.ChatStaging exhausted CPU time allowance of 4.53 seconds | ProcessVisibility: Background | ProcessState: Running | WatchdogEvent: scene-create | WatchdogVisibility: Background | WatchdogCPUStatistics: ( | "Elapsed total CPU time (seconds): 13.010 (user 13.010, system 0.000), 81% CPU", | "Elapsed application CPU time (seconds): 6.034, 38% CPU" | )
Triggered by Thread:  0
...

For the instance reproduced about two hours (mentioned in my immediately preceding comment), I found this crash report:

{"app_name":"tinyrobotStaging","app_cohort":"2|date=1563424200000&sf=143473&tid=48f75ccb5e77098c540853918fc7a4bdae2f991e0b41f65127d883c7b799be58&ttype=i","app_version":"4.17.0","timestamp":"2019-07-25 11:31:17.90 +0800","slice_uuid":"e556be47-e9a1-38a0-a018-b2daacbeb9cb","adam_id":1198202376,"build_version":"2494","bundleID":"com.hippware.ios.ChatStaging","share_with_app_devs":false,"is_first_party":false,"bug_type":"109","os_version":"iPhone OS 12.4 (16G77)","incident_id":"15AD1C5C-607F-4BE8-A4C0-19642874C4D6","is_beta":true,"name":"tinyrobotStaging"}
Incident Identifier: 15AD1C5C-607F-4BE8-A4C0-19642874C4D6
Beta Identifier:     5C2C9498-52C8-461A-8DDA-71AA8338D191
Hardware Model:      iPhone7,2
Process:             tinyrobotStaging [285]
Path:                /private/var/containers/Bundle/Application/2A31084D-CDD9-4D3B-BC5E-A6FB276F11F4/tinyrobotStaging.app/tinyrobotStaging
Identifier:          com.hippware.ios.ChatStaging
Version:             2494 (4.17.0)
AppStoreTools:       10G3
AppVariant:          1:iPhone7,2:12.2
Beta:                YES
Code Type:           ARM-64 (Native)
Role:                Non UI
Parent Process:      launchd [1]
Coalition:           com.hippware.ios.ChatStaging [424]


Date/Time:           2019-07-25 11:31:17.2799 +0800
Launch Time:         2019-07-25 11:30:47.1581 +0800
OS Version:          iPhone OS 12.4 (16G77)
Baseband Version:    7.80.04
Report Version:      104

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note:  EXC_CORPSE_NOTIFY
Termination Reason: Namespace SPRINGBOARD, Code 0x8badf00d
Termination Description: SPRINGBOARD, scene-create watchdog transgression: com.hippware.ios.ChatStaging exhausted real (wall clock) time allowance of 29.65 seconds | ProcessVisibility: Background | ProcessState: Running | WatchdogEvent: scene-create | WatchdogVisibility: Background | WatchdogCPUStatistics: ( | "Elapsed total CPU time (seconds): 18.390 (user 18.390, system 0.000), 31% CPU", | "Elapsed application CPU time (seconds): 3.509, 6% CPU" | )
Triggered by Thread:  0
...

I think what's happening is that ... sometimes when the app/RNBGL wakes in the background in order to commence tracking, it takes too long to 'do something' and then iOS kills it. Then, sometimes, the app tries to wake a few minutes later to try again, takes too long, and gets killed again. This continues to happen until the user starts the app (in the foreground).

I don't know what the app is trying to do. It might even be some react native initialisation code which takes too long. shrugs

Need to do more debugging/logging.


Logs

tinyrobotStaging-2019-07-18-153202.ips.beta.txt

tinyrobotStaging-2019-07-25-113117.ips.beta.txt

@bengtan
Copy link
Contributor Author

bengtan commented Jul 26, 2019

Alright, found the main cause of this. Hopefully it is also the only cause, but we won't know pending further testing.

There is a recently-introduced RN component we use: react-native-splash-screen. There is some sort of weird or buggy interaction with iOS such that ... if the app is started in the background, it takes too long or something, and iOS kills it.

So what happens is:

  • The app is no longer running. Either it terminated naturally, or the user did a swipe-kill.
  • Phone starts moving.
  • iOS wakes up the app in the background to do location tracking.
  • Problem/bug happens. OS kills app about 5-30 seconds later.
  • A few minutes later (if still moving), OS wakes app again. OS kills app again.
  • Repeat.

The crash loop keeps occuring until the user opens the app manually. Opening the app manually starts the app in the foreground and avoids the OS kill.

The component was introduced as part of #3743 so it's been around for about two months. I dunno if this problem manifested immediately after #3743 or if there was some other contributing factor which made it much more prevalent recently.

Anyway, this is probably the upstream ticket: Crash on Splash Screen iOS 12.3.1 (crazycodeboy/react-native-splash-screen#397) but I'm not certain because some of the symptom details differ. But I followed links to find a suggested work-around which seems to work.

Also, I've spun out a separate ticket: rn-splash-screen triggers background crash #3943

@bengtan bengtan changed the title RNBGL randomly stops uploading location data points [iOS] RNBGL randomly stops uploading location data points Jul 26, 2019
@bengtan bengtan changed the title [iOS] RNBGL randomly stops uploading location data points [ios] RNBGL randomly stops uploading location data points Jul 26, 2019
@mstidham
Copy link

Staging Version: 4.30.0
Closing this ticket, new tickets have been created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants