-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ios] RNBGL randomly stops uploading location data points #3926
Comments
I finally caught one of these today. First, I'll just describe what happened, and then discuss thoughts/opinions/suspicions afterwards. (These are a bit verbose because they also double as notes for future reference. See the short version later if you like.) Long storyI spent a couple of hours ish today driving around trying to reproduce this. I shared live location from a 'sharer' account (using iOS Staging 4.17) to another 'sharedWith' account (using android Staging 4.15). The version under test was the sharer (iOS Staging 4.17). The android account was just to observe in a convenient way. I drove around with both phones. The sharedWith phone was open all the time so I could observe the live location points being generated by the sharer phone. I must have made 10+ attempts. Sometimes I would stop and wait for 1+ minute, sometimes I would stop for 3-5 minutes, sometimes I would stop for about 20-40 seconds and then start driving again. Didn't reproduce. The sharer phone was screen-locked during this time. I was about to give up testing for the day when, on a whim, I swipe-killed the app on the sharer phone and continued testing. At the first attempt, it didn't reproduce. I decided to head home. Then, magically, the sharer phone got 'stuck' (ie. uploading of location data points stopped/paused) (I guess this is the second attempt after the swipe-kill). I drove around for another 10+ minutes just to confirm that new location data points weren't being generated/uploaded. (Hmmm ... I wonder if a recent swipe-kill is a precondition. Maybe it happens the second car drive after a swipe-kill, not the first?) Got home and took some notes. The sharer phone got 'stuck' at 2019-07-18 15:32 pm local time (GMT +8) (2019-07-18 07:32 UTC). I reached phone and inspected the server-side logs and confirmed the last location data point was at 15:32 GMT+8. I deliberately didn't touch the sharer phone for a long time. Finally, at 16:07 GMT+8 (08:07 UTC), I opened the (app on the) sharer phone and emailed myself the RNBGL location debug log. I confirmed that, upon open the app, it had uploaded new location data points at 16:07 GMT+8. So RNBGL had paused/stopped for 35 minutes. Looking in the debug log, I noticed this excerpt:
No log messages between 15:26 and 16:07. 15:26 was the last log message before being 'stuck'. 16:07 would have been because I had opened the app (to retrieve the debug log). So RNBGL didn't do anything during the 'stuck' 35 minutes. Looking in loggly, I observed that there were data points at these times: 2019-07-18 15:26:37.000 (GMT+8) 0785C9C2-9C66-43F2-A98D-011F14ED47F3 Wow, the three data points at 15:31/32, including the last 'stuck' point, didn't appear in the RNBGL log. Points 0785C9C2-9C66-43F2-A98D-011F14ED47F3 and F907DB4D-B59A-44F9-BFA8-C61E7FCB07F3 were logged. The others in the middle didn't get logged ... Almost as if RNBGL had uploaded them, but afterwards, it didn't get a chance to write to the log. At this point, I had hard evidence of RNBGL misbehaving. I had enough to post a ticket to the RNBGL developer. But I sat on it a bit. Then, on a whim, I had a look at bugsnag. Well, that's interesting. There are three 'Out of Memory' bugsnags at 16:07 GMT+8, and they're from my sharer phone. (No one else could possibly be running Staging 4.17 and iOS 12.2 on an iPhone 6.) Hmmm ... 16:07 is when I opened the app to retrieve the debug log. But the app didn't crash that I could see. So I'm thinking ... maybe 16:07 is when the bugsnag was posted, not when the crash happened. That would make sense. I wouldn't expect the app to be able to post a bugsnag if it had just been OOM-killed (OOM == 'out of memory'). Looking at the bugsnag some more (https://app.bugsnag.com/hippware/tinyrobot-1/errors/5d27d25a4dbe55001aaa94b0) there are more instances of it over the last 24 hours, and they're frequent enough that I think they correspond to instances when QA were testing and then ran into this issue. Short story and discussionMy current hypothesis is this: Something is causing the app to get OOM-killed. This prevents RNBGL from uploading any further location data points until the app is restarted. Normally, if the app is swipe-killed, RNBGL restarts (in the background) again when the phone is moving again. It seems like an OOM-kill is a bit more drastic and the app/RNBGL is unable/dis-allowed to restart (in the background). So ... a few things to think about:
So ... things to test:
I'd like to find out whether this issue occurs only in 1, or only in 2, or occurs in both. And:
(The OOM bugsnag is https://app.bugsnag.com/hippware/tinyrobot-1/errors/5d27d25a4dbe55001aaa94b0) |
Another round of testing today, and I got similar results to yesterday.
So ... that's two days in a row where: I couldn't reproduce after a normal background-ing of the app, but I easily reproduced after a swipe-kill ... BUT it happens at the start of the second trip after the swipe-kill, not the first. And, yes, there's an OOM bugsnag. In fact, I found out today that bugsnags have 'breadcrumbs' which is a timeline of events. In one of the bugsnags today, it clearly shows that the app/RNBGL was (about to be) terminated by the OS. So ... for me, I have a way to reproduce this issue, and it's worked 2 times out of 2. However, my instances are a bit different from those of @mstidham and @thescurry and Don. Their bugsnags don't have the 'App Will Terminate' breadcrumb. I dunno if this is significant or not. Certainly, the OOM kill (#3928) is one cause of this issue. Is it the main cause, or only one of several causes? I dunno. |
Today, I upgraded my iPhone from iOS 12.2 to 12.3.1. The behaviour changed. A lot. Now, the Staging (4.17) app doesn't upload location whilst in the background most of the time, unless I press the Home button (in order to light up the screen for a few seconds) or (presumably) unlock the phone. Whereas with 12.2, this problem didn't reproduce most of the time, with 12.3.1, this problem does reproduce most of the time. So it seems like the slightly different behaviour I was seeing was due to differences in iOS version. Apologies for misdirecting people. So what I'm seeing now is ... when the app is in the background, there's no uploading at all unless the phone is 'active'/unlock-ed for some other reason. ie Feels like iOS is batching up background execution of our app. But anyway, I did make progress on another front. I reproduced my (new) instances of this problem with the RNBGL app, and posted a ticket: transistorsoft/react-native-background-geolocation#778 Let's see what the RNBGL developer says. |
Updated my iPhone to iOS 12.4 today and re-tested. Same results. See transistorsoft/react-native-background-geolocation#778 (comment) More comments: This problem seems to be related to the stationary->moving transition. If I'm in the middle of driving (after having driven for 200+ metres) and I press Home on the iPhone, RNBGL wakes up and starts tracking ... and keeps tracking whilst I am still driving (I was driving for about an additional 5 ish minutes). |
RNBGL developer suggested rebooting the phone. He said that the iPhone should be rebooted after each iOS update. So I rebooted and did some testing. Then I realised that my testing over the last few days wasn't representative. I had a mobile hotspot with me and my phone was probably connecting to the mobile hotspot's wifi when it should be using cellular. So I redid my testing, making sure the phone wasn't connecting to mobile wifi. In the first test run (non-swipe-kill backgrounding), both tinyrobotStaging and the RNBGL demo app worked fine. (Albeit I needed to travel ~1000m instead of 200m before RNBGL would wake up) In the second test run, the RNBGL demo app crashed and stayed crashed. tinyrobotStaging worked fine. For the non-swipe-kill case, it seems to be working better. Not sure if this is due to the reboot or due to staying on cellular (instead of mobile hotspot wifi). Next, I tested swipe-kill but got unsure results. I'll continue with more testing tomorrow. |
Interesting. After reading about how iOS apps wake up to receive background location updates, I re-tested (Staging 4.17, iOS 12.4) without a mobile hotspot in the car. Wow. The performance and responsiveness is similar to my prior testing with iOS 12.2. I even feel that iOS 12.4 'feels' a bit more responsive, but this is just a feeling. Anyway, here is the summary:
So it seems I'm back to having a consistent way to reproduce one type of 'stuck'-iness. Next: Try to reproduce the 'stuck'-iness whilst the app is connected (and logging) to XCode. (However, I have to install both macOS mojave and XCode 10.3. This might burn an entire day.) Notes for future referenceApp connected to mobile hotspot wifi (iOS 12.3.1 and 12.4): Requires about 1500m of travelling for RNBGL to wake up and start tracking. Sometimes even 1500m was not even enough and needed a 'Press Home button'. App in car using cellular, mobile hotspot present (iOS 12.4): Requires about 1000m of travelling. App in car using cellular, no mobile hotspot present (iOS 12.4): Requires 50-500m of travelling. Tested in an urban setting: Bull Creek, Bateman, Western Australia. |
Breakthrough Turns out iPhone's keep crash logs on the phone. Goto Settings » Privacy » Analytics » Analytics Data (on iOS 12). For the instance of this issue that was reproduced in my testing as documented in my first comment, I found this crash report (iOS 12.2):
For the instance reproduced about two hours (mentioned in my immediately preceding comment), I found this crash report:
I think what's happening is that ... sometimes when the app/RNBGL wakes in the background in order to commence tracking, it takes too long to 'do something' and then iOS kills it. Then, sometimes, the app tries to wake a few minutes later to try again, takes too long, and gets killed again. This continues to happen until the user starts the app (in the foreground). I don't know what the app is trying to do. It might even be some react native initialisation code which takes too long. shrugs Need to do more debugging/logging. Logs |
Alright, found the main cause of this. Hopefully it is also the only cause, but we won't know pending further testing. There is a recently-introduced RN component we use: react-native-splash-screen. There is some sort of weird or buggy interaction with iOS such that ... if the app is started in the background, it takes too long or something, and iOS kills it. So what happens is:
The crash loop keeps occuring until the user opens the app manually. Opening the app manually starts the app in the foreground and avoids the OS kill. The component was introduced as part of #3743 so it's been around for about two months. I dunno if this problem manifested immediately after #3743 or if there was some other contributing factor which made it much more prevalent recently. Anyway, this is probably the upstream ticket: Crash on Splash Screen iOS 12.3.1 (crazycodeboy/react-native-splash-screen#397) but I'm not certain because some of the symptom details differ. But I followed links to find a suggested work-around which seems to work. Also, I've spun out a separate ticket: rn-splash-screen triggers background crash #3943 |
Staging Version: 4.30.0 |
This has been observed recently on Staging and Prod with 4.16 and 4.17 (and perhaps some other recent versions). There have been anecdotes for a while but wasn't ticketed earlier.
So far, this has only been observed and tested on iOS. It may also exist on Android but we haven't looked.
There is another subclass of symptoms which may or may not be related.
@thescurry thinks subclass 2 is fixed in 4.17.
(Background: Recent versions before 4.17, including 4.16, use RNBGL 3.0.0-rc5. Version 4.17 uses RNBGL 3.0.9.)
Anyway, both symptoms have been observed by QA over the last few days but the characteristics/preconditions/reproducing-steps/etc haven't been isolated yet.
The text was updated successfully, but these errors were encountered: