Refactor ble advertising logic #625
Refactor ble advertising based on ble standards and conventions. Changes are based on the bleprph example code, bluetooth docs, and nimble docs.
I re-enabled the assert statements that had been commented out in the advertising function and I have not observed them fire. My nRFConnect shows the watch advertising at 64 ms intervals. I set the interval in the call to ble_gap_adv_start to 5000 based on my nRFConnect logs, which showed 5000 as the agreed-upon parameters. I'm not clear on why the agreed connection parameters are 5000 ms and the actual interval on the watch is 64 ms.
I implemented bonding with a passkey in my local branch and tested it to see if that impacted the re-connection issue. The only change I observed was that every time the watch restarted advertising (e.g. central out of range) I would have to delete the bond to successfully re-connect. I believe this can be fixed by persisting the keys on the watch.
Yeah, getting proper bonding working would be great! Feel free to nag me on the Discord as well; I'm almost always around :)
On Sun, 29 Aug 2021, Tim Keller wrote:
> Yeah getting proper bonding working would be great! Feel free to nag me
> on the discord as well I'm almost always around :)
I can force a bonded connection with or without a pass key on every
connect message. But that is super annoying if you are not attached to
your phone. If you get out of range (of the central) you have to go
through the whole connect sequence again.
The good news is that with a secure connection my Android will recognize
PineTime as a trusted device. I loved that feature on my Pebble and
_really_ want that working properly on PineTime.
Thanks for the feedback.
After 4 days, 18 hours, and some minutes of continuous Bluetooth connectivity my watch and GB disconnected. I noticed the Bluetooth icon was gone on the watch while looking at the time and pulled up GB to see "waiting for reconnect". I opened nRFConnect and the watch was not advertising. :-( I'm confident the watch received a disconnect event from nimble (because the ble icon was off) and it should have been advertising (the code says so). This is very frustrating.
Another unexplained disconnect today that could only be fixed by rebooting the watch. I started this WIP to discover the root cause of the disconnects and now I do not believe this PR will resolve that issue. However, based on the specs, docs, and example nimble apps, I do feel that this is the right way to manage advertising. Hopefully @JF002 will discover a useful clue with the logic probe and sniffer. I am in need of some inspiration ;-| |
I'm in agreement. Do you think it's in the scope of this PR to limit the advertising speed after a few attempts, to "save" power when it's just idling and advertising?
Great suggestion! I'll dive back into the standard to see what is
prescribed (if anything) and add to this PR.
Hi @evergreen22, here is the Bangle.js source code related to Bluetooth: https://github.com/espruino/Espruino/tree/master/targetlibs/nrf5x_12/components/ble. It's based on the open source Nordic source code in order to achieve a proper BLE stack, and it may help to understand what PineTime lacks. @JF002, since we use Nordic, why don't we use their stack? From what I can see it's open source (otherwise Espruino would not be able to modify it and share it), and it would resolve some BLE problems. Best regards
The "linear decrease..." commit c32ba84 slows down the rate of advertisements to conserve battery when no connection is available. See the commit message for details on how it works. This chart visually describes the advertisement intervals versus time (smaller interval is faster advertising). I've tested this on my dev-kit and sealed unit with numerous connect-disconnect and power cycles and it is working. |
Thanks @lman0. |
Start advertising aggressively when powered on then slow down linearly over 75 seconds. This will conserve battery by not advertising rapidly the whole time we are seeking a connection. The slowest rate is approximately once every 4.5 seconds to balance responsiveness and battery life.
We use a fixed advertising duration of 5 seconds and start with a 62.5 ms advertising interval. Every 5 seconds (the advertising duration) we step up to a larger advertising interval (slower advertising). We continue to increase the advertising interval linearly for 75 seconds from the start of advertising. At 75 seconds we have an advertising interval of 4.44 seconds which we keep until connected. A reboot will restart the sequence. When we receive a disconnect event we restart the sequence with fast advertising and then slow down as described above.
Note that we are not using the BLE high duty cycle setting to change the advertising rate. The rate is managed by repeatedly setting the minimum and maximum intervals.
The linear rate of decrease and the slowest interval size were determined experimentally by the author. The 5.3 Core spec suggests that you not advertise slower than once every 1.2 seconds to preserve responsiveness but we ignored that suggestion.
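The schedule described in this commit message can be sketched as a short calculation. This is a minimal Python illustration, not the firmware code: the function names are hypothetical, and the per-window increment is derived here by assuming a straight line from 62.5 ms at 0 s to 4.44 s at 75 s, which may differ from the constants actually used in the commit. The conversion helper reflects the fact that BLE expresses advertising intervals in units of 0.625 ms.

```python
ADV_UNIT_MS = 0.625  # BLE advertising intervals are expressed in 0.625 ms units


def ms_to_adv_units(interval_ms):
    """Convert milliseconds to 0.625 ms advertising-interval units."""
    return round(interval_ms / ADV_UNIT_MS)


def linear_backoff_schedule(start_ms=62.5, slowest_ms=4440.0,
                            window_s=5, ramp_s=75):
    """Return (seconds since start, interval in ms) for each advertising window.

    Assumption: the interval grows by a constant amount per window so that it
    reaches slowest_ms exactly ramp_s seconds after advertising starts.
    """
    steps = ramp_s // window_s                      # 15 windows of 5 s each
    increment_ms = (slowest_ms - start_ms) / steps  # constant growth per window
    return [(k * window_s, start_ms + k * increment_ms)
            for k in range(steps + 1)]


if __name__ == "__main__":
    for t, interval in linear_backoff_schedule():
        print(f"t={t:>2}s  interval={interval:7.1f} ms  "
              f"({ms_to_adv_units(interval)} units)")
```

After the last entry (75 s) the interval would simply be held until a connection or disconnect event restarts the sequence.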
0275f86
to
c32ba84
I feel like we should be using the 62.5 ms advertising interval for about 30 seconds every time the screen turns on, since it's a "priority" style thing, and then back off either linearly or in a jump after that. I'm not sure of the benefit of the linear drop... Some rough numbers are ~1.35 mA -> 1.18 mA after it fully scales down. Any input on why you picked a linear drop? Thanks for putting this effort in :)
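To put the two current figures above in perspective, here is a rough Python sketch. It assumes that both numbers are average currents while advertising unconnected, that nothing else in the power budget changes, and that runtime scales inversely with average current for a fixed battery capacity. The function names are made up for this illustration.

```python
def current_saving(before_ma, after_ma):
    """Fraction of average current saved by the slower advertising rate."""
    return (before_ma - after_ma) / before_ma


def runtime_gain(before_ma, after_ma):
    """Relative runtime increase for a fixed battery capacity.

    Assumption: runtime is capacity / average current, so the capacity
    cancels out and only the ratio of the two currents matters.
    """
    return before_ma / after_ma - 1.0


if __name__ == "__main__":
    print(f"current saved: {current_saving(1.35, 1.18):.1%}")
    print(f"runtime gain:  {runtime_gain(1.35, 1.18):.1%}")
```

Under these assumptions the fully backed-off rate draws roughly 13% less current, which corresponds to roughly 14% more runtime while unconnected.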
Note to @geekbozu, somehow I've broken my client, so I'm recreating my response to you here (assuming you got the response in the first place). Thanks for the feedback and the discussion. I do appreciate it.
I don't see the point of advertising when we are already connected. We can only have one connection at a time on the watch. What do you mean by "priority style thing"?
Thanks for measuring this! 18 hours is nice and more than I expected. If anything, this confirms my intuition that advertising is fairly energy efficient on this hardware.
Linear is simple. Simple code is easier for humans to understand and less likely to contain defects. My first approach was based on Ethernet's exponential backoff algorithm. However, that was too fast of a decrease imo for this application (and unnecessarily complex). This specific linear curve gives folks a full minute of fairly quick advertising to get connected but still feels responsive even after the 75 seconds. The worst case, after the advertising reaches the slowest interval, is that a connection will take 4.43 seconds. The average case will be 2.2 seconds and both feel pretty snappy imo.
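The worst-case and average figures quoted above follow from a simple model, sketched here in Python. The assumptions are mine rather than the author's: the central is scanning continuously and starts looking at a uniformly random moment within the advertising interval, and the small random delay BLE adds to each advertising event plus the connection setup time are ignored.

```python
def connection_wait_s(adv_interval_s):
    """Return (worst-case, average) wait until the next advertisement.

    Assumption: the scanner starts at a uniformly random point in the
    advertising interval, so the wait is uniform on [0, interval].
    """
    worst = adv_interval_s
    mean = adv_interval_s / 2
    return worst, mean


if __name__ == "__main__":
    worst, mean = connection_wait_s(4.43)
    print(f"worst case {worst:.2f} s, average {mean:.2f} s")
```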
Ahh, I explained poorly then. All of this only applies when we are not connected and should be advertising. When not connected, we initially advertise at the 62.5 ms interval for about 30 seconds. After those 30 seconds we back off to a more power-friendly speed, say 1.2 seconds as recommended by the spec, or less if we determine it's snappy enough. Then, while still not connected, when the screen turns on we could consider going back up to the 62.5 ms interval for responsiveness and compatibility. (That's what I meant by "priority".) Yeah, 18 hours is quite a bit; I'd say it's a good trade-off for the watch always advertising.
> So when not connected, We initially advertise at the 62.5 Interval for
> about 30 seconds. After those 30 seconds we back off to a more power
> friendly speed. Say 1.2 seconds like recommended by the spec. Or less if
> we determine its snappy enough.
OK, I think I'm following. You're suggesting that instead of the linear
slowdown in 15 steps, we use two steps.
The code would do this:
* When powered on, advertise at 62.5ms interval for 30 seconds then
change the interval to 1.2 seconds until connected.
* When a disconnect event is received, restart the two step sequence.
> Now while not connected, When the screen turns on. We could consider
> going back up to the 62.5interval for responsiveness and compatibility.
> (Thats what I meant by "priority" )
I don't understand why we would advertise when the screen turns on.
While not connected is the key part I missed above. I think I see what you're saying now.
This reverts commit c32ba84.
On power up, advertise aggressively for at least 30 seconds then switch to a longer interval to conserve battery life. This fast/slow pattern is designed to balance connection response time and battery life. When a disconnect event is received restart the fast/slow pattern. When a failed connect event is received, restart the fast/slow pattern. When the screen is activated and ble is not connected, restart the fast/slow pattern. This pattern is consistent with Apple's BLE developer standards (QA 1931).
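The fast/slow pattern and its restart conditions can be sketched as a small state machine. This Python sketch is illustrative only and does not reproduce the firmware: the class and method names are hypothetical, the 62.5 ms fast interval is taken from the earlier discussion, and the 1000 ms slow interval is a placeholder for "about one advertisement a second" rather than the value actually chosen in the PR.

```python
from dataclasses import dataclass


@dataclass
class FastSlowAdvertiser:
    fast_interval_ms: float = 62.5    # assumed fast interval (from the discussion)
    slow_interval_ms: float = 1000.0  # placeholder for "about once a second"
    fast_period_s: float = 30.0       # how long to stay in fast mode
    connected: bool = False
    elapsed_s: float = 0.0            # time since the pattern last restarted

    def _restart(self):
        self.elapsed_s = 0.0

    def on_power_up(self):
        self.connected = False
        self._restart()

    def on_connect(self):
        self.connected = True

    def on_disconnect(self):
        self.connected = False
        self._restart()

    def on_connect_failed(self):
        self.connected = False
        self._restart()

    def on_screen_on(self):
        # Only restart fast advertising when there is no connection.
        if not self.connected:
            self._restart()

    def tick(self, seconds):
        """Advance time; only matters while advertising."""
        if not self.connected:
            self.elapsed_s += seconds

    def current_interval_ms(self):
        """Interval in use now, or None when connected (not advertising)."""
        if self.connected:
            return None
        if self.elapsed_s < self.fast_period_s:
            return self.fast_interval_ms
        return self.slow_interval_ms
```

Keeping every restart condition behind a single `_restart()` helper means that adding or removing a trigger (for example the screen-on case) is a one-line change.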
I've pushed the reworked advertising backoff scheme to PR #625. It now advertises aggressively for at least 30 seconds then backs down to about one advertisement a second. It also restarts aggressive advertising under selected conditions. The chosen advertising intervals are consistent with Apple's developer guidelines for BLE, and since we have an iOS app now that seems right. I'm ready to remove the 'WIP' designation if a dev is willing to give it an 'approve' review. Thanks to @geekbozu and @hubmartin for the feedback.
I'm not qualified to provide a review; however, I'm testing this PR (plus #621) and will report back once I have > 24h uptime.
Thanks. More testing would be super helpful.
I've been running this for a couple of weeks on both my dev-kit and watch
and have not noticed any regressions. I have seen several days of
continuous connection, several days of advertising, and the mystery
disconnects.
I set out to remedy the disconnect problem, but sadly this has not fixed
it. However, this PR does get our ble advertising in spec with the ble
standards, Apple developer guidelines, and best practices for bluetooth.
Code looks great! I will actually pull some Power plots of this shortly so we can calculate a new "average" battery life! |
So I have been running this as a "daily driver" for a very small sample size of 2 days and it's working great. Pairing is way more consistent and much more responsive. I use a self-compiled version of Gadgetbridge with "Auto Reconnect on Out of Range" enabled, and I have yet to notice BT going down or not coming back after those events, where before it would sometimes take a good minute to actually connect. It does not solve the 18-hour bug, as I have come to call it, but hey, still super nice!
Hi @evergreen22, I have tested your pull request and it does advertise faster. Since it is not possible to know whether advertising has indeed restarted, some indication (such as an icon) would help other people understand that the watch is advertising. Thanks for your understanding.
lman0, there is a bug in InfiniTime where ALL BLE stops working after about 18 hours of runtime. As such, we have no easy way to know in firmware when this issue happens, as it would still show as advertising even though Bluetooth is broken.
@geekbozu, in my case it's not after 18 h, but that's why I wish @evergreen22 would add an icon when it's advertising, so I can reliably say that the icon is not there when it should be. Otherwise it will be word against word, and no one will be able to help properly, since there is no way to send a log.
Then I can say, sadly, that this pull request is not working properly in the long run, but only a few times after reset.
By the way, @evergreen22, if, as @geekbozu says, the Bluetooth stops working after 18 h of runtime,
You can't determine if the radio is advertising with an 'icon'.
Yeah... there is no way to know when we are advertising. In the code, on any connection drop or advertisement finish event we start advertising again: evergreen22@22571d4#diff-c69ced976eada3d4c4a15fd5ce542651ff381e21e2bb8cee746ac198c8847f89R152 We check whether the call to advertise works. When it does, we continue. If it does not, we just reset the watch completely. So we have no way of knowing if we are actually advertising or not, because as far as InfiniTime is concerned we are advertising. An indicator would therefore show as advertising even when it's not, since we have no way of knowing what the radio itself is actually doing.
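The argument above can be illustrated with a toy model. This is a hypothetical Python sketch, not InfiniTime or NimBLE code: `FakeBleStack` and its fields are invented for the illustration. It assumes, as the discussion suggests, that the stack reports success even when the radio has gone silent.

```python
class FakeBleStack:
    """Hypothetical stand-in for the BLE host stack (not NimBLE's API)."""

    def __init__(self):
        self.radio_transmitting = True     # ground truth, invisible to firmware
        self.believes_advertising = False  # what the stack itself reports

    def start_advertising(self):
        # The stack updates its own state and reports success (0),
        # regardless of what the radio is really doing.
        self.believes_advertising = True
        return 0


def restart_advertising(stack):
    """Mimic the pattern described above: start, check, reset on failure."""
    rc = stack.start_advertising()
    assert rc == 0, "advertising failed to start; firmware would reset here"
    # An 'advertising' icon could only be driven by this return code.
    return rc == 0


def demo():
    stack = FakeBleStack()
    stack.radio_transmitting = False  # simulate the unexplained radio failure
    icon_on = restart_advertising(stack)
    return icon_on, stack.radio_transmitting


if __name__ == "__main__":
    icon_on, radio_ok = demo()
    print(f"icon shows advertising: {icon_on}, radio transmitting: {radio_ok}")
```

In this model the icon stays on while the radio is silent, which is why an indicator driven by the firmware's view would not help diagnose the fault.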
At the very least, @evergreen22, you could make an icon showing when "StartAdvertising()" is launched, right? It should help regardless, to pinpoint use cases that you may have overlooked...
By the way, @geekbozu, what you say should then also be true of the Bluetooth icon when we are connected,
To continue the reply from @geekbozu
The assert will fire if nimble indicates that advertising failed to
start/restart in the advertising function. Neither of my test units are
restarting. No one else has reported their watch is restarting in the
advertising function. Therefore, we can conclude that nimble has told the
watch it is advertising. What you are suggesting is that the assert does not work.
I see no evidence to support your claim. I'm not saying you are wrong. I
am just unwilling to accept your claim without some concrete proof that
the assert does not work.
Everyone agrees there is a problem with intermittent disconnects and this
refactor is a step forward in identifying the cause because it conforms to
the ble standards and conventions. It allows us to eliminate
'unconventional or nonconforming' usage as a cause of the disconnects. We
are just as disappointed as you that the root cause has yet to be
discovered.
How can I prove my claim if I have no way to prove that advertising has been started? If you don't do it, so be it. Once a dev thinks his code or tests are 'perfect' and rejects any way to prove otherwise... I pray that you will not become like that.
I don't accept your claim without proof because you are arguing that the
compiler or build process is broken. My reluctance has nothing to do with
my PR code.
> how i can prove my , claim
Can you identify code in InfiniTime that disables the assert macro or
otherwise prevents it from working?
You seem to understand this pretty well. It should be fairly trivial to display an icon, so why don't you do it?
I'm a user, @evergreen22, not a dev, and I feel like I'm talking to a support desk that says their program is 'perfect'.
I think, @kieranc, that @evergreen22 is a good dev, since he is tackling a problem known to be very difficult to resolve, even more so when he says that his unit tests / use cases are perfect...
Who said that? |
When he says, @kieranc, that his asserts are working properly. By the way, @evergreen22 said: "Neither of my test units are restarting. No one else has reported their watch is restarting in the advertising function."
It's not 'his' assert, it's an assert, one of hundreds in the code, and if it wasn't working as expected then many, many other things would be broken.
He means he has 2 watches running code which should reboot the watch if advertising is not running (i.e., the same approach as displaying an icon depending on whether it's running or not) and no watches are rebooting, so as far as the code is concerned, the watch is advertising. We know the advertising is not working correctly, and while this PR doesn't fix it, it fixes other things which will hopefully make finding the issue with advertising easier.
I don't really know how this works, but obviously StartAdvertising() must have succeeded because of the asserts. The only other thing I can think of is if StartAdvertising() somehow doesn't get called. This could stop the advertising from working until reset if there are no new events that call it right? Previously waking the watch up would restart advertising. However with how
StartAdvertising() hasn't changed significantly. It still works basically the same way as before, but with more asserts.
If it isn't restarting, that means StartAdvertising() is working. |
As explained previously, the 'icon' would be on all the time. @lman0 is taking this personally and I don't mean it that way. I assumed they were a programmer and would understand my ask. However, since @lman0 is not a programmer, I concede it is an unreasonable request and I apologize. Thank you @lman0. You have inspired a new line of thought about troubleshooting the disconnects. My test builds enable the assertions. They are not enabled in the production build. You have caused me to wonder if they are enabled in nimble - I shall investigate - thank you.
Thanks @evergreen22, I wish you good luck in your investigation!
> thanks @evergreen22 i wish you good luck , in your investigation
You are most welcome.
For those who wish to build a debug version against 1.4.0, here is what
you need:
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 67bbc83..c651352 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -777,8 +777,8 @@ add_definitions(-D__STACK_SIZE=1024)
 add_definitions(-D__HEAP_SIZE=4096)
 
 # NOTE : Add the following defines to enable debug mode of the NRF SDK:
-#add_definitions(-DDEBUG)
-#add_definitions(-DDEBUG_NRF_USER)
+add_definitions(-DDEBUG)
+add_definitions(-DDEBUG_NRF_USER)
 
 if (NOT CMAKE_BUILD_TYPE)
 set(CMAKE_BUILD_TYPE "Release")
@evergreen22 Thank you so much for this PR! Even if it doesn't fix all our BLE issues, it'll probably improve the reliability of the (re)connection to the phone 👍 I'll try to answer some of the comments I've read in this review.
The Nordic stack (SoftDevice) is not open source, but the SDK is. The SoftDevice is provided as a binary file (/components/softdevice/s132/hex/s132_nrf52_6.1.1_softdevice.hex) that you have to flash at a specific location in flash. Then the software (using the open source SDK) is able to call functions from this binary file using some linker script magic. This is a smart way to allow open source software to use a proprietary stack, as the software does not link directly with the proprietary code. At the beginning, InfiniTime used the NRF SoftDevice as its BLE stack, but I decided to switch to NimBLE because I wanted InfiniTime to be a fully FOSS firmware ;) The switch was done in InfiniTime 0.5.
@evergreen22 @geekbozu Regarding fast/slow advertising, I think you chose the best compromise (fast advertising for 30s, then slow advertising). 👍
@lman0 Regarding the advertising icon, it could probably be useful to tell the user the watch is actively advertising and trying to connect to a host device. Maybe some users will find this additional feedback useful, maybe others will just find it confusing... I don't really know. However, in this case it won't be useful at all, as our BLE issues cause inconsistencies between InfiniTime and the radio: InfiniTime (and NimBLE) really think that the watch is advertising while the radio outputs nothing into the air. This means that the advertising icon would be displayed even when advertising has stopped working for any reason.
It's a bit late, I'll continue the review tomorrow!
FWIW, I've been running this PR for nearly a week and haven't noticed any regressions; it seems to reconnect automatically more reliably than before.
Here's a DFU file for testing this PR (built on 6356c7f). |
This is my WIP attempt to resolve the ble re-connect issue and get more reliable ble behavior. Comments, discussion, and suggestions are welcome. I'm testing on both my watch and dev-kit. The watch has been connected now for 3+ days. The dev-kit still fails periodically. I'm using both nRFConnect and GB for testing. I am (still) waiting on the breakout to connect my Segger JLink and gaze into the dumpster fire that is ble.