Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Watchdog timer and add support for NRF watchdog #2933

Merged
merged 36 commits into from
Jun 2, 2020

Conversation

xobs
Copy link
Collaborator

@xobs xobs commented May 20, 2020

This patchset adds a new wdt module based on the WDT class provided by Micropython: http://docs.micropython.org/en/latest/library/machine.WDT.html

Note that on nrf, once the WDT is enabled it can not be disabled or have its duration modified without resetting the device.

@xobs xobs force-pushed the wdt-nrf branch 2 times, most recently from 66b9384 to 4d04be9 Compare May 20, 2020 13:00
@xobs
Copy link
Collaborator Author

xobs commented May 20, 2020

Initially I had it enabled for all NRF platforms, however that caused several platforms to overrun their flash. So I have since disabled all platforms except Simmel.

@dhalbert
Copy link
Collaborator

Initially I had it enabled for all NRF platforms, however that caused several platforms to overrun their flash. So I have since disabled all platforms except Simmel.

I'm surprised by this, since there is still a lot of unused flash on nRF52480, even with an internal filesystem. What's an example of a platform that overflowed? Maybe you were compiling with optimization off?

@xobs
Copy link
Collaborator Author

xobs commented May 20, 2020

Let me re-enable them for all platforms and re-build.

@dhalbert
Copy link
Collaborator

@xobs ok, despite some missing builds (which are not boards), we can see that it didn't quite fit in the pca10100 build (by 80 bytes), so that's the (only) one to fix.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! I have some API feedback.

ports/nrf/boards/common.template.ld Show resolved Hide resolved
shared-bindings/wdt/__init__.c Outdated Show resolved Hide resolved
shared-bindings/wdt/__init__.c Outdated Show resolved Hide resolved
shared-bindings/wdt/WDT.c Outdated Show resolved Hide resolved
shared-bindings/wdt/__init__.c Outdated Show resolved Hide resolved
shared-bindings/wdt/WDT.c Outdated Show resolved Hide resolved
shared-bindings/wdt/WDT.c Outdated Show resolved Hide resolved
shared-bindings/wdt/__init__.c Outdated Show resolved Hide resolved
@tannewt
Copy link
Member

tannewt commented May 20, 2020

It'd be awesome to have the result of the watchdog be configurable. For testing, raising an exception rather than restarting would be very helpful. I imagine it's possible that libraries take a while to return to the main code.

@xobs
Copy link
Collaborator Author

xobs commented May 21, 2020

I put together a messy patch that implements what you requested. Documentation is unclear, but I wanted to make sure you were okay with the API before documenting everything again.

Example usage:

Simple usage:

wdt = microcontroller.watchdog.WatchDogTimer(5.0)
print("Sleeping for 3 seconds (shouldn't exit)")
time.sleep(3)
wdt.feed()
print("Sleeping for 3 seconds (also shouldn't exit)")
time.sleep(3)
print("Sleeping for 3 seconds without feeding WDT (should exit)")
time.sleep(3)
print("Shouldn't get here")

Using a with block:

import microcontroller

with microcontroller.watchdog.WatchDogTimer(5.0) as wdt:
    print("Sleeping for 3 seconds (shouldn't exit)")
    time.sleep(3)
    wdt.feed()
    print("Sleeping for 3 seconds (also shouldn't exit)")
    time.sleep(3)
    print("Sleeping for 3 seconds without feeding WDT (should exit)")
    time.sleep(3)
    print("Shouldn't get here")

You can also add hardware=True to get the true Hardware watchdog timer.


For some reason, I'm unable to actually get this to work. It seems as though there's no way to set an exception in an interrupt routine, and while I'm setting mp_pending_exception this value is not immediately consulted. As a result, "Shouldn't get here" is printed, although time.sleep(3) returns immediately.

It looks like the Python VM only checks for exceptions on loop constructs. Additionally, when it sees an exception it calls nlr_pop(), but because we raised the exception in an interrupt handler there is no context to pop so it crashes.

ports/nrf/common-hal/microcontroller/__init__.c Outdated Show resolved Hide resolved
py/modbuiltins.c Outdated Show resolved Hide resolved
@tannewt
Copy link
Member

tannewt commented May 21, 2020

I put together a messy patch that implements what you requested. Documentation is unclear, but I wanted to make sure you were okay with the API before documenting everything again.

Totally ok. Feel free to ping me on Discord when you get to your computer in the morning and we can video chat to sync up.

Example usage:

Simple usage:

I'd do:

# By default the watchdog has infinite timeout and raises an exception on timeout.
wdt = microcontroller.watchdog 
# Now we set the timeout is 5 seconds and it will raise an exception. This may need to
# be implemented without the hardware WDT because it may only do real resets.
wdt.timeout = 5 
# This is where we start the real WDT to trigger a reset. Setting action to 
# anything else after will raise an exception (and likely lead to a WDT reset since it isn't fed 
# outside the VM.)
wdt.action = RESET 
print("Sleeping for 3 seconds (shouldn't exit)")
time.sleep(3)
wdt.feed()
print("Sleeping for 3 seconds (also shouldn't exit)")
time.sleep(3)
print("Sleeping for 3 seconds without feeding WDT (should exit)")
time.sleep(3)
print("Shouldn't get here")

With this API, the WatchdogTimer instance already exists. This is helpful because it's state will live across VMs.

Another thing to think about is the watchdog's relationship to safe mode. When off USB we probably want to boot up in normal mode after a watchdog. However, when on USB we may want to boot into safe mode or silently ignore action set to RESET if we started up on USB after a watchdog reset. If we do neither, we risk the user getting into a situation where they can't easily edit code.py due to a reset loop. There is a magic reset press to enter safe mode but most folks probably don't know about it.

Using a with block:

Why support a with block? It seems unhelpful because the nRF WDT can't be disabled.

For some reason, I'm unable to actually get this to work. It seems as though there's no way to set an exception in an interrupt routine, and while I'm setting mp_pending_exception this value is not immediately consulted. As a result, "Shouldn't get here" is printed, although time.sleep(3) returns immediately.

It looks like the Python VM only checks for exceptions on loop constructs. Additionally, when it sees an exception it calls nlr_pop(), but because we raised the exception in an interrupt handler there is no context to pop so it crashes.

You should be able to set the pending exception from an ISR. We won't want to raise from there though. Maybe delay_ms needs to mp_raise the pending_exception? Does it work as is if you use a for loop to delay?

xobs added a commit to simmel-project/circuitpython that referenced this pull request May 22, 2020
This refactors the WatchDogTimer API to use the format proposed in
adafruit#2933 (comment)

Signed-off-by: Sean Cross <sean@xobs.io>
@xobs
Copy link
Collaborator Author

xobs commented May 22, 2020

I've implemented the new API. Additionally, it now boots into safe mode if it's connected to USB and is reset due to a watchdog timeout.

I will work to disentangle the watchdog exception stuff from the vm.

@xobs
Copy link
Collaborator Author

xobs commented May 22, 2020

This patchset got big. But it was refactored twice. It contains the following features:

  • The watchdog is a singleton under microcontroller
  • Setting microcontroller.watchdog.mode = watchdog.WatchDogMode.RAISE works
  • If watchdog.WatchDogMode.RESET causes a reset, and the board is plugged in to USB, then it enters safe mode
  • When the watchdog timer hits, it calls reset_cpu()
  • Calling microcontroller.reset() now goes through reset_cpu()
  • The number of system ticks is saved in uninitialized memory, so the clock is mostly correct after a reset

Simmel exceeded flash by 500 bytes, so I shrank the filesystem by 4096 bytes.

pca10100 has now exceeded its flash. I could disable watchdog on this platform.

I'm not sure why xtensa failed to build. It looks like a configuration error on the server.

@dhalbert
Copy link
Collaborator

To get pca10100 to build (it's overflowing), change the options list in mpconfigboard.mk to this:

CIRCUITPY_AUDIOMP3 = 0
CIRCUITPY_BITBANGIO = 0
CIRCUITPY_BUSIO = 1
CIRCUITPY_COUNTIO = 0
CIRCUITPY_DISPLAYIO = 0
CIRCUITPY_FRAMEBUFFERIO = 0
CIRCUITPY_FREQUENCYIO = 0
CIRCUITPY_I2CSLAVE = 0
CIRCUITPY_NEOPIXEL_WRITE = 0
CIRCUITPY_NVM = 0
CIRCUITPY_PIXELBUF  = 0
CIRCUITPY_RGBMATRIX = 0
CIRCUITPY_ROTARYIO = 0
CIRCUITPY_RTC = 1
CIRCUITPY_TOUCHIO = 0
CIRCUITPY_ULAB = 0

SUPEROPT_GC = 0

This turns off several more modules and disables some extra optimization for GC. An alternative would be to shrink the internal filesystem size.

I would have a made a commit for this, but your fork isn't set to allow external changes via the PR mechanism.

@dhalbert
Copy link
Collaborator

By the way, it appears from the nRF52833 errata and datsheet that it does not suffer from the nRF52840 SPIM3 hardware bug. I'm not sure if you're avoiding SPIM3 for simmel or not, but we should not need to allocate the special 8KB SPIM3 RAM buffer on any nRF52833 board and also not include the special case handling of SPIM3.

xobs added a commit to simmel-project/circuitpython that referenced this pull request May 23, 2020
This removes some features that are largely unused in order to get the
image to fit.

Recommended in
adafruit#2933 (comment)

Signed-off-by: Sean Cross <sean@xobs.io>
@xobs
Copy link
Collaborator Author

xobs commented May 23, 2020

@dhalbert Thanks for the heads-up! I am indeed disabling SPIM3 because the 8 kB penalty is pretty hefty on this 128 kB part. I'll need to do some special casing to only enable the workaround for NRF52840 parts. I'll do that in a separate patch.

I've made the adjustments you requested, and it's building now.

@xobs
Copy link
Collaborator Author

xobs commented May 23, 2020

I discovered an issue where the Bluetooth system couldn't be re-initialzied after the hardware watchdog was set, due to the hardware watchdog timer using IRQ priority 1.

@dhalbert
Copy link
Collaborator

@tannewt The only build failures here are ESP32S2; is this fixed now?
@xobs Assuming ESP32S2 failures are a red herring, this just needs a merge from upstream and make translate to work, and then it should be fine.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor API suggestions and a request to refactor shared-bindings/common-hal. Thank you so much for doing this! It's looking really good.

ports/nrf/common-hal/watchdog/WatchDogTimer.c Outdated Show resolved Hide resolved
ports/nrf/common-hal/watchdog/WatchDogTimer.c Outdated Show resolved Hide resolved
shared-bindings/watchdog/WatchDogMode.c Outdated Show resolved Hide resolved
shared-bindings/watchdog/__init__.c Show resolved Hide resolved
@tannewt
Copy link
Member

tannewt commented May 26, 2020

@tannewt The only build failures here are ESP32S2; is this fixed now?

Ya, ESP32S2 should be happy again thanks to Jeff.

@xobs Assuming ESP32S2 failures are a red herring, this just needs a merge from upstream and make translate to work, and then it should be fine.

xobs added 2 commits May 27, 2020 11:28
With the WDT changes, building Circuit Python results in the following error:

/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: section .ARM.exidx LMA [00000000000621c8,00000000000621cf] overlaps section .data LMA [00000000000621c8,0000000000062383]

This is because unwinding data is getting generated, but has nowhere to go.
Re-enable this data in the linker script so it is saved.

Signed-off-by: Sean Cross <sean@xobs.io>
This adds shared bindings for a watchdog timer, based on the API
provided by micropython.

Signed-off-by: Sean Cross <sean@xobs.io>
xobs added 14 commits May 27, 2020 11:28
Check to see if the current exception is a Watchdog exception, if it's
enabled. This ensures we break out of the current sleep() if a watchdog
timeout hits.

Signed-off-by: Sean Cross <sean@xobs.io>
This finishes the rework of the exception handler, which is once
again stored inside the watchdog timer module.

This also implements a `watchdog_reset()` that is used to disable the
RAISE watchdog, if one is enabled.

Signed-off-by: Sean Cross <sean@xobs.io>
Signed-off-by: Sean Cross <sean@xobs.io>
Make this exception globally available to all platforms that have
enabled the watchdog timer.

Signed-off-by: Sean Cross <sean@xobs.io>
For `microcontroller.reset()`, don't manually call NVIC_SystemReset().
Instead, call the `port_reset()` in case the port wants to do any
cleanup.

Signed-off-by: Sean Cross <sean@xobs.io>
As part of the reset process, save the current tick count to an
uninitialized memory location.  That way, the current tick value will be
preserved across reboots.

A reboot will cause us to lose a certain number of ticks, depending on
how long a reboot takes, however if reboots are infrequent then this
will not be a large amount of time lost.

Signed-off-by: Sean Cross <sean@xobs.io>
The watchdog timer has increased the amount of code and text that's
required.

Signed-off-by: Sean Cross <sean@xobs.io>
This removes some features that are largely unused in order to get the
image to fit.

Recommended in
adafruit#2933 (comment)

Signed-off-by: Sean Cross <sean@xobs.io>
The previous setting of `1` meant that the bluetooth system couldn't be
used when the watchdog timer was enabled.

Signed-off-by: Sean Cross <sean@xobs.io>
Signed-off-by: Sean Cross <sean@xobs.io>
With this patch, the exception can now be caught:

    import microcontroller
    import watchdog
    import time

    wdt = microcontroller.watchdog
    wdt.timeout = 5

    while True:
        wdt.mode = watchdog.WatchDogMode.RAISE
        print("Starting loop -- should exit after five seconds")
        try:
            while True:
                time.sleep(10)
                # pass # This also works for a spinloop
        except watchdog.WatchDogTimeout as e:
            print("Watchdog Expired (PASS)")
        except Exception as e:
            print("Other exception (FAIL)")
    print("Exited loop")

This prints:

    Starting loop -- should exit after five seconds
    Watchdog Expired (PASS)
    Starting loop -- should exit after five seconds
    Watchdog Expired (PASS)
    Starting loop -- should exit after five seconds
    Watchdog Expired (PASS)

Signed-off-by: Sean Cross <sean@xobs.io>
This pulls all common functionality into `shared-bindings` and keeps
platform-specific code inside `nrf`. Additionally, this performs most
validation in the `shared-bindings` site.

The only validation that occurs inside platform-specific `common-hal`
code is related to timeout limits that are platform-specific.

Additionally, all documentation is now inside the `shared-bindings`
directory.

Signed-off-by: Sean Cross <sean@xobs.io>
Run `make translate` to generate strings for `shared-bindings/watchdog`.

Signed-off-by: Sean Cross <sean@xobs.io>
@xobs
Copy link
Collaborator Author

xobs commented May 27, 2020

I've moved most everything into shared-bindings, rebased on current master, and exposed the exception under the watchdog module. Additionally, most checks now are performed inside shared-bindings with the exception of some platform-specific range checks.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few final tweaks I had hoped to push to the branch but it doesn't look like maintainer edits were allowed. Instead, I've made a PR to this PR's branch: simmel-project#1

shared-bindings/watchdog/__init__.c Show resolved Hide resolved
Merge and a few final tweaks
@xobs
Copy link
Collaborator Author

xobs commented May 28, 2020

Thanks, I merged the PR.

simmel-project is an organization and not a user, and apparently you can't allow maintainer edits if it's an organization: isaacs/github#1681

@tannewt
Copy link
Member

tannewt commented May 28, 2020

Welp, we merged more translations and I can't merge for you. Mind adding me as a contributor to the Simmel repo?

@xobs
Copy link
Collaborator Author

xobs commented May 29, 2020

Sure, I added you as a maintainer.

@xobs xobs requested a review from tannewt May 29, 2020 04:28
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

@tannewt tannewt merged commit db04fd1 into adafruit:master Jun 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants