Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoreDump: assert failed in vTaskStepTick (IDFGH-8319) #9797

Open
3 tasks done
ztefanjo opened this issue Sep 16, 2022 · 56 comments
Open
3 tasks done

CoreDump: assert failed in vTaskStepTick (IDFGH-8319) #9797

ztefanjo opened this issue Sep 16, 2022 · 56 comments
Assignees
Labels
Status: In Progress Work is in progress Type: Bug bugs in IDF

Comments

@ztefanjo
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v4.4.2

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

CMD

Development Kit.

Custom Board

Power Supply used.

Battery

What is the expected behavior?

No coredump

What is the actual behavior?

We collect coredumps from our test-fleet. The attached coredump has been reported once so far and indicates that the vApplicationSleep is not working fully.
The following log can be seen in the dump:
assert failed: vTaskStepTick tasks.c:2951 (( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime)")

core-dump_51.1_Wifi-BLE-App_42.3_220915-140032.elf.txt

Steps to reproduce.

Unfortunately, this is not known, as the coredump was fetched from the device a while after it crashed.

Debug Logs.

No response

More Information.

No response

@ztefanjo ztefanjo added the Type: Bug bugs in IDF label Sep 16, 2022
@espressif-bot espressif-bot added the Status: Opened Issue is new label Sep 16, 2022
@github-actions github-actions bot changed the title CoreDump: assert failed in vTaskStepTick CoreDump: assert failed in vTaskStepTick (IDFGH-8319) Sep 16, 2022
@ztefanjo
Copy link
Author

ztefanjo commented Nov 3, 2022

@zikalino : This error still happens occasionally while using LightSleep and we are very close to our final customer release. Please let us know what we can do to avoid the error (any patch/configuration change or similar?).

@espressif-bot espressif-bot added Status: Selected for Development Issue is selected for development and removed Status: Opened Issue is new labels Apr 13, 2023
@aircable
Copy link

I can also confirm this as a bug. It happens during Wifi initialization and is likely a race condition when light sleep power management is enabled. This is on the ESP32-S3R8V and the master branch of the IDF.
Sometimes I can get a build that does not run into this problem, but then I just have to disable light sleep.
Also note, FreeRTOS is configured with TICKLESS_IDLE at 1000Hz.
So far I have not seen the problem on the ESP32 we use in a different product, only on the S3.

Stack trace:

assert failed: vTaskStepTick tasks.c:3061 (( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime)

Backtrace: 0x40375cfa:0x3fcb18b0 0x40382389:0x3fcb18d0 0x40388bb2:0x3fcb18f0 0x4038348a:0x3fcb1a10 0x4037b68d:0x3fcb1a30 0x403831e3:0x3fcb1a50
0x40375cfa: panic_abort at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/esp_system/panic.c:451

0x40382389: esp_system_abort at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/esp_system/port/esp_system_chip.c:84

0x40388bb2: __assert_func at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/newlib/assert.c:81

0x4038348a: vTaskStepTick at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:3061 (discriminator 1)

0x4037b68d: vApplicationSleep at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/esp_pm/pm_impl.c:622

0x403831e3: prvIdleTask at /home/juergen/AIRcableLLC/ESP_SW/NEWEST/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4363

@HiFiPhile
Copy link
Contributor

Any update ?

It's still present on v5.1-rc1 once FreeRTOS tick changed from 100Hz to 1000Hz.

@expresspotato
Copy link

This is also present on an ESP32, running v4.4.6 and a 1000 Hz tick rate and BLE.

assert failed: vTaskStepTick tasks.c:2999 (( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime) Backtrace: 0x4008277e:0x3ffbc840 0x40096c29:0x3ffbc860 0x4009fcb1:0x3ffbc880 0x4009aac2:0x3ffbc9a0 0x4008ebe4:0x3ffbc9c0 0x4009a696:0x3ffbc9e0 0x4009d235:0x3ffbca00 0x4008277e: panic_abort at /Users/kevin/esp/esp-idf-v4.4.6/components/esp_system/panic.c:408 0x40096c29: esp_system_abort at /Users/kevin/esp/esp-idf-v4.4.6/components/esp_system/esp_system.c:137 0x4009fcb1: __assert_func at /Users/kevin/esp/esp-idf-v4.4.6/components/newlib/assert.c:85 0x4009aac2: vTaskStepTick at /Users/kevin/esp/esp-idf-v4.4.6/components/freertos/tasks.c:2999 (discriminator 1) 0x4008ebe4: vApplicationSleep at /Users/kevin/esp/esp-idf-v4.4.6/components/esp_pm/pm_impl.c:666 0x4009a696: prvIdleTask at /Users/kevin/esp/esp-idf-v4.4.6/components/freertos/tasks.c:4082 0x4009d235: vPortTaskWrapper at /Users/kevin/esp/esp-idf-v4.4.6/components/freertos/port/xtensa/port.c:142

@tshcherban
Copy link

tshcherban commented Jan 8, 2024

The same for me, ESP32-S3, IDF latest development (master). Power management, automatic light sleep, DFS (10..160 MHz) enabled, 1000 Hz tick rate. Using BLE listening, WiFi connectivity and some Serial printing runtime stats (arduino as a component). Asserting at different places/times, sometimes can run for a few minutes, sometimes crashes even before successful connecting to WiFi.
From other threads (STM32, EFM32) I've found this PR.

P.S. Logged the values before ASSERT, got:

xTickCount 160206, xTicksToJump 4, xNextTaskUnblockTime 160209
xTickCount 198206, xTicksToJump 4, xNextTaskUnblockTime 198209

xTicksToJump is always ahead of xNextTaskUnblockTime - xTickCount by 1, so i've modified the fix from a RTOS repo to include a case for greater than:

//configASSERT( ( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime );
if( ( xTickCount + xTicksToJump ) > xNextTaskUnblockTime )
{
    TickType_t toRemove = xTicksToJump - (xNextTaskUnblockTime - xTickCount);
    /* Arrange for xTickCount to reach xNextTaskUnblockTime in
     * xTaskIncrementTick() when the scheduler resumes.  This ensures
     * that any delayed tasks are resumed at the correct time. */
    configASSERT( uxSchedulerSuspended );
    configASSERT( xTicksToJump != ( TickType_t ) 0 );

    /* Prevent the tick interrupt modifying xPendedTicks simultaneously. */
    taskENTER_CRITICAL(&xKernelLock);
    {
        xPendedTicks += toRemove;
    }
    taskEXIT_CRITICAL(&xKernelLock);
    xTicksToJump -= toRemove;
}
else
{
    mtCOVERAGE_TEST_MARKER();
}

That workaround allows S3 to run for a long without reboots. @ztefanjo @HiFiPhile maybe you'll give a shot before we get a real reason and a bugfix for this.
P.P.S. with BLE disabled - that issue is not happening. Also if DFS lower bound (CPU_FREQ_MIN) changed from 10 to 20 MHz (while active BLE) - issue also not happening.

@hbf-sthlm
Copy link

Any news on this bug @sudeep-mohanty ? Do you know when it could be solved, approximately? Thank you.

/Henrik

@sudeep-mohanty
Copy link
Collaborator

Hi @hbf-sthlm, Unfortunately we do not have a timeline yet for this issue. What would help us in debugging this is if you could provide a minimal example/application which can reproduce this issue.

@hbf-sthlm
Copy link

Thank you for a quick response @sudeep-mohanty.

Actually, in my case, I found that the problem didn't appear when I powered the ESP32-S3 independently (not from the USB port of my computer). When measuring current consumption with the Power Profiler Kit II from Nordice Semiconductor, I could see that the ESP32-S3 did indeed enter light sleep mode without crashing repeatedly. The ESP32-S3 only crashed when exiting light sleep if I powered it with the USB port on my computer. That seemed strange, and after a lot of running in circles I found the following information:

https://docs.espressif.com/projects/esp-idf/en/v5.2/esp32s3/api-reference/kconfig.html#config-usj-no-auto-ls-on-connection

I interpret above linked section of the ESP-IDF docs to mean that it is expected that the ESP32-S3 should crash when exiting light sleep while being connected via USB to a computer.

I would suggest you add this information a little more prominently, for example in the Power Management, Sleep Modes and other relevant parts of the ESP-IDF online documentation.

Also, is this a bug that will be solved? It would be good if the ESP32-S3 didn't crash while entering/exiting light sleep while connected to a computer.

Thx

@TD-er
Copy link
Contributor

TD-er commented Feb 28, 2024

Just to be clear, this is when using native USB, thus the USB port directly connected to the ESP and the ESP acting as an USB device?

@hbf-sthlm
Copy link

@TD-er Exactly, I am using the native USB of the ESP32-S3 and its connected from my computer to the USB port of the ESP32-S3 board I am using for my current project. In addition to the unofficial board I am using in my project, I have this morning also tested with an official ESP32-S3-DevKitC-1 from Espressif, and the "problem" is apparent there as well. But as mentioned in the docs (here: https://docs.espressif.com/projects/esp-idf/en/v5.2/esp32s3/api-reference/kconfig.html#config-usj-no-auto-ls-on-connection) it seems like this is expected behaviour.

@tshcherban
Copy link

tshcherban commented Feb 28, 2024

I've faced the problem regardless Native USB or Serial monitor (and pwoering externally). It seems to be more related to the BLE functionality.

@HiFiPhile
Copy link
Contributor

HiFiPhile commented Feb 28, 2024

I've faced the problem regardless Native USB or Serial monitor (and powering externally). It seems to be more related to the BLE functionality.

Me too, I'm using ESP32-C3 without USB function, I'll try if I can make a minimal example.

@ztefanjo
Copy link
Author

Also in our case, it happened without USB. But it was a while since I saw the error now.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Selected for Development Issue is selected for development labels Nov 1, 2024
@michaelboeding
Copy link

michaelboeding commented Nov 9, 2024

Also seeing this error. I have BLE running also thinking that has something to do with it. Any updates on this? Using an ESP32-S3

@sudeep-mohanty
Copy link
Collaborator

Hi, We are investigating potential solutions to avoid this assert failure and would update the thread once we have a viable one. Thanks.

@michaelboeding
Copy link

In the meantime are there any work arounds for this? Can I disable my BLE sever when I have an active Wi-Fi connection and then only run a ble scan so often on a new wakeup? Or something like that.

@ztefanjo
Copy link
Author

@michaelboeding , if I remember correctly, the workaround that "solved" the problem for us was to increase CONFIG_FREERTOS_IDLE_TIME_BEFORE_SLEEP from 2 to 3 (which is the default value).

@sudeep-mohanty
Copy link
Collaborator

sudeep-mohanty commented Nov 18, 2024

Hi,
Some recommendations that could avoid the issue are as listed below. Please note that these are not fixes so you may still see the issue however we expect the frequency of occurrence to be lower. It would be wise to perform a smoke test of your application once you turn on one or more of these settings -

  • Select PM_SLP_IRAM_OPT in menuconfig. This will avoid fluctuations in sleep processing time due to cache misses.
  • Deselect ESP_SLEEP_POWER_DOWN_FLASH in the menuconfig. Flash power-off will add extra waiting time after wake-up.
  • Increase min_freq_mhz in esp_pm_configure. This will cause the CPU to run the code that exits the sleep process at a higher frequency.
  • Use a more stable slow clock source, such as external 32K crystal.
  • Increase FREERTOS_HZ to 1000 in menuconfig
  • Don't let the chip sleep for too long at a time. The longer the sleep time, the more errors the RC clock accumulates, and the easier it is to miss a tick.

cc: @esp-wzh

@michaelboeding
Copy link

Yep will try now I'm currently in the process of debugging this. Thanks for the help this late (if its late for you) @esp-wzh

@michaelboeding
Copy link

michaelboeding commented Dec 1, 2024

Hey @esp-wzh tried to apply the patch but got the following error

(base) MacBook-Pro:esp-idf user$ git apply /Users/user/Desktop/IDFGH-8319.patch
error: patch failed: examples/wifi/power_save/sdkconfig.defaults:48
error: examples/wifi/power_save/sdkconfig.defaults: patch does not apply
(base) MacBook-Pro:esp-idf user$ 

Am I doing something wrong here? I am running the latest stable version so v5.3.1

Update I applied the patch with the --reject flag and got the following

(base) MacBook-Pro:esp-idf user$ git apply --reject /Users/user/Desktop/IDFGH-8319.patch 
Checking patch components/esp_hw_support/include/esp_private/esp_sleep_internal.h...
Checking patch components/esp_hw_support/sleep_modes.c...
Checking patch components/esp_wifi/src/wifi_init.c...
Checking patch examples/wifi/power_save/sdkconfig.defaults...
error: while searching for:
CONFIG_FREERTOS_HZ=1000
CONFIG_ESP_PHY_MAC_BB_PD=y

CONFIG_IDF_EXPERIMENTAL_FEATURES=y
error: patch failed: examples/wifi/power_save/sdkconfig.defaults:48
Applied patch components/esp_hw_support/include/esp_private/esp_sleep_internal.h cleanly.
Applied patch components/esp_hw_support/sleep_modes.c cleanly.
Applied patch components/esp_wifi/src/wifi_init.c cleanly.
Applying patch examples/wifi/power_save/sdkconfig.defaults with 1 reject...
Rejected hunk #1.
(base) MacBook-Pro:esp-idf user$ 

Are these all the important items? I'm also using an ESP32-S3 just for reference

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 1, 2024

Sorry, I accidentally added unnecessary changes to the patch. The patch has been updated. Please download it again and try again. @michaelboeding

@michaelboeding
Copy link

Hey @esp-wzh no problem! Thanks patch applied and now running the test. Should know within about 10 minutes if it failed...standby.

@michaelboeding
Copy link

Hey @esp-wzh , Seems to have the same crash at about 13 minutes. Are we sure this isn't somehow related to ble running also? I reread the whole thread and it seems like @tshcherban had the most compelling explanation

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 1, 2024

Okay... I need to ask my BLE colleagues tomorrow if they have any opinions. I will update here if there is any progress. You can use tshcherban's workaround for now.

@michaelboeding
Copy link

michaelboeding commented Dec 1, 2024

@esp-wzh Thanks for the help! Let me know I could also be completely wrong about the BLE but not sure what else to try. I do have one question based on his comments though. He mentioned he was able to get the crash to not occur when he:

"P.P.S. with BLE disabled - that issue is not happening. Also if DFS lower bound (CPU_FREQ_MIN) changed from 10 to 20 MHz (while active BLE) - issue also not happening."

Are 10/20 MHz even valid ranges? From everything I've seen the lowest is 40 MHz. I'm planning to try his fix now with my config looking like this

// Configure Bluetooth modem sleep with higher minimum frequency
//https://github.com/espressif/esp-idf/issues/9797
esp_pm_config_esp32_t pm_config = {
    .max_freq_mhz = 160,  // Increased from 80
    .min_freq_mhz = 20,   // Increased from 40 to 80 to try and get rid of the tickless idle error thing 20
#if CONFIG_FREERTOS_USE_TICKLESS_IDLE
    .light_sleep_enable = true // this should be true but makuing it false so we get uart output for now 
#endif
};
ESP_ERROR_CHECK(esp_pm_configure(&pm_config));

Also I doubt this is related but I am also setting the WiFi power save mode to

 esp_wifi_set_ps(WIFI_PS_MAX_MODEM);

After configuring the DFS.

Update I actually just looked at the implementation of the power management code and it seems like there is a hard minimum of 2 MHz for the low frequency but I'm assuming that's only if you are using a 32 xtal (which I am). So technically I could make the min 10/20 MHz like he suggested or even go as low as 2 MHz? References below

#elif CONFIG_IDF_TARGET_ESP32S3
/* Minimal divider at which REF_CLK_FREQ can be obtained */
#define REF_CLK_DIV_MIN 2


if (min_freq_mhz < xtal_freq_mhz && min_freq_mhz * MHZ / REF_CLK_FREQ < REF_CLK_DIV_MIN) {
    ESP_LOGW(TAG, "min_freq_mhz should be >= %d", REF_CLK_FREQ * REF_CLK_DIV_MIN / MHZ);
    return ESP_ERR_INVALID_ARG;
}

Main reason for asking about this is it seems the crash doesn't happen when the minimum is set to 20 MHz....but I want to make sure it's actually setting it to 20 MHz and it's not just defaulting to the max frequency and not causing DFS or something like that. Basically want to make sure it's not making me "think" it's working when it's actually not.

@michaelboeding
Copy link

The same for me, ESP32-S3, IDF latest development (master). Power management, automatic light sleep, DFS (10..160 MHz) enabled, 1000 Hz tick rate. Using BLE listening, WiFi connectivity and some Serial printing runtime stats (arduino as a component). Asserting at different places/times, sometimes can run for a few minutes, sometimes crashes even before successful connecting to WiFi. From other threads (STM32, EFM32) I've found this PR.

P.S. Logged the values before ASSERT, got:

xTickCount 160206, xTicksToJump 4, xNextTaskUnblockTime 160209
xTickCount 198206, xTicksToJump 4, xNextTaskUnblockTime 198209

xTicksToJump is always ahead of xNextTaskUnblockTime - xTickCount by 1, so i've modified the fix from a RTOS repo to include a case for greater than:

//configASSERT( ( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime );
if( ( xTickCount + xTicksToJump ) > xNextTaskUnblockTime )
{
    TickType_t toRemove = xTicksToJump - (xNextTaskUnblockTime - xTickCount);
    /* Arrange for xTickCount to reach xNextTaskUnblockTime in
     * xTaskIncrementTick() when the scheduler resumes.  This ensures
     * that any delayed tasks are resumed at the correct time. */
    configASSERT( uxSchedulerSuspended );
    configASSERT( xTicksToJump != ( TickType_t ) 0 );

    /* Prevent the tick interrupt modifying xPendedTicks simultaneously. */
    taskENTER_CRITICAL(&xKernelLock);
    {
        xPendedTicks += toRemove;
    }
    taskEXIT_CRITICAL(&xKernelLock);
    xTicksToJump -= toRemove;
}
else
{
    mtCOVERAGE_TEST_MARKER();
}

That workaround allows S3 to run for a long without reboots. @ztefanjo @HiFiPhile maybe you'll give a shot before we get a real reason and a bugfix for this. P.P.S. with BLE disabled - that issue is not happening. Also if DFS lower bound (CPU_FREQ_MIN) changed from 10 to 20 MHz (while active BLE) - issue also not happening.

Hey @tshcherban could you give a little more context as to what file you changed the above code in to make this bug go away? Thanks

@tshcherban
Copy link

tshcherban commented Dec 1, 2024

The file is tasks.c
I have not debugged that too deep because had to complete a project. I have just made a dumb change to the comparison logic include greater condition
#9797 (comment)
Anyway, I had to turn off DFS completely since it caused serial communication problems due to APB changes (have not configured reftick)
Will try debugging this soon if I find my minimum reproducible example.
P.S. I was observing a failure just using BLE+DFS+ALS, and it was failing right away on first sleep attempt.

@tshcherban
Copy link

tshcherban commented Dec 1, 2024

my changes are the following:

change a comparison condition to GreaterThan:
if( ( xTickCount + xTicksToJump ) == xNextTaskUnblockTime ) -> if( ( xTickCount + xTicksToJump ) > xNextTaskUnblockTime)

inside the condition add value indicating number of ticks to skip:
TickType_t toRemove = xTicksToJump - (xNextTaskUnblockTime - xTickCount);

change increment to account new value (whithin taskENTER_CRITICAL protected section):
xPendedTicks++; -> xPendedTicks += toRemove;

change decrement of xTicksToJump to account new value:
xTicksToJump--; -> xTicksToJump -= toRemove;

image

My current level does not allow me to understant the root cause of this problem, im not certain of the reason this code exists and what should it do under the normal circumstances, so i used that ugly fix sticky tape to just avoid a crash. However maybe it will give some clue to the more experienced developers what should be done in a proper way.

@michaelboeding
Copy link

michaelboeding commented Dec 1, 2024

Hey @tshcherban thanks for this I will give it a try! I actually ran the system overnight with the minimum frequency at 20 MHz and the crash didn't occur once for DFS. But I'm not convinced that it actually is using the DFS at 20 and not just reverting to the max frequency when I set it to 20 MHz. I tried to look through the code and it seems maybe it does but I would like someone at espressif to confirm this. And if it does maybe that can be another hint to what is actually wrong. Could you confirm this @esp-wzh

Update:

I think I may have been fooling me into thinking it was working because it was crashing on the first run with a truncated debug log. See below. I guess this then resets and reverts the DFS settings? It would still be nice to know the min frequency for the device.

 (43200) CommunicationManager: Event received: 20
I (43205) CommunicationManager: Size of the queue: 0
I (43211) CommunicationManager: WiFi connecting event received
I (43218) CommunicationManager: WiFi connecting event received
I (43224) MainApp: Setting up power config in standard tracker
I (43231) MainApp: Setting up wifi power config with 20MHz min freq
I (43240) pm: Frequency switching config: CPU_MAX: 160, APB_MAX: 80, APB_MIN: 20, Light sleep: ENABLED
I (43248) wifi:<ba-add>idx:0 (ifx:0, 5c:a6:e6:f9:ff:a2), tid:0, ssn:4, winSize:64
I (43257) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (43263) sleep: Code start at 0x42000020, total 1543550, data start at 0x3c000000, total 33554432 Bytes
--- 0x42000020: _stext at ??:?

I (43273) wifi:Set ps type: 2, coexist: 1


assert failed: vTaskStepTick tasks.c:3059 (( xTickCount + xTicI (31) boot: ESP-IDF v5.3.1-dirty 2nd stage bootloader
I (31) boot: compile time Nov 30 2024 23:43:41

Not sure if this means anything also but it seems the --- 0x42000020: _stext at ??:? may be related somehow to #11558 ?

Update Again:

I did test @tshcherban solution but it didn't seem to work for me - I still get the error at about the same time interval.

@michaelboeding
Copy link

michaelboeding commented Dec 1, 2024

Hey @esp-wzh I have completed some more testing on this.

1st Test:

  1. Kept the patch added
  2. Disabled BLE advertising
  3. Wifi connection with max modem power savings enabled

Result : the device would completely power off. Note that we do have a pin that asserts power to the while board but it's an RTC pin and enabled with a hold.

2nd Test:

  1. Removed the patch added
  2. Disabled BLE advertising
  3. Wifi connection with max modem power savings enabled.

Result:

The device would run normally for about 30-45 minutes now. Much longer than when ble was enabled also.

Any suggestions or other steps for testing a fix would be appreciated. Launching this product soon and having this feature work is pretty critical since we need it to last a long time on battery when connected to wifi. Thanks

Update on further running it seems like this device also completely powered off. Maybe 1 hour in or so.

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 2, 2024

Not sure if this means anything also but it seems the --- 0x42000020: _stext at ??:? may be related somehow to #11558 ?

This is not a problem, this log is from idf_monitor trying to perform addr2line on the hexadecimal number printed by the chip.

the device would completely power off

Does this mean that the rtc_pin in the hold state of esp32s3 has lost its output? Can we know the status of esp32s3 when the problem occurs, from log or current wave?

Any suggestions or other steps for testing a fix?

From the config you attached. Could you try the following configuration to see if it affects this issue?

  1. Enable CONFIG_PM_SLP_IRAM_OPT and CONFIG_ESP_WIFI_SLP_IRAM_OPT (It is strongly recommended to enable!!)
  2. Try Disable CONFIG_ESP_PHY_MAC_BB_PD ?
  3. esp32s3 is a dualcore chip, why use unicore mode? (CONFIG_FREERTOS_UNICORE=y)

The BLE team said that they have not encountered such problems under cpu_max=160 / cpu_max=40 configuration.

If you can provide a minimal reproducible routine, it will help us find the root cause quickly. @michaelboeding

@michaelboeding
Copy link

michaelboeding commented Dec 2, 2024

Hey @esp-wzh

  1. Got it - not related then.

  2. There is no output since it's in light sleep - unless there is a way to make sure all logs come through while in light sleep? The connection will disconnect/reconnect as its running and switching between.

  3. I will try all of these configs now and get back to you. The reason im not using the dual core mode is im trying to save on power savings and followed the guide here that mentions enabling this https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/low-power-mode/low-power-mode-soc.html?highlight=config_freertos_unicore . Is this incorrect? Also in this guide it also says to put the CONFIG_PM_SLP_IRAM_OPT as "OFF".

  4. Would it be worth trying different wifi power modes to see if that has something to do with it or if it is just related to DFS?

  5. This is a pretty huge project - so creating a minimal example may be challenging and under the current time constraints may not be feasible. But I can try to put something together.

Update on this I have tried all of the above suggestions and I still get the crash at about 14 minutes into operation. Going to try some other configs now also.

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 2, 2024

There is no output since it's in light sleep - unless there is a way to make sure all logs come through while in light sleep? The connection will disconnect/reconnect as its running and switching between.

Maybe esp_rom_output_tx_wait_idle in esp_rom_uart.h is what you want.

Is this incorrect? Also in this guide it also says to put the CONFIG_PM_SLP_IRAM_OPT as "OFF".

Oh, sorry for this, here is just the default configuration, the default configuration is for memory optimization, but if there is enough memory, all xxx_IRAM_OPT options should be turned on as much as possible, especially for scenarios such as WiFi that require high accuracy of wake-up time point during sleep.

Would it be worth trying different wifi power modes to see if that has something to do with it or if it is just related to DFS?

all ps modes will wake up the chip, but the wake-up frequency is different. It should not be the root cause, but it will affect the probability of recurrence.

@michaelboeding
Copy link

michaelboeding commented Dec 2, 2024

  1. How would I use esp_rom_output_tx_wait_idle? Does it just need to be called once in the code? I think that would also not let the device go into light sleep? Where we do want light sleep we just want it to send logs as it switches modes.

  2. Got it - will leave all of these enabled definitely thought that was for power savings not memory. My mistake.

  3. I have now tried it on the WIFI_PS_MAX_MODEM and WIFI_PS_MIN_MODEM and the error still occurs on both. Attempting to run it now on WIFI_PS_NONE. It took about an hour for it to crash on WIFI_PS_NONE. So I would guess it definitely has something to do with the DFS and not the other power save modes.

  4. Is there anything I can do or enable to make sure my external crystal is also performing correctly? And make sure that is not causing any issues?

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 2, 2024

Is there anything I can do or enable to make sure my external crystal is also performing correctly? And make sure that is not causing any issues?

Hi, you can use the esp_clock_output feature to monitor the XTAL32K clock, just add the following code:

#include "esp_clock_output.h"
esp_clock_output_mapping_handle_t clkout_mapping_hdl;
ESP_ERROR_CHECK(esp_clock_output_start(CLKOUT_SIG_RC_SLOW, YOUR_AVAILABLE_IO_NUM, &clkout_mapping_hdl));

Note: Only GPIO_NUM_18 or GPIO_NUM_19 or GPIO_NUM_20 is supported for esp32s3's clock output feature.

@michaelboeding
Copy link

Hey @esp-wzh Seems we are using GPIO_NUM_18-GPIO_NUM_20 so not sure if this is possible. Maybe if we cut some traces on a board but I'll have to look at the schematic. I can also confirm that running with light sleep set to false for the DFS and WIFI_PS_MAX_MODEM didn't cause any crashes at all for the last day.

esp_pm_config_esp32_t pm_config = {
                .max_freq_mhz = CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ,  // Increased from 80
                .min_freq_mhz = 40,   // Increased from 40 to 80 to try and get rid of the tickless idle error thing 20
            #if CONFIG_FREERTOS_USE_TICKLESS_IDLE
                .light_sleep_enable = false // this should be true but makuing it false so we get uart output for now 
            #endif
            };
            ESP_ERROR_CHECK(esp_pm_configure(&pm_config));
            
            esp_wifi_set_ps(WIFI_PS_MAX_MODEM);

So light sleep seems to be the culprit

@michaelboeding
Copy link

Hey @esp-wzh sorry for the delay on this. We did go ahead and measure the output by cutting a trace. It was reading correctly - attached is the image from the scope. I will note that we saw it flicker during running when we didn't call

 ESP_ERROR_CHECK(esp_sleep_pd_config(ESP_PD_DOMAIN_XTAL, ESP_PD_OPTION_ON));

But ran correctly after and we are still seeing the crash (about every 20 minutes for BLE connections) any help would be appreciated.

Screenshot 2024-12-23 at 4 19 37 PM

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 24, 2024

ESP_PD_DOMAIN_XTAL controls the power domain of the main 40MHz XTAl and does not affect the 32kHz XTAL, which is reasonable.

Does the image show the waveform at the time (or after) the crash occurred?

BTW, can you identify the scenario in which the crash occurs? WiFi only, BLE only, WIFi BLE coexist, or is it possible to crash in all three scenarios?

@michaelboeding
Copy link

michaelboeding commented Dec 24, 2024

Ok so I will remove ESP_PD_DOMAIN_XTAL this was just during the normal operation - we measured for about 10 minutes and it didn't seem to happen. I can try to set it up on a longer timeframe with a video attached maybe? If you think that would be useful. @esp-wzh

Will work on identifying this tonight. Currently it's definitely happening in BLE only (no wifi enabled) and I have seen it in WI-Fi connected and BLE advertising but not connected.

We are not sure but we think the esp32-s3 is stopping the clock intermittently when the crash occurs.

@michaelboeding
Copy link

Hey @esp-wzh can confirm it happens in all scenarios. Different intervals it seems but BLE, WIFI, and BLE & WIFI cause it.

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 24, 2024

It is difficult to debug without a reproducible environment... Can you report the issue here and provide a reproducible environment?

@michaelboeding
Copy link

Yes I can do this but just as a question is this not the fix to this problem in the freeRTOS? FreeRTOS/FreeRTOS-Kernel@990643e Seems like this might be the fix but its on the v11? And just from a quick check the esp idf is running on v10? Or is that incorrect @esp-wzh

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 25, 2024

I think this fix will not fix this problem because the assert: configASSERT( ( xTickCount + xTicksToJump ) <= xNextTaskUnblockTime ); that caused the problem still exists

@michaelboeding
Copy link

Hey @esp-wzh you are correct I tried messing with a couple items and testing and the error still occurs. I have also filed a software bug report as you suggested. But in the meantime I am even open to any ideas and willing to test whatever is needed with my setup....we are planning to launch this product at the end of January and without the DFS with light sleep enabled we aren't able to hit our battery hour requirements.

@michaelboeding
Copy link

Also since its so hard to get debug output from the controller over UART (since light sleep disables the output to the console) I did write code to save the core dump and upload it to a server and was able to get the following core dump of the actual crash for my application....if this is helpful. @esp-wzh

===============================================================
==================== ESP32 CORE DUMP START ====================
The ROM ELF file won't load automatically since it was not found for the provided chip type.

Crashed task handle: 0x3fcb9548, name: 'IDLE', GDB name: 'process 1070306632'
Crashed task is not in the interrupt context
Panic reason: assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x40377717
epc2           0x0
epc3           0x42008ff1
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x60b20
eps4           0x0
eps5           0x0
eps6           0x0
pc             0x403760a5          0x403760a5 <panic_abort+21>
lbeg           0x40056f08          1074097928
lend           0x40056f12          1074097938
lcount         0x0                 0
sar            0x10                16
ps             0x60723             395043
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x80385cfc          -2143789828
a1             0x3fcb9220          1070305824
a2             0x3fcb926b          1070305899
a3             0x2                 2
a4             0xa                 10
a5             0x1                 1
a6             0xc43740b           205747211
a7             0x0                 0
a8             0x0                 0
a9             0x1                 1
a10            0x3fcb9333          1070306099
a11            0x3fcb9333          1070306099
a12            0x3fcb9334          1070306100
a13            0x28                40
a14            0xa0                160
a15            0x3fca5830          1070225456

==================== CURRENT THREAD STACK =====================
#0  0x403760a5 in panic_abort (details=0x3fcb926b "assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)") at /Users/michaelboeding/esp/esp-idf/components/esp_system/panic.c:463
#1  0x40385cfc in esp_system_abort (details=0x3fcb926b "assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)") at /Users/michaelboeding/esp/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2  0x4038f6d0 in __assert_func (file=0x3c1abbec "sleep_modes.c", line=<optimized out>, func=<optimized out>, expr=0x3c1abcec "refs >= 0") at /Users/michaelboeding/esp/esp-idf/components/newlib/assert.c:80
#3  0x4217ac4e in esp_sleep_pd_config (domain=<optimized out>, option=ESP_PD_OPTION_ON) at /Users/michaelboeding/esp/esp-idf/components/esp_hw_support/sleep_modes.c:2118
#4  0x4037c63b in vApplicationSleep (xExpectedIdleTime=175) at /Users/michaelboeding/esp/esp-idf/components/esp_pm/pm_impl.c:809
#5  0x40388dbd in prvIdleTask (pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4379
#6  0x40387028 in vPortTaskWrapper (pxCode=0x40388d74 <prvIdleTask>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

======================== THREADS INFO =========================
  Id   Target Id          Frame 
* 1    process 1070306632 0x403760a5 in panic_abort (details=0x3fcb926b "assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)") at /Users/michaelboeding/esp/esp-idf/components/esp_system/panic.c:463
  2    process 1070351484 0x40386ad2 in xQueueReceive (xQueue=0x3fcc3598, pvBuffer=0x3fcc42e0, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  3    process 1070368088 vTaskDelay (xTicksToDelay=500) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1592
  4    process 1070368824 xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
  5    process 1070372820 xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
  6    process 1070377056 xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
  7    process 1070304744 vTaskDelay (xTicksToDelay=60000) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1592
  8    process 1070361368 0x40386ad2 in xQueueReceive (xQueue=0x3fcc4654, pvBuffer=0x3fcc6950, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  9    process 1070311080 0x403879ae in prvProcessTimerOrBlockTask (xNextExpireTime=0, xListWasEmpty=1) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  10   process 1070374596 0x40386ad2 in xQueueReceive (xQueue=0x3fcc9e14, pvBuffer=0x3c240b20, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  11   process 1070270864 0x40386ad2 in xQueueReceive (xQueue=0x3fcca39c, pvBuffer=0x3fcb07b0 <connSwitchTaskStack+7712>, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  12   process 1070438704 0x40386ad2 in xQueueReceive (xQueue=0x3fcd6390, pvBuffer=0x3fcd9750, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  13   process 1070262188 0x40386ad2 in xQueueReceive (xQueue=0x3fcc70c4, pvBuffer=0x3fcae5c8 <xStack+11804>, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  14   process 1070416812 0x40386c45 in xQueueSemaphoreTake (xQueue=0x3fcd32d8, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
  15   process 1070294808 0x400559e0 in ?? ()
  16   process 1070371328 0x40386ad2 in xQueueReceive (xQueue=0x3fcc913c, pvBuffer=0x3fccd980, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3fcb9548             IDLE      0/0          672/860
0x3fcc447c              tiT    18/18         736/2840
0x3fcc8558   ledControlTask      4/4         608/5380
0x3fcc8838  batteryMonitorT      1/1         656/1376
0x3fcc97d4  activity_timer_      9/9          640/368
0x3fcca860  comm_timer_task      9/9          640/372
0x3fcb8de8             main      1/1         736/7964
0x3fcc6b18          sys_evt    20/20         704/7996
0x3fcba6a8          Tmr Svc      1/1         640/3452
0x3fcc9ec4    location_task    15/15        688/15684
0x3fcb0990  conn_switch_tas      5/5         720/7456
0x3fcd9930      nimble_host    21/21        720/12156
0x3fcae7ac  trackerManagerT    20/20        992/11284
0x3fcd43ac     btController    23/23         688/3400
0x3fcb6718        esp_timer    22/22         656/3436
0x3fcc9200   acc_event_Task      1/1         656/7524

==================== THREAD 1 (TCB: 0x3fcb9548, name: 'IDLE') =====================
#0  0x403760a5 in panic_abort (details=0x3fcb926b "assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)") at /Users/michaelboeding/esp/esp-idf/components/esp_system/panic.c:463
#1  0x40385cfc in esp_system_abort (details=0x3fcb926b "assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0)") at /Users/michaelboeding/esp/esp-idf/components/esp_system/port/esp_system_chip.c:92
#2  0x4038f6d0 in __assert_func (file=0x3c1abbec "sleep_modes.c", line=<optimized out>, func=<optimized out>, expr=0x3c1abcec "refs >= 0") at /Users/michaelboeding/esp/esp-idf/components/newlib/assert.c:80
#3  0x4217ac4e in esp_sleep_pd_config (domain=<optimized out>, option=ESP_PD_OPTION_ON) at /Users/michaelboeding/esp/esp-idf/components/esp_hw_support/sleep_modes.c:2118
#4  0x4037c63b in vApplicationSleep (xExpectedIdleTime=175) at /Users/michaelboeding/esp/esp-idf/components/esp_pm/pm_impl.c:809
#5  0x40388dbd in prvIdleTask (pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4379
#6  0x40387028 in vPortTaskWrapper (pxCode=0x40388d74 <prvIdleTask>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 2 (TCB: 0x3fcc447c, name: 'tiT') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcc3598, pvBuffer=0x3fcc42e0, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x420c4da8 in sys_arch_mbox_fetch (mbox=<optimized out>, msg=0x3fcc42e0, timeout=1000) at /Users/michaelboeding/esp/esp-idf/components/lwip/port/freertos/sys_arch.c:317
#2  0x420b585f in tcpip_timeouts_mbox_fetch (mbox=0x3c231b3c <tcpip_mbox>, msg=0x3fcc42e0) at /Users/michaelboeding/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:104
#3  0x420b5921 in tcpip_thread (arg=0x0) at /Users/michaelboeding/esp/esp-idf/components/lwip/lwip/src/api/tcpip.c:142
#4  0x40387028 in vPortTaskWrapper (pxCode=0x420b5900 <tcpip_thread>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 3 (TCB: 0x3fcc8558, name: 'ledControlTask') =====================
#0  vTaskDelay (xTicksToDelay=500) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1592
#1  0x42055511 in ledControlTask (pvParameter=0x3fcc82e4) at ../main/Objects/LED/LED.cpp:82
#2  0x40387028 in vPortTaskWrapper (pxCode=0x42055394 <ledControlTask(void*)>, pvParameters=0x3fcc82e4) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 4 (TCB: 0x3fcc8838, name: 'batteryMonitorT') =====================
#0  xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
#1  0x42034098 in BatteryMonitor::run (this=0x3fcc875c) at ../main/Objects/BatteryMonitor/BatteryMonitor.cpp:208
#2  0x420340b4 in BatteryMonitor::runTask (param=0x3fcc875c) at ../main/Objects/BatteryMonitor/BatteryMonitor.cpp:197
#3  0x40387028 in vPortTaskWrapper (pxCode=0x420340ac <BatteryMonitor::runTask(void*)>, pvParameters=0x3fcc875c) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 5 (TCB: 0x3fcc97d4, name: 'activity_timer_') =====================
#0  xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
#1  0x420566b8 in PeriodicTask::executeTask (this=0x3fcc935c) at ../main/Objects/PeriodicTask/PeriodicTask.cpp:106
#2  0x420566d0 in PeriodicTask::periodicTaskFunction (param=0x3fcc935c) at ../main/Objects/PeriodicTask/PeriodicTask.cpp:94
#3  0x40387028 in vPortTaskWrapper (pxCode=0x420566c8 <PeriodicTask::periodicTaskFunction(void*)>, pvParameters=0x3fcc935c) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 6 (TCB: 0x3fcca860, name: 'comm_timer_task') =====================
#0  xTaskDelayUntil (pxPreviousWakeTime=<optimized out>, xTimeIncrement=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1548
#1  0x420566b8 in PeriodicTask::executeTask (this=0x3fcca424) at ../main/Objects/PeriodicTask/PeriodicTask.cpp:106
#2  0x420566d0 in PeriodicTask::periodicTaskFunction (param=0x3fcca424) at ../main/Objects/PeriodicTask/PeriodicTask.cpp:94
#3  0x40387028 in vPortTaskWrapper (pxCode=0x420566c8 <PeriodicTask::periodicTaskFunction(void*)>, pvParameters=0x3fcca424) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 7 (TCB: 0x3fcb8de8, name: 'main') =====================
#0  vTaskDelay (xTicksToDelay=60000) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:1592
#1  0x42010778 in app_main () at ../main/main.cpp:352
#2  0x4217c9ca in main_task (args=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/app_startup.c:208
#3  0x40387028 in vPortTaskWrapper (pxCode=0x4217c91c <main_task>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 8 (TCB: 0x3fcc6b18, name: 'sys_evt') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcc4654, pvBuffer=0x3fcc6950, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x42177988 in esp_event_loop_run (event_loop=0x3fcc4638, ticks_to_run=4294967295) at /Users/michaelboeding/esp/esp-idf/components/esp_event/esp_event.c:560
#2  0x421779a0 in esp_event_loop_run_task (args=0x3fcc4638) at /Users/michaelboeding/esp/esp-idf/components/esp_event/esp_event.c:105
#3  0x40387028 in vPortTaskWrapper (pxCode=0x42177994 <esp_event_loop_run_task>, pvParameters=0x3fcc4638) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 9 (TCB: 0x3fcba6a8, name: 'Tmr Svc') =====================
#0  0x403879ae in prvProcessTimerOrBlockTask (xNextExpireTime=0, xListWasEmpty=1) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x40387aef in prvTimerTask (pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/timers.c:685
#2  0x40387028 in vPortTaskWrapper (pxCode=0x40387ae0 <prvTimerTask>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 10 (TCB: 0x3fcc9ec4, name: 'location_task') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcc9e14, pvBuffer=0x3c240b20, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x42035cf2 in LocationTracker::run (this=0x3fcc9930) at ../main/Objects/LocationTracker/LocationTracker.cpp:116
#2  0x42035e70 in LocationTracker::runTask (param=0x3fcc9930) at ../main/Objects/LocationTracker/LocationTracker.cpp:108
#3  0x40387028 in vPortTaskWrapper (pxCode=0x42035e68 <LocationTracker::runTask(void*)>, pvParameters=0x3fcc9930) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 11 (TCB: 0x3fcb0990, name: 'conn_switch_tas') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcca39c, pvBuffer=0x3fcb07b0 <connSwitchTaskStack+7712>, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x4202cd64 in CommunicationManager::run (this=0x3fcca0e0) at ../main/Objects/Communication/CommunicationManager.cpp:103
#2  0x4202d2d8 in CommunicationManager::runTask (this=0x3fcca0e0) at ../main/Objects/Communication/CommunicationManager.cpp:96
#3  0x4202d2e0 in CommunicationManager::runTaskWrapper (param=0x3fcca0e0) at ../main/Objects/Communication/CommunicationManager.cpp:91
#4  0x40387028 in vPortTaskWrapper (pxCode=0x4202d2d8 <CommunicationManager::runTaskWrapper(void*)>, pvParameters=0x3fcca0e0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 12 (TCB: 0x3fcd9930, name: 'nimble_host') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcd6390, pvBuffer=0x3fcd9750, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x42093bb4 in npl_freertos_eventq_get (evq=<optimized out>, tmo=4294967295) at /Users/michaelboeding/esp/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/src/npl_os_freertos.c:282
#2  0x40377cd6 in ble_npl_eventq_get (tmo=<optimized out>, evq=0x3c230be4 <g_eventq_dflt>) at /Users/michaelboeding/esp/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:167
#3  nimble_port_run () at /Users/michaelboeding/esp/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/nimble_port.c:304
#4  0x42079e51 in NimBLEDevice::host_task (param=0x0) at ../main/components/esp-nimble-cpp/src/NimBLEDevice.cpp:827
#5  0x40387028 in vPortTaskWrapper (pxCode=0x42079e48 <NimBLEDevice::host_task(void*)>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 13 (TCB: 0x3fcae7ac, name: 'trackerManagerT') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcc70c4, pvBuffer=0x3fcae5c8 <xStack+11804>, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x42026910 in StandardTracker::run (this=0x3fcc6c74) at ../main/Objects/StandardTracker.cpp:181
#2  0x42027694 in StandardTracker::runTask (this=0x3fcc6c74) at ../main/Objects/StandardTracker.cpp:175
#3  0x4202769c in StandardTracker::runTaskWrapper (param=0x3fcc6c74) at ../main/Objects/StandardTracker.cpp:170
#4  0x40387028 in vPortTaskWrapper (pxCode=0x42027694 <StandardTracker::runTaskWrapper(void*)>, pvParameters=0x3fcc6c74) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 14 (TCB: 0x3fcd43ac, name: 'btController') =====================
#0  0x40386c45 in xQueueSemaphoreTake (xQueue=0x3fcd32d8, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x42085bb0 in semphr_take_wrapper (semphr=0x3fcd32c8, block_time_ms=4294967295) at /Users/michaelboeding/esp/esp-idf/components/bt/controller/esp32c3/bt.c:589
#2  0x40378150 in btdm_controller_task ()
#3  0x40387028 in vPortTaskWrapper (pxCode=0x40377fe8 <btdm_controller_task>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 15 (TCB: 0x3fcb6718, name: 'esp_timer') =====================
#0  0x400559e0 in ?? ()
#1  0x4038724a in vPortClearInterruptMaskFromISR (prev_level=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:560
#2  vPortExitCritical (mux=0x3fca4fa0 <xKernelLock>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:505
#3  0x40389a9c in ulTaskGenericNotifyTake (uxIndexToWait=0, xClearCountOnExit=1, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:5756
#4  0x420a691b in timer_task (arg=0x0) at /Users/michaelboeding/esp/esp-idf/components/esp_timer/src/esp_timer.c:480
#5  0x40387028 in vPortTaskWrapper (pxCode=0x420a690c <timer_task>, pvParameters=0x0) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

==================== THREAD 16 (TCB: 0x3fcc9200, name: 'acc_event_Task') =====================
#0  0x40386ad2 in xQueueReceive (xQueue=0x3fcc913c, pvBuffer=0x3fccd980, xTicksToWait=<optimized out>) at /Users/michaelboeding/esp/esp-idf/components/xtensa/include/xt_utils.h:41
#1  0x4203c3a5 in eventHandlerTask (pvParameters=0x3fcc9004) at ../main/Objects/Accelerometer/Accelerometer.cpp:64
#2  0x40387028 in vPortTaskWrapper (pxCode=0x4203c390 <eventHandlerTask(void*)>, pvParameters=0x3fcc9004) at /Users/michaelboeding/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134


======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x600fe000 0x200 R XA
.rtc.force_fast 0x600fe200 0x8 RW A
.rtc.data 0x50001000 0x20 RW A
.rtc_noinit 0x50001020 0x0 RW  
.rtc.force_slow 0x50001020 0x0 RW  
.iram0.vectors 0x40374000 0x403 R XA
.iram0.text 0x40374404 0x20abf R XA
.dram0.data 0x3fca4f00 0x53ac RW A
.flash.text 0x42000020 0x17dc9a R XA
.flash.appdesc 0x3c180020 0x100 R  A
.flash.rodata 0x3c180120 0xa75b8 RW A
.iram0.data 0x40394f00 0x0 RW  
.iram0.bss 0x40394f00 0x0 RW  
.dram0.heap_start 0x3fcb44c8 0x0 RW  
.coredump.tasks.data 0x3fcb9548 0x158 RW 
.coredump.tasks.data 0x3fcb9160 0x3e0 RW 
.coredump.tasks.data 0x3fcc447c 0x158 RW 
.coredump.tasks.data 0x3fcc4190 0x2e0 RW 
.coredump.tasks.data 0x3fcc8558 0x158 RW 
.coredump.tasks.data 0x3c23c110 0x260 RW 
.coredump.tasks.data 0x3fcc8838 0x158 RW 
.coredump.tasks.data 0x3c23c950 0x290 RW 
.coredump.tasks.data 0x3fcc97d4 0x158 RW 
.coredump.tasks.data 0x3fcc9540 0x280 RW 
.coredump.tasks.data 0x3fcca860 0x158 RW 
.coredump.tasks.data 0x3fcca5d0 0x280 RW 
.coredump.tasks.data 0x3fcb8de8 0x158 RW 
.coredump.tasks.data 0x3fcb8b00 0x2e0 RW 
.coredump.tasks.data 0x3fcc6b18 0x158 RW 
.coredump.tasks.data 0x3fcc6850 0x2c0 RW 
.coredump.tasks.data 0x3fcba6a8 0x158 RW 
.coredump.tasks.data 0x3fcba420 0x280 RW 
.coredump.tasks.data 0x3fcc9ec4 0x158 RW 
.coredump.tasks.data 0x3c240a20 0x2b0 RW 
.coredump.tasks.data 0x3fcb0990 0x158 RW 
.coredump.tasks.data 0x3fcb06b0 0x2d0 RW 
.coredump.tasks.data 0x3fcd9930 0x158 RW 
.coredump.tasks.data 0x3fcd9650 0x2d0 RW 
.coredump.tasks.data 0x3fcae7ac 0x158 RW 
.coredump.tasks.data 0x3fcae3c0 0x3e0 RW 
.coredump.tasks.data 0x3fcd43ac 0x158 RW 
.coredump.tasks.data 0x3fcd40f0 0x2b0 RW 
.coredump.tasks.data 0x3fcb6718 0x158 RW 
.coredump.tasks.data 0x3fcb6480 0x290 RW 
.coredump.tasks.data 0x3fcc9200 0x158 RW 
.coredump.tasks.data 0x3fccd880 0x290 RW 

===================== ESP32 CORE DUMP END =====================
===============================================================

@esp-wzh
Copy link
Collaborator

esp-wzh commented Dec 26, 2024

Found bug, but I'm not sure if it's related with the assert failed in vTaskStepTick.
Note the abort in Panic reason: assert failed: esp_sleep_pd_config sleep_modes.c:2118 (refs >= 0), which is called from vApplicationSleep, if CONFIG_PM_TRACE is enabled, it will call esp_sleep_pd_config to force RTC_PERIPH keep power on in sleep, in the implementation of esp_sleep_pd_config, it will increase the ref_cnt of RTC_PERIPH, which is a variable of int16_t, as the auto lightsleep continues, this value will eventually overflow and trigger this positive assertion.

So can you try disabling option CONFIG_PM_TRACE? @michaelboeding

@michaelboeding
Copy link

michaelboeding commented Dec 26, 2024

Hey @esp-wzh this is great thanks for the fast response! I have disabled that and I am now running a test overnight. Hopefully there won't be any crashes once I wake up. Will report back.

Update - seems the crash happens at a longer interval now....still testing but maybe every 3 hours. I did get another crash after the 3 hour crash so I didn't get the correct core dump. So I have updated the code to auto send core dumps to the server now instead of requesting them and will run another test tonight and report back. @esp-wzh

@michaelboeding
Copy link

It seems like this crash didn't occur in WIFI or BLE so far - I have ran them both overnight. I'm going to continue to test and have also reduced the max frequency to 80 from 160 to see If that causes the crash. But fingers crossed this has been fixed! @esp-wzh. Will report back with an update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: In Progress Work is in progress Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests