Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception 0 in LWIP2 on DHCP renew with WIFI_OFF #5667

Closed
1 task
dalbert2 opened this issue Jan 25, 2019 · 9 comments · Fixed by #5703 or #7114
Closed
1 task

Exception 0 in LWIP2 on DHCP renew with WIFI_OFF #5667

dalbert2 opened this issue Jan 25, 2019 · 9 comments · Fixed by #5703 or #7114

Comments

@dalbert2
Copy link
Contributor

dalbert2 commented Jan 25, 2019

Basic Infos

  • [X ] This issue complies with the issue POLICY doc.
  • [X ] I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • [X ] I have searched the issue tracker for a similar issue.
  • [X ] If there is a stack dump, I have decoded it.
  • [X ] I have filled out all fields below.

Platform

  • Hardware: [ESP-12]
  • Core Version: [Core:2_4_2 SDK:2.2.1(cfd48f3) LWIP:2.0.3]
  • Development Env: [Platformio]
  • Operating System: [Windows]

Settings in IDE

  • Module: [Generic ESP8266 Module]
  • Flash Mode: [qio]
  • Flash Size: [4MB]
  • lwip Variant: [v2 Lower Memory]
  • Reset Method: [nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [160MHz]
  • Upload Using: [SERIAL]
  • Upload Speed: [115200] (serial upload only)

Problem Description

Exception 0: interaction between application context printf and interrupt context printf

Detailed problem description goes here.

I'm experiencing periodic Exception 0s and the stack trace shows it is happening during vsnprintf in the LWIP stack when it's handling a DHCP timeout/renew (probably when WiFi is off). The exception address is always part of a printf format string from the application context. The particular address/string isn't the same in each exception, but the stack trace always ends at _vsnprintf_r and the exception address is always part of an application Serial.printf_P format string. In the example provided below, the exception address is 0x2C75253D which is ASCII "=%u," which only exists at one place in the code (see instruction below).

I first guessed that this might be a case of the app context stack overflowing into the ISR stack and tried increasing CONT_STACKSIZE to 6K but the exception still takes place. Moreover, the stack dump isn't anywhere near 4K (it's 768 bytes deep in the stack dump below)...so it must be something else (how big is the interrupt stack?)

All of the application code is running in the application context (no application ISRs). Any ideas would be much appreciated.

MCVE Sketch

Serial.printf_P(PSTR("TIMESLOT: %u (ts=%u, nbr=%u, ch=%u, mode=%u)"),
								time, slot, neighbors[slot].id, new_channel, mode);

Debug Messages

>>>stack>>>
3ffffca0:  4021a438 00000030 00000008 ffffffff
3ffffcb0:  4021a48c ffffffff 0000003f 3ffe983c
3ffffcc0:  3ffffed0 3ffffea0 00000008 3ffe983c
3ffffcd0:  00000040 3ffe983c 00000001 00000040
3ffffce0:  3ffe8304 3fff1cd0 3fff0828 3ffe983c
3ffffcf0:  00000000 3ffe8304 3ffffda0 4021c8b8
3ffffd00:  3fff3d32 0000002a 0000001f 00000000
3ffffd10:  00000032 0000001c 00000060 3fff3b55
3ffffd20:  3fff3b20 3fff3b20 3fff3cfc 4021e496
3ffffd30:  4021e82b 3fff3cfc 3fff3d32 4021e834
3ffffd40:  00000608 3ffeafdf 00000067 00000003
3ffffd50:  00000000 00000000 00000000 3fff3b55
3ffffd60:  3fff3d40 00000000 3fff3cfc 4022487f
3ffffd70:  3fff3b20 3fff3b55 3ffeafdf 3ffeafd9
3ffffd80:  00000001 0000003d 0000003d 3fff0828
3ffffd90:  3fff1cd0 3ffe8304 00000040 3fff39bc
3ffffda0:  3fff3b20 00000000 3fff7644 40224a6a
3ffffdb0:  3fff4278 00000001 3ffef540 3fff7694
3ffffdc0:  3fff3b20 3fff4278 3fff7644 40224c1d
3ffffdd0:  00000000 00000000 00000000 00000000
3ffffde0:  40103462 00000021 7fffffff 3fff7694
3ffffdf0:  3fff7644 3fff4278 3fff3b20 40224de8
3ffffe00:  40230000 3ffffec0 3ffefff8 00000016
3ffffe10:  00000014 3fff4278 3fff7644 4022595c
3ffffe20:  3ffffed0 3ffffea0 00000008 00000016
3ffffe30:  3fff3b24 000000ff 00000000 00000011
3ffffe40:  3fff3ab8 4010325e 00040000 402137f4
3ffffe50:  00000011 00000001 64657463 3fff7644
3ffffe60:  3fff4278 3fff76a8 3fff429c 402259ea
3ffffe70:  3fff3b20 00000000 00000000 4021e91f
3ffffe80:  4000050c 00000000 3fff4254 402223ef
3ffffe90:  3fff3b20 00000030 0000001d 40221a5a
3ffffea0:  3fff7644 3fff3b20 3fff3b24 00000043
3ffffeb0:  00000000 00000000 00000000 fffffffb
3ffffec0:  3fff3b20 00000003 3fff4254 fffffffb
3ffffed0:  00000000 3fff3b20 3fff4254 4022245c
3ffffee0:  00000000 3fff3b20 3fff4254 40224293
3ffffef0:  00000000 3fff3b20 3fff4254 402245be
3fffff00:  4021faa4 3fff3774 3fff3c44 401006dc
3fffff10:  0ffabdd3 3ffeb1d4 3ffeb070 3fff3770
3fffff20:  4021faa4 3fff3774 3ffeb058 4021fab2
3fffff30:  35fc1297 3fff3774 00000017 4021fb68
3fffff40:  3ffeb058 3ffeb1d4 10dec254 00000000
3fffff50:  40251ebc 3fff0658 3fff0680 60000600
3fffff60:  35fc1297 3fff0680 3fff0658 40251ec9
3fffff70:  40251f0e 3fffdab0 00000000 3fffdcb0
3fffff80:  3fff0698 3fffdad0 3fff35bc 402150c7
3fffff90:  40000f49 40000f49 3fffdab0 40000f49
3fffffa0:  40000e19 40001878 00000002 00000000
<<<stack<<<

Decoding 37 results
0x4021a438: _vsnprintf_r at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c line 72
0x4021a48c: vsnprintf at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/vsnprintf.c line 42
0x4021c8b8: _svfprintf_r at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c line 531
0x4021e496: new_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 259
0x4021e82b: ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 305
0x4021e834: ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 305
0x4022487f: etharp_raw at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 626
0x40224a6a: etharp_request at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1203
0x40224c1d: etharp_query at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 981
0x40103462: lmacProcessTxSuccess at ?? line ?
0x40224de8: etharp_output_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 887
0x40230000: scan_parse_beacon at ?? line ?
0x4022595c: ip4_output_if_opt_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 962
0x4010325e: lmacProcessTXStartData at ?? line ?
0x402137f4: Print::printf(char const*, ...) at C:\Users\david\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/Print.cpp line 220
0x402259ea: ip4_output_if_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 801
0x4021e91f: ip_chksum_pseudo at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x402223ef: udp_sendto_if_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 463
0x40221a5a: pbuf_alloc_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 1306
0x4022245c: udp_sendto_if at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 463
0x40224293: dhcp_renew at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 1157
0x402245be: dhcp_t1_timeout at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 545
:  (inlined by) dhcp_coarse_tmr at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 433
0x4021faa4: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x401006dc: free at C:\Users\david\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1755
0x4021faa4: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x4021fab2: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 171
0x4021fb68: sys_check_timeouts at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 351
0x40251ebc: ets_timer_handler_isr at ?? line ?
0x40251ec9: ets_timer_handler_isr at ?? line ?
0x40251f0e: ets_timer_handler_isr at ?? line ?
0x402150c7: loop_task at C:\Users\david\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 133

@dalbert2
Copy link
Contributor Author

Please note: this application periodically turns the wifi modem off for several minutes and then on again for several minutes; it uses the WiFi.mode() method to do so:

WiFi.mode(WIFI_OFF);
// do stuff and then later:
WiFi.mode(WIFI_STA);

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 25, 2019

This is unclear to me yet. From 2.4.2:

tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 259

err_t err = glue2git_err(glue2esp_linkoutput(netif->num, p, p->payload, p->len));

This function glue2esp_linkoutput() does not printf() (debug is disabled) then call firmware's netif->linkoutput()

I'll try to WiFi.mode(WIFI_OFF); and let run with short dhcp leases and see if I can get such exception too.

@d-a-v d-a-v self-assigned this Jan 25, 2019
@dalbert2
Copy link
Contributor Author

Thank you @d-a-v !

Overnight, the code took another exception along the same code path but at a different place (still in the system context). Do you know where the size of the system stack is defined? Also, do you know if we're supposed to stop the DHCP client before turning off WiFi since it looks like DHCP continues to fire timeouts when WiFi is disabled?

Exception (0):
epc1=0x3d726961 epc2=0x00000000 epc3=0x00000000 excvaddr=0x3fff404c depc=0x00000000

ctx: sys
sp: 3ffffb00 end: 3fffffb0 offset: 01a0

stack>>>
3ffffca0: 0000000c 00000000 00000020 40100f12
3ffffcb0: 3ffeb3cc 4010083f 04000000 40101fba
3ffffcc0: 40103261 00040000 7fffffff 00000000
3ffffcd0: 00000022 3ffe983c 00040000 40101fba
3ffffce0: 3ffef540 40102186 3fffc200 3ffe983c
3ffffcf0: 00000000 3ffe8304 3ffffda0 4021c8b8
3ffffd00: 3fff3c62 0000002a 3fffc200 00000000
3ffffd10: 00000032 0000001c 00000060 3fff3b55
3ffffd20: 3fff3b20 3fff3b20 3fff3c2c 4021e496
3ffffd30: 4021e82b 3fff3c2c 3fff3c62 4021e834
3ffffd40: 00000608 3ffeafdf 00000008 3ffe983c
3ffffd50: 00000040 3ffffe50 00000001 3fff3b55
3ffffd60: 3fff3c70 00000000 3fff3c2c 4022487f
3ffffd70: 3fff3b20 3fff3b55 3ffeafdf 3ffeafd9
3ffffd80: 00000001 3ffef540 3ffeec70 3fff0868
3ffffd90: 3fff1cd0 3ffe8304 00000040 3fff39bc
3ffffda0: 3fff3b20 00000000 3fff3ebc 40224a6a
3ffffdb0: 3fff7a80 00000001 3ffef540 3fff3f0c
3ffffdc0: 3fff3b20 3fff7a80 3fff3ebc 40224c1d
3ffffdd0: 00000000 00000000 00000000 00000022
3ffffde0: 3ffffdf0 00000000 00000020 3fff3f0c
3ffffdf0: 3fff3ebc 3fff7a80 3fff3b20 40224de8
3ffffe00: 00000000 00000020 40100f12 3ffeb3cc
3ffffe10: 00000014 3fff7a80 3fff3ebc 4022595c
3ffffe20: 3ffeb3cc 4010083f 04000000 40100f12
3ffffe30: 3fff3b24 000000ff 00000000 00000011
3ffffe40: 3fff3ab8 3fff404c 00000000 40101fba
3ffffe50: 00000011 00000001 00000000 3fff3ebc
3ffffe60: 3fff7a80 3fff3f20 3fff79e4 402259ea
3ffffe70: 3fff3b20 00000000 00000000 4021e91f
3ffffe80: 4000050c 00000000 3fff7a5c 402223ef
3ffffe90: 3fff3b20 4bc6a7f0 90a3d70a 40221a5a
3ffffea0: 3fff3ebc 3fff3b20 3fff3b24 00000043
3ffffeb0: 00000001 00000000 4bc6a7f0 fffffffb
3ffffec0: 3fff3b20 00000003 3fff7a5c fffffffb
3ffffed0: 00000000 3fff3b20 3fff7a5c 4022245c
3ffffee0: 00000000 3fff3b20 3fff7a5c 40224293
3ffffef0: 00000000 3fff3b20 3fff7a5c 402245be
3fffff00: 4021faa4 3fff3774 3fff79b4 401006dc
3fffff10: 42f4d7fd 3ffeb1d4 3ffeb070 3fff3770
3fffff20: 4021faa4 3fff3774 3ffeb058 4021fab2
3fffff30: e2344173 3fff3774 00000004 4021fb68
3fffff40: 3ffeb058 3ffeb1d4 96b0535e 00000000
3fffff50: 40251f40 3fff0658 3fff0680 60000600
3fffff60: e2344173 3fff0680 3fff0658 40251f4d
3fffff70: 40251f92 3fffdab0 00000000 3fffdcb0
3fffff80: 3fff0690 3fffdad0 3fff35bc 402150c7
3fffff90: 40000f49 40000f49 3fffdab0 40000f49
3fffffa0: 40000e19 40001878 00000004 0000000d
<<<stack<<<
Decoding 41 results
0x40100f12: pp_post at ?? line ?
0x4010083f: pp_soft_wdt_feed_local at ?? line ?
0x40101fba: wDev_MacTim1Arm at ?? line ?
0x40103261: lmacProcessTXStartData at ?? line ?
0x40101fba: wDev_MacTim1Arm at ?? line ?
0x40102186: wDev_ProcessFiq at ?? line ?
0x4021c8b8: _svfprintf_r at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c line 531
0x4021e496: new_linkoutput at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 259
0x4021e82b: ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 305
0x4021e834: ethernet_output at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 305
0x4022487f: etharp_raw at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 626
0x40224a6a: etharp_request at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1203
0x40224c1d: etharp_query at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 981
0x40224de8: etharp_output_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 887
0x40100f12: pp_post at ?? line ?
0x4022595c: ip4_output_if_opt_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 962
0x4010083f: pp_soft_wdt_feed_local at ?? line ?
0x40100f12: pp_post at ?? line ?
0x40101fba: wDev_MacTim1Arm at ?? line ?
0x402259ea: ip4_output_if_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/ip4.c line 801
0x4021e91f: ip_chksum_pseudo at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/inet_chksum.c line 542
0x402223ef: udp_sendto_if_src at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 463
0x40221a5a: pbuf_alloc_LWIP2 at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 1306
0x4022245c: udp_sendto_if at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/udp.c line 463
0x40224293: dhcp_renew at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 1157
0x402245be: dhcp_t1_timeout at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 545
: (inlined by) dhcp_coarse_tmr at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/dhcp.c line 433
0x4021faa4: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x401006dc: free at C:\Users\david.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1755
0x4021faa4: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 165
0x4021fab2: cyclic_timer at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 171
0x4021fb68: sys_check_timeouts at /local/users/gauchard/arduino/arduino_esp8266/origin/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 351
0x40251f40: ets_timer_handler_isr at ?? line ?
0x40251f4d: ets_timer_handler_isr at ?? line ?
0x40251f92: ets_timer_handler_isr at ?? line ?
0x402150c7: loop_task at C:\Users\david.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 133

@dalbert2 dalbert2 changed the title Exception 0 in LWIP _vsnprintf_r Exception 0 in LWIP2 on DHCP renew with WIFI_OFF Jan 25, 2019
@devyte
Copy link
Collaborator

devyte commented Jan 26, 2019

@dalbert2 could you please try to recreate the issue with an MCVE sketch? something like periodically turning the wifi on-off , and if that doesn't reproduce try adding elements from your app one by one.

@devyte devyte added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Jan 30, 2019
@dalbert2
Copy link
Contributor Author

Stopping DHCPC before turning off WiFi and starting it again after WiFi is turned back on makes the problem go away.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 31, 2019

@dalbert2 Thanks alot! We will fix before 2.5.0 happens

@d-a-v d-a-v added this to the 2.5.0 milestone Jan 31, 2019
@d-a-v d-a-v added type: bug component: network and removed waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. labels Jan 31, 2019
@dalbert2
Copy link
Contributor Author

Thanks @d-a-v !

The fix looks (obviously) like this:
wifi_station_dhcpc_stop();
WiFi.mode(WIFI_OFF);
// ...do stuff...
bool ok = WiFi.mode(WIFI_STA);
wifi_station_dhcpc_start();

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 17, 2019

Fix from #5703 is not good whatever improvement we tried so far in #6172.

@devyte
Copy link
Collaborator

devyte commented Oct 29, 2019

This could have been a symptom of #6484 or could be related to #6630.
@dalbert2 what is the status with latest git?

Pushing back for now due to no clear solution in sight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment