Memory leak - Out of Memory Exception #644

Closed
Argafal opened this issue Feb 4, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@Argafal
Contributor

Argafal commented Feb 4, 2023

Exception recorded in 0.5.78:

Unhandled C++ exception: OOM

last failed alloc caller: 0x40223f2e

Decoding stack results
0x40242de8: tcp_input at core/tcp_in.c line 943
0x40248091: ip4_input at core/ipv4/ip4.c line 1467
0x4023fb9d: mem_malloc at core/mem.c line 210
0x4023f261: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x4023f070: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x40267bb9: ethernet_input at glue-esp/lwip-esp.c line 365
0x40267bcb: ethernet_input at glue-esp/lwip-esp.c line 373
0x402392be: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 720
0x40239282: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 708
0x4023aa76: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x40239569: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023ad4c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x40235861: __cvt at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 102
0x4023ad4c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023ac88: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x402363a1: _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 194
0x40235da5: _printf_float at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 330
0x40236400: _printf_i at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 209
0x4023ac88: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x4023b188: _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 658
0x4023ec5d: glue2esp_linkoutput at glue-esp/lwip-esp.c line 301
0x4023ee8b: new_linkoutput at glue-lwip/lwip-git.c line 272
0x4023f2ee: ethernet_output at netif/ethernet.c line 312
0x4024686c: etharp_output_to_arp_index at core/ipv4/etharp.c line 769
0x40246940: etharp_output_LWIP2 at core/ipv4/etharp.c line 885
0x402482b0: ip4_output_if_opt_src at core/ipv4/ip4.c line 1764
0x4023fb9d: mem_malloc at core/mem.c line 210
0x4024025e: pbuf_alloc_LWIP2 at core/pbuf.c line 284
@Argafal Argafal changed the title Memory leak in 0.5.78 Memory leak in 0.5.78? Feb 4, 2023
@stefan123t stefan123t added the bug Something isn't working label Feb 8, 2023
@Argafal Argafal changed the title Memory leak in 0.5.78? Memory leak - Out of Memory Exception Feb 14, 2023
@Argafal
Contributor Author

Argafal commented Feb 14, 2023

OOM within two minutes upon boot. Running 0.5.88 on ESP8266.

14:54:14.757 > I: (#0) Requesting Inv SN 1141XXX9
14:54:14.763 > I: (#0) prepareDevInformCmd
14:54:14.763 > I: TX 27B Ch3 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 EB 92 86 00 00 00 00 00 00 00 00 78 BA C5
14:54:14.888 > I: RX 27B Ch61 | 95 72 22 17 79 72 22 17 79 01 00 01 01 65 00 27 00 8B 01 5E 00 26 00 84 00 04 A4
14:54:14.896 > I: RX 27B Ch23 | 95 72 22 17 79 72 22 17 79 02 B2 B3 00 04 93 E1 02 07 01 63 09 43 13 85 01 03 59
14:54:14.905 > I: RX 23B Ch23 | 95 72 22 17 79 72 22 17 79 83 00 00 00 0B 03 E8 00 C6 00 34 10 E2 F6
14:54:14.913 > I: procPyld: cmd:  0xb
14:54:14.913 > I: procPyld: txid: 0x95
14:54:14.916 > I: Payload (42): 00 01 01 65 00 27 00 8B 01 5E 00 26 00 84 00 04 B2 B3 00 04 93 E1 02 07 01 63 09 43 13 85 01 03 00 00 00 0B 03 E8 00 C6 00 34
14:54:14.930 > I: alarm ID incremented to 52
14:54:14.932 > I: (#0) enqueuedCmd: 0x11
14:54:20.180 >
14:54:20.181 > User exception (panic/abort/assert)
14:54:20.183 > --------------- CUT HERE FOR EXCEPTION DECODER ---------------
14:54:20.189 >
14:54:20.189 > Unhandled C++ exception: OOM
14:54:20.193 >
14:54:20.193 > >>>stack>>>
14:54:20.193 >

Decoding stack results
0x40244395: tcp_input at core/tcp_in.c line 501
0x402492f9: ip4_input at core/ipv4/ip4.c line 1467
0x40240e05: mem_malloc at core/mem.c line 210
0x402404c9: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x402402d8: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x40268e19: ethernet_input at glue-esp/lwip-esp.c line 365
0x40268e2b: ethernet_input at glue-esp/lwip-esp.c line 373
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023b4a5: _Balloc at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 128
0x4023bcde: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023a7d1: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023bcde: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023bfb4: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023a7d1: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023bfb4: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232

lumapu added a commit that referenced this issue Feb 15, 2023
…682

added part of the MAC address to the MQTT client ID to separate multiple ESPs in the same network
added a dictionary for MQTT to reduce heap fragmentation
removed `last Alarm` from Live view because it always showed the same alarm - will change in future
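
The two MQTT changes named in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the topic names, the `Topic` enum, the `ahoy` prefix, the choice of the last three MAC bytes, and the function names `topicName` and `buildClientId` are all assumptions made for illustration. The idea is that a constant lookup table avoids building short-lived heap strings on every publish, and that a MAC-derived suffix keeps client IDs distinct when several devices share one broker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical topic identifiers; the firmware's real list is not shown in this thread.
enum class Topic : std::uint8_t { Uptime, WifiRssi, FreeHeap, Count };

// "Dictionary": topic names live in one constant table instead of being
// concatenated into temporary heap strings each time a value is published.
static const char* const kTopicNames[] = {"uptime", "wifi_rssi", "free_heap"};

inline const char* topicName(Topic t) {
    const std::size_t index = static_cast<std::size_t>(t);
    assert(index < static_cast<std::size_t>(Topic::Count));
    return kTopicNames[index];
}

// Writes "<prefix>-XXXXXX" (last three MAC bytes, an assumed choice) into a
// caller-provided buffer, so no heap allocation is involved.
// Returns the number of characters written, or -1 if the buffer is too small.
inline int buildClientId(char* out, std::size_t outLen, const char* prefix,
                         const std::uint8_t mac[6]) {
    const int n = std::snprintf(out, outLen, "%s-%02X%02X%02X", prefix,
                                static_cast<unsigned>(mac[3]),
                                static_cast<unsigned>(mac[4]),
                                static_cast<unsigned>(mac[5]));
    return (n < 0 || static_cast<std::size_t>(n) >= outLen) ? -1 : n;
}
```

With an example MAC of `DE:AD:BE:EF:12:34` and a 24-byte buffer, `buildClientId(buf, sizeof(buf), "ahoy", mac)` fills `buf` with `ahoy-EF1234`.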
@Argafal
Contributor Author

Argafal commented Feb 16, 2023

With 0.5.89, in the exact moment of pressing refresh on the website.

16:29:41.183 > I: (#1) Requesting Inv SN 1161XXX4
16:29:41.189 > I: (#1) enqueuedCmd: 0xb
16:29:41.189 > I: (#1) prepareDevInformCmdI: TX 27B Ch40 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 EE 4B E6 00 00 00 01 00 00 00 00 F0 EA BC 
16:29:41.370 > I: RX 27B Ch75 | 95 74 40 42 54 74 40 42 54 02 00 04 6A 04 01 8C 01 8E 01 38 00 0E 00 0E 00 2B ED 
16:29:41.378 > I: RX 27B Ch75 | 95 74 40 42 54 74 40 42 54 03 00 2B 00 04 5C AE 00 04 58 49 01 8C 01 8C 09 49 1E 
16:29:41.386 > I: RX 27B Ch61 | 95 74 40 42 54 74 40 42 54 84 13 86 00 A4 00 DC 00 07 02 56 00 8A 00 01 FA 97 49 
16:29:41.395 > W: Frame 1 missing: Request Retransmit
16:29:41.397 > I: TX 11B Ch61 | 15 74 40 42 54 86 99 51 75 81 8D 
16:29:42.639 > I: RX 27B Ch3 | 95 74 40 42 54 74 40 42 54 01 00 01 01 39 00 0E 00 0E 00 2B 00 2B 00 04 68 8B 4A 
16:29:42.647 > I: procPyld: cmd:  0xb
16:29:42.647 > I: procPyld: txid: 0x95
16:29:42.650 > I: Payload (62): 00 01 01 39 00 0E 00 0E 00 2B 00 2B 00 04 68 8B 00 04 6A 04 01 8C 01 8E 01 38 00 0E 00 0E 00 2B 00 2B 00 04 5C AE 00 04 58 49 01 8C 01 8C 09 49 13 86 00 A4 00 DC 00 07 02 56 00 8A 00 01 
16:29:42.711 > 
16:29:42.711 > User exception (panic/abort/assert)
16:29:42.714 > --------------- CUT HERE FOR EXCEPTION DECODER ---------------
16:29:42.719 > 
16:29:42.719 > Unhandled C++ exception: OOM
16:29:42.722 > 
16:29:42.722 > >>>stack>>>


Decoding stack results
0x402486d3: ip4_input at core/ipv4/ip4.c line 1290
0x40248363: ip4_input at core/ipv4/ip4.c line 1240
0x40243d61: tcp_input at core/tcp_in.c line 501
0x402407d1: mem_malloc at core/mem.c line 210
0x40240830: do_memp_malloc_pool at core/memp.c line 255
0x40248cc5: ip4_input at core/ipv4/ip4.c line 1467
0x402407d1: mem_malloc at core/mem.c line 210
0x4023fe95: ethernet_input_LWIP2 at netif/ethernet.c line 188
0x4023fca4: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 118
0x402687e1: ethernet_input at glue-esp/lwip-esp.c line 365
0x402687f3: ethernet_input at glue-esp/lwip-esp.c line 373
0x40239718: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 352
0x4023b71a: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023a20d: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023b71a: __d2b at /workdir/repo/newlib/newlib/libc/stdlib/mprec.c line 779
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023a20d: _dtoa_r at /workdir/repo/newlib/newlib/libc/stdlib/dtoa.c line 853
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x40236581: __cvt at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 102
0x4023b9f0: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 232
0x4023b92c: __ssputs_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 182
0x4023be2c: _svfprintf_r at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 658
0x40236ac5: _printf_float at /workdir/repo/newlib/newlib/libc/stdio/nano-vfprintf_float.c line 330
0x4023f891: glue2esp_linkoutput at glue-esp/lwip-esp.c line 301
0x4023fada: new_linkoutput at glue-lwip/lwip-git.c line 277
0x4023ff22: ethernet_output at netif/ethernet.c line 312
0x402474a0: etharp_output_to_arp_index at core/ipv4/etharp.c line 769
0x40247574: etharp_output_LWIP2 at core/ipv4/etharp.c line 885
0x40248ee4: ip4_output_if_opt_src at core/ipv4/ip4.c line 1764
0x40240830: do_memp_malloc_pool at core/memp.c line 255
0x40248f4c: ip4_output_if_opt at core/ipv4/ip4.c line 1572
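
For readers comparing the `RX` lines with the `Payload (62)` line in the log above, the following sketch reproduces how the logged payload relates to the logged frames. The layout is inferred only from the bytes in this log and is an assumption, not taken from the Ahoy source: byte 9 looks like a frame number with bit 7 set on the last frame, bytes 10 up to the second-to-last byte carry data, the final byte of each frame is dropped, and the last two bytes of the concatenated data do not appear in the logged payload. The name `reassemble` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Positions inferred from the log (assumptions, see lead-in).
constexpr std::size_t kFrameIdPos = 9;
constexpr std::size_t kDataPos = 10;
constexpr std::uint8_t kLastFrameFlag = 0x80;

// Reassembles a payload from frames given in arrival order.
// Returns an empty vector if a frame is malformed or missing.
inline Bytes reassemble(const std::vector<Bytes>& frames) {
    std::map<std::uint8_t, Bytes> byNumber;  // keyed by frame number
    std::uint8_t lastNumber = 0;
    for (const Bytes& f : frames) {
        if (f.size() <= kDataPos + 1) return {};
        const std::uint8_t id = f[kFrameIdPos];
        const std::uint8_t number = static_cast<std::uint8_t>(id & 0x7F);
        if (id & kLastFrameFlag) lastNumber = number;
        // Keep the data bytes, dropping the trailing per-frame byte.
        byNumber[number] = Bytes(f.begin() + kDataPos, f.end() - 1);
    }
    if (lastNumber == 0) return {};  // last frame not seen yet
    Bytes payload;
    for (std::uint8_t n = 1; n <= lastNumber; ++n) {
        const auto it = byNumber.find(n);
        if (it == byNumber.end()) return {};  // gap: a retransmit would be needed
        payload.insert(payload.end(), it->second.begin(), it->second.end());
    }
    if (payload.size() < 2) return {};
    payload.resize(payload.size() - 2);  // the logged payload omits the final two bytes
    return payload;
}
```

Fed with the four 27-byte frames from the log in their arrival order (frames 2, 3, 4, then the retransmitted frame 1), this yields 62 bytes that start with `00 01 01 39` and end with `00 01`, matching the `Payload (62)` line.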

@lumapu
Owner

lumapu commented Feb 16, 2023

These are all libraries outside of Ahoy. I'm not familiar with this. Since you seem to be the only one with this error, I could imagine a hardware defect.
Is there some kind of self check, or could you try flashing something else onto it?

@humus2002

I completely reset a WEMOS D1 mini (ESP8266) by flashing a blank.bin and then updated to 0.5.89. Although access via 192.168.4.1 is possible, I could not save the configuration data successfully (especially the WiFi settings). The WiFi scan was not stable although the WiFi is close, and some elements on the configuration pages were missing (such as the additional DNS settings and the PINOUT settings). After multiple tries to save the WiFi settings, the ahoy-dtu still does not connect to my WiFi (a normal Fritz 7590).

I have now downgraded to 0.5.17 and everything is fine again.

So I am convinced that there is still some kind of memory leak or similar bug in version 0.5.89.

@Argafal
Contributor Author

Argafal commented Feb 17, 2023

> I have reset a WEMOS D1 mini (ESP8266) completely (by installing a blank.bin) and then updated to 0.5.89. Although access by 192.168.4.1 is possible, I could not save successfully the configuration data (especially WiFi-settings). WiFi-Scan was not stable although the WiFi is close, some elements on the configuration pages were missing (like the additional DNS settings, PINOUT-settings....) multiple tries to save the WiFi-setting, but the ahoy-dtu does not connect to my WiFi (normal Fritz 7590)
>
> I downgraded now to 0.5.17 and everything is fine again...
>
> so I am convinced that there still is a kind of memory leak bug or similar in the 0.5.89 version..

I think this description deserves its own issue. It might be a separate problem, or even a number of problems as many things are mentioned at once. However, I don't see a connection to the OOM I documented with a stack trace above.

@humus2002 Would you please make this a separate new issue? Could you also provide details in that new issue of what you flashed and how you flashed it, and what the exact symptoms of "could not save successfully" were? Let's continue here with the OOM stack trace documented above, okay? Thanks :)

@lumapu Fair enough. Let me dig a little bit more into it and maybe swap out the ESP8266. I just thought it was worth documenting it nevertheless, in case someone else sees the same stack trace on their end.

@lumapu
Owner

lumapu commented Mar 8, 2023

@Argafal do you see these OOM exceptions with the latest dev versions, starting from 0.5.93?

@Argafal
Contributor Author

Argafal commented Mar 9, 2023

When I opened this issue the OOMs were frequent and came soon after boot up, i.e. they made AHOY hard to use productively.

  • I've seen one solitary OOM on 0.5.93, otherwise it's been running stable.
  • 0.5.95 has been running nicely for 20 hours now. I've also not yet seen issue #660 (Website breaks while ahoy keeps running).
  • I will update to 0.5.96 next and observe it for two days.
  • If it works like this, in my opinion this issue could be closed.

@lumapu
Owner

lumapu commented May 25, 2023

I think it works fine now

@lumapu lumapu closed this as completed May 25, 2023