Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Esp easy update temperature to domoticz generate a reboot #674

Closed
Tixav opened this issue Jan 4, 2018 · 24 comments
Closed

Esp easy update temperature to domoticz generate a reboot #674

Tixav opened this issue Jan 4, 2018 · 24 comments
Labels
Category: Controller Related to interaction with other platforms Type: Bug Considered a bug
Milestone

Comments

@Tixav
Copy link

Tixav commented Jan 4, 2018

I recently update a nodemcu from R147 (which is working for 1 year) to v2.0.0-dev12 then to v2.0.0-dev13 but the result is the same :
I setup a controller (domoticz)
I setup a ds18B20 device (set 12 bits resolution, idx set, ... )
https://www.letscontrolit.com/forum/viewtopic.php?f=6&t=3959
no issue it is working well, but if I checked Send to Controller, then the nodemcu is rebooting randomly , I also check with a weemo D1 mini it is the same !

for the moment, I use this counter measure (by rules), and it is working well without random reboot:
//when temp is updated from device setup page on temp#Temperature do SendToHTTP 192.168.1.4,8080,/json.htm?param=udevice&type=command&idx=510&nvalue=0&svalue=[temp#Temperature] endon

@TD-er
Copy link
Member

TD-er commented Jan 4, 2018

What controller are you using? e.g. http/MQTT?
How is the controller set up?
What values do the 1wire device give? (e.g. NaN)

Also screenshots of the device tab and/or controller would be nice.

@Tixav
Copy link
Author

Tixav commented Jan 4, 2018

What controller are you using? e.g. http/MQTT?: I setup Domoticz Http controller
How is the controller set up? see the printscreen (I update the topic) @ https://www.letscontrolit.com/forum/viewtopic.php?f=6&t=3959
What values do the 1wire device give? (e.g. NaN) : Correct values until reboot, the temperature update to the controller might be the issue
Also screenshots of the device tab and/or controller would be nice. : https://www.letscontrolit.com/forum/viewtopic.php?f=6&t=3959

@Grovkillen Grovkillen added Type: Bug Considered a bug Category: Controller Related to interaction with other platforms labels Jan 4, 2018
@Grovkillen Grovkillen added this to the 2.0.0 milestone Jan 4, 2018
@TD-er
Copy link
Member

TD-er commented Jan 4, 2018

It also happens on the switch input.
When the Domoticz controller is active and you press a number of times (within a few seconds) the button, the ESP device will reboot.
Looks a bit like it is trying to send a number of messages, while the last one is not yet finished.
My test here was with Domoticz MQTT as active controller.

TD-er added a commit to TD-er/ESPEasy that referenced this issue Jan 4, 2018
As described in letscontrolit#673 .
The problem was partly related to the default values stored in flash ("0"), which was not a valid value for the switch type.

When upgrading from an older version of ESPeasy, make sure to check the switch type (normal switch or dimmer) and save the settings for the switch device again, even when nothing was changed.
Default configuration and new added switches will now work like intended.

When a controller is enabled (e.g. Domoticz MQTT or -HTTP) and the button is pressed multiple times, the ESP may reboot. See issue letscontrolit#674.
@TD-er
Copy link
Member

TD-er commented Jan 5, 2018

Just before the crash, it reports Exception (2).

Exception (2):
epc1=0x3fff3a48 epc2=0x00000000 epc3=0x00000000 excvaddr=0x3fff3a48 depc=0x00000000

ctx: cont
sp: 3fff3bf0 end: 3fff43c0 offset: 01a0

Taken from this list, which appears to be:

InstructionFetchErrorCause | Processor internal physical address or data error during instruction fetch

psy0rz pushed a commit that referenced this issue Jan 5, 2018
As described in #673 .
The problem was partly related to the default values stored in flash ("0"), which was not a valid value for the switch type.

When upgrading from an older version of ESPeasy, make sure to check the switch type (normal switch or dimmer) and save the settings for the switch device again, even when nothing was changed.
Default configuration and new added switches will now work like intended.

When a controller is enabled (e.g. Domoticz MQTT or -HTTP) and the button is pressed multiple times, the ESP may reboot. See issue #674.
@psy0rz
Copy link
Member

psy0rz commented Jan 5, 2018

sounds serious, ill try to reproduce it as well.

@TD-er
Copy link
Member

TD-er commented Jan 5, 2018

I was just playing around with running background tasks just to make sure the watchdog is not kicking in. with multiple controller messages in a short time.

So I added something like this:

void sendData(struct EventStruct *event)
{
  if (timePassedSince(lastSend) < 1000) {
    // Frequent subsequent calls to the controller.
    // Perform yield first to make sure the wdt reset will occur.
    //yield();
    delay(1);
  }

Somewhat inspired by #676 and also what was discussed here: esp8266/Arduino#1997 (comment)

But adding that code will lead to much more frequent crashes.
The exception(2) (InstructionFetchError | Attempt to execute instructions from an illegal address, or read error when fetching instruction) is some kind of indication a function call is being made to something that is no longer (or not yet) there.
Something like a pointer dereference to something that's being destroyed or not deleted properly.
For example some inherited class whose destructor is not virtual.
It could also be the result of some buffer overflow, or a string pointer without a 0 termination.

So lots of possibilities. Any suggestion on how to analyse the stack?

@psy0rz
Copy link
Member

psy0rz commented Jan 6, 2018

I think there was some stack analyser for platformio or arduino GUI, but i forgot how it was named. It could translate the stack dump to a nice backtrace.

@TD-er
Copy link
Member

TD-er commented Jan 6, 2018

Hmm, when built using the Arduino IDE, I cannot get it to crash.
But using the same source, built using PlatformIO and uploaded via Atom, I get a crash after just a few presses on the button (using the flash button on the NodeMCU)

It uses the same config, since that was not changed between flashes.

Problem is, I now have a stacktrace via Arduino IDE, but it doesn't make sense, so I guess the images are actually different in build settings. (the .elf file is significantly smaller when built using PlatformIO)
I don't see any line numbers in the stacktrace.
I use the ESP Exception Decoder, described here: http://arduino-esp8266.readthedocs.io/en/latest/Troubleshooting/stack_dump.html

Exception 2: InstructionFetchError: Processor internal physical address or data error during instruction fetch
Decoding 38 results
0x4021067a: PubSubClient::loop() at ?? line ?
0x40210512: PubSubClient::loop() at ?? line ?
0x40211fc5: ESP8266WebServer::handleClient() at ?? line ?
0x40236b89: backgroundtasks() at ?? line ?
0x40236bc0: delayBackground(unsigned long) at ?? line ?
0x40238829: sendData(EventStruct*) at ?? line ?
0x4023915c: Plugin_001(unsigned char, EventStruct*, String&) at ?? line ?
0x40101b41: trc_NeedRTS at ?? line ?
0x40101cd4: trc_NeedRTS at ?? line ?
0x4010020c: _umm_free at umm_malloc.c line ?
0x402023f0: String::changeBuffer(unsigned int) at ?? line ?
0x4010020c: _umm_free at umm_malloc.c line ?
0x4010020c: _umm_free at umm_malloc.c line ?
0x40100688: free at ?? line ?
0x40202390: String::~String() at ?? line ?
0x40229a14: callback(char*, unsigned char*, unsigned int) at ?? line ?
0x401048f9: ets_timer_disarm at ?? line ?
0x40106f01: __udivsi3 at d:\ivan\projects\arduinoesp\toolchain\dl\gcc-xtensa\build-2\xtensa-lx106-elf\libgcc/../../../libgcc/config/xtensa/lib1funcs.S line 540
0x40107114: millis at ?? line ?
0x4010053d: realloc at ?? line ?
0x40216171: timePassedSince(unsigned long) at ?? line ?
0x402023f0: String::changeBuffer(unsigned int) at ?? line ?
0x4020243f: String::reserve(unsigned int) at ?? line ?
0x40202475: String::copy(char const*, unsigned int) at ?? line ?
0x402024ca: String::String(char const*) at ?? line ?
0x40202390: String::~String() at ?? line ?
0x4021d072: PluginCall(unsigned char, EventStruct*, String&) at ?? line ?
0x40203430: esp_yield at ?? line ?
0x402034a8: yield at ?? line ?
0x40210580: PubSubClient::loop() at ?? line ?
0x40203430: esp_yield at ?? line ?
0x402384a1: run10TimesPerSecond() at ?? line ?
0x4023de91: loop at ?? line ?
0x4020347c: loop_wrapper() at core_esp8266_main.cpp line ?
0x40205e14: cont_norm at cont.o line ?

@TD-er
Copy link
Member

TD-er commented Jan 6, 2018

I added the option "-g3" to the PlatformIO build flags and now I believe I have some stacktrace:

Exception 2: InstructionFetchError: Processor internal physical address or data error during instruction fetch
Decoding 45 results
0x4020e3ba: PubSubClient::loop() at C:\Arduino\ESPeasy\arduino-1.8.3\portable\sketchbook\ESPEasy_Gijs\ESPEasy/lib\pubsubclient\src/PubSubClient.cpp line 590
0x4020e256: PubSubClient::loop() at C:\Arduino\ESPeasy\arduino-1.8.3\portable\sketchbook\ESPEasy_Gijs\ESPEasy/lib\pubsubclient\src/PubSubClient.cpp line 590
0x40211f79: ESP8266WebServer::handleClient() at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\libraries\ESP8266WebServer\src/ESP8266WebServer.cpp line 299
0x40236a19: backgroundtasks() at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40236a50: delayBackground(unsigned long) at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40238680: sendData(EventStruct*) at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40238fa4: Plugin_001(unsigned char, EventStruct*, String&) at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x401040fc: lmacTxFrame at ?? line ?
0x40104062: lmacTxFrame at ?? line ?
0x40100793: ppProcessTxQ at ?? line ?
0x40100e32: pp_post at ?? line ?
0x40103c67: lmacMSDUAged at ?? line ?
0x401035c6: lmacRecycleMPDU at ?? line ?
0x4010020c: _umm_free at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1287
0x402023f4: String::changeBuffer(unsigned int) at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x4010020c: _umm_free at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1287
0x4010020c: _umm_free at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1287
0x4010068c: free at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1733
0x40202394: String::~String() at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x40229908: callback(char*, unsigned char*, unsigned int) at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40203434: esp_yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x402034ac: __yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x402034ec: optimistic_yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x4020bd27: WiFiUDP::parsePacket() at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\libraries\ESP8266WiFi\src/WiFiUdp.cpp line 283
0x40236642: checkUDP() at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x401048fd: ets_timer_disarm at ?? line ?
0x401048fd: ets_timer_disarm at ?? line ?
0x4020e26d: PubSubClient::loop() at C:\Arduino\ESPeasy\arduino-1.8.3\portable\sketchbook\ESPEasy_Gijs\ESPEasy/lib\pubsubclient\src/PubSubClient.cpp line 590
0x4010053d: _umm_realloc at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1491
:  (inlined by) realloc at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\umm_malloc/umm_malloc.c line 1709
0x402023f4: String::changeBuffer(unsigned int) at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x40202443: String::reserve(unsigned int) at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x40202479: String::copy(char const*, unsigned int) at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x402024ce: String::String(char const*) at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x40202394: String::~String() at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/WString.cpp line 688
0x4021cffa: PluginCall(unsigned char, EventStruct*, String&) at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40203434: esp_yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x402034ac: __yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x4020e2c0: PubSubClient::loop() at C:\Arduino\ESPeasy\arduino-1.8.3\portable\sketchbook\ESPEasy_Gijs\ESPEasy/lib\pubsubclient\src/PubSubClient.cpp line 590
0x40203434: esp_yield at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x402382f5: run10TimesPerSecond() at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x4023dcbd: loop at C:/Arduino/ESPeasy/arduino-1.8.3/portable/sketchbook/ESPEasy_Gijs/ESPEasy/src/_P047_i2c-soil-moisture-sensor.ino line 515
0x40203480: loop_wrapper at C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266/core_esp8266_main.cpp line 56
0x40205df0: cont_norm at cont.o line ?

@TD-er
Copy link
Member

TD-er commented Jan 7, 2018

Some update. Also related to what @s0170071 already noticed in #676

Things to rule out: (at least not related to this issue)
In PubSubClient.cpp I added 2 checks to make sure no out-of-bound issues with the buffer were causing these crashes:
Changed:
(line 247) } while ((digit & 128) != 0);
To:
} while ((digit & 128) != 0 && len < (MQTT_MAX_PACKET_SIZE -2));

And similar:
Changed:
(line 528) while (*idp) {
to:
while (*idp && pos < (MQTT_MAX_PACKET_SIZE - 2)) {

That made no difference.

In SendData() (Controller.ino) I removed
delayBackground(delayms);

Then you only get the controller to crash when pressing the switch intensively for a number of seconds.
This results in a panic, with the watchdog kicking in:
Panic C:\users\gijs\.platformio\packages\framework-arduinoespressif8266\cores\esp8266\core_esp8266_main.cpp:98 __yield

So I added a yield(); , but that made no difference.

The code is also rather fishy about checking only controller 0 for being enabled or having MQTT use.

I think I can fix the issue with the switch by delaying consecutive updates and only sending the last one when pressing the switch very frequently. But I think the real problem is with the sendData function and the background services. So other plugins may experience similar issues.

@s0170071
Copy link
Contributor

s0170071 commented Jan 7, 2018 via email

@TD-er
Copy link
Member

TD-er commented Jan 7, 2018

I'm only running on core 2.3, since that's the version with ESPeasy

@s0170071
Copy link
Contributor

s0170071 commented Jan 7, 2018

My crash was from an Arduino IDE build. So the problem is still there. Had sth. similar a while ago. In the end it was a uninitialized variable in an interrupt function. One compiler setting initialized it to zero, the other didn't.

@TD-er
Copy link
Member

TD-er commented Jan 7, 2018

Hmm, that may be a good pointer to start looking, since in the Arduino IDE the crash was at least much harder -if not impossible- to reproduce.

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

If you set a message delay of 30000ms at the advanced settings page, this bug gets triggered every time.
(i'm using switch plugin with domoticz mqtt)

I'll try to figure out whats going on and i'll test your PR as well. Also i wonder what happens if a MQTT message arrive and the mqtt callback is called when doing a delaybackground somewhere.

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

i added a bunch of debug statements and indeed its crashing on the mqtt stuff in background tasks.

the easy way out would be to move MQTTclient.loop() to loop() so that its never called from delayBackground()

will try to see if i can pinpoint the exact reason of the crash.

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

woops found a probably unrelated bug in mqtt callback:

if (length>sizeof(c_payload)-1)
  {
    addLog(LOG_LEVEL_ERROR, F("MQTT : Ignored too big message"));
  }

threre is no return, so too big messages would be processed and cause a crash still.

@s0170071
Copy link
Contributor

Don't need mqtt for a crash. See #676. I suspect the stack. Does anyone of you know how where the sp is located?
@TD-er do you think initializing / zeroing the whole memory (ram) is done anywhere in the code already?

@TD-er
Copy link
Member

TD-er commented Jan 10, 2018

@psy0rz Good find. I remember looking at that part and finding it a bit strange, but didn't trigger a "oh-no" reaction ;)

I also noticed on the stacktrace that the loop was being called from loop(), but I couldn't see why. So I was always a bit unsure about the added value of the stacktrace.

@s0170071 I don't think it is cleared, since some things survive a reboot.
I don't think it is necessary to clear it, since the uninitialized part should never be read anyway ;)
But I think it is a good addition to allow to clear the configuration flash (all except the firmware) to allow a clean start and make sure no left over configuration is causing issues.

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

@s0170071 its still possible that they have enabled a mqtt controller in #676 . (especially since it seems to be fixed after removing the delaybackground there)

i'll see if i have that display to make sure.

@s0170071
Copy link
Contributor

"They" was me. And I did not enable anything other than that display. At least not that I am aware of....
Remember: press EDIT button of that display in the web interface the very moment scrolling happens. It's good practice to push the mouse button before and just release it ... You'll figure it out :-)

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

ah ok sorry. was hoping it was the same underlying issue. :)

@psy0rz
Copy link
Member

psy0rz commented Jan 10, 2018

ok we fixed some issues that might be related. can you try again? the next nightly v2.0 release should have the fixes. if its still not fixed let me know. i'll also try to reproduce it with http and the ds18 here. (will add it to the automated test suite)

TD-er added a commit to TD-er/ESPEasy that referenced this issue Jan 10, 2018
As described in letscontrolit#673 .
The problem was partly related to the default values stored in flash ("0"), which was not a valid value for the switch type.

When upgrading from an older version of ESPeasy, make sure to check the switch type (normal switch or dimmer) and save the settings for the switch device again, even when nothing was changed.
Default configuration and new added switches will now work like intended.

When a controller is enabled (e.g. Domoticz MQTT or -HTTP) and the button is pressed multiple times, the ESP may reboot. See issue letscontrolit#674.
psy0rz pushed a commit that referenced this issue Jan 10, 2018
* [switch] Fixed switch behavior and default settings. (#675)

As described in #673 .
The problem was partly related to the default values stored in flash ("0"), which was not a valid value for the switch type.

When upgrading from an older version of ESPeasy, make sure to check the switch type (normal switch or dimmer) and save the settings for the switch device again, even when nothing was changed.
Default configuration and new added switches will now work like intended.

When a controller is enabled (e.g. Domoticz MQTT or -HTTP) and the button is pressed multiple times, the ESP may reboot. See issue #674.

* ABC calibration feature added (#606)

* [Flash info] Detailed flash information (#678)

Last few days a number of issues and forum topic was about the type of flash used on the ESP boards.

This is an extension of the detailed information page.

Perhaps also merge with the newer and more clear layout of pull request #624?
That pull request was only merged to the mega branch.
I kept the changes local, but perhaps they should be placed in the "Storage" section introduced with #624.
Maybe also that pull request should get merged into the v2.0 branch.

* Bugfix/v2.0 crash switch (#682)

* [crashes] Added constructors to initialize all members in structs

Numerous structs are defined, but none of them have default constructors and there is no guarantee the members will be set when used. 
With these default constructors, the parameters at least have an initialized value.

* [PubSubClient] Add bound checks on the internal buffer

Not sure if this was really causing an issue, but proper bound checks are always a good thing.

* [Crash Switch] Disabled delayBackground and added yield() calls

Something really fishy is going on with the delayBackground function, which will result in crashes when pressing the switch multiple times, with Domoticz MQTT enabled as first controller.
Disabled for now and delay(1) added to give background tasks a chance to do their work and make sure the watchdog doesn't perform a reset.

* [CI build errors] Commented out some unused variables

Travis considers them as error and fails the checks.

* [CI check] Out-of-bounds check fix

* actually ignore MQTT messages that are too big.

* moved mqtt stuff outside of backgroundtasks(). fixes #683 in my test scenario

* [Adafruit MPR121] Change deprecated name setThreshholds to setThresholds (#685)

See #684

* fixed plugin id of "Communication - Kamstrup Multical 401". (accidental octal notation)

* changed devicecombobox handling to save a lot of memory on device page. fixes #654 #676 and could be triggered by #683 in some cases.

* [CPPcheck] v2.0 ControllerSettingsStruct some variables not initialized (#692)

Fixing these cppcheck errors:
101.43s$ cppcheck --enable=warning src/*.ino -q --force -I src --include=src/ESPEasy.ino --error-exitcode=1
[src/ESPEasy.ino:500]: (warning) Member variable 'ControllerSettingsStruct::HostName' is not initialized in the constructor.
[src/ESPEasy.ino:500]: (warning) Member variable 'ControllerSettingsStruct::Publish' is not initialized in the constructor.
[src/ESPEasy.ino:500]: (warning) Member variable 'ControllerSettingsStruct::Subscribe' is not initialized in the constructor.
@psy0rz
Copy link
Member

psy0rz commented Jan 12, 2018

should be fixed now, please check tonights nightly's

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Controller Related to interaction with other platforms Type: Bug Considered a bug
Projects
None yet
Development

No branches or pull requests

5 participants