-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
currentMillis and targetMills may not be addressing rollover #16
Comments
abqmichael, If you are not already, it will be helpful for debugging to log all incoming and outgoing MQTT messages. Be sure to capture the message published by the node during onConnectionEstablished(). It could well be that the MQTT message going to the node is published to the topic, but the node is unable to respond because it is reconnecting. If the onConnectionEstablished MQTT message is logged, it will be possible to determine if that is the case. Hope this helps. |
Thanks for the attention. I did some further testing that is pointing the finger that something is causing millis() to roll over at 2^32/1000 instead of the 2^32. That is, the rollover happens about every two hours. It might be that something inside millis is counting nano-seconds. However, this first test used your code, the libraries it depends upon and some other libraries like time.h and U8g2lib. So it might be that something inside one of these codes is responsible for the resetting. To that end, I am working on a second test that only uses raw Arduino to minimally test mills. I would be happy to send you either of these codes. Here's how my first test worked: My first test involved writing a program that would create/schedule three delays (quick, medium, slow) and print a status on the slow event... e.g.,; void slow_event() { I also went into EspMQTT and created a method that would let me query the DelayedExecutionList and hooked that up to a method subscribed to an MQTT message (e.g., with client.subscribe("m/twiddle", twiddle);) That method looks like this: void EspMQTTClient::printDelayedExecutionList() { My twiddle method looked like this: void twiddle(const String & payload) { When I ran the test, my events played up until time got just short of 4294967. EspMQTT was running fine at that point so I went in with twiddle and saw that my three events were sitting there waiting for time to get someplace a little short of 4294967. As an aside, it would be nice if EspMQTT had some utilities for managing and querying the callbacks. For example, it would be nice if you could create the callbacks with a name and then be able to query EspMQTT with the horizon time and name of each event in the system. In developing my code and tests, I found that my code seemingly randomly creates multiple copies of some of my events which EspMQTT politely executes. I only found that by running the printDelayedExecutionList. My thought is that it would be easier to track down the bugs that led to this if I had some way of querying and managing the events. |
My next test proves that something is wrong with the underlying system. The code below shows that time gets reset to zero some time around 4294961. I am using the Arduino development environment on a NodeMCU ESP32 board. Per install instructions, I selected the Heltech_WIFI_KIT_32, which I think uses the Espressif IoT Development Framework (I will need to double check that). After about 1 hour 11 minutes, the millis time nears 4294961 and a few seconds after that, it drops to zero. As this may be a problem on other boards (for at least a while), it would be nice if EspMQTT could give a work-around in the event system. I will try a few things and report back. #define CHECK_IN_DELAY 100000 void setup() { void print_status(unsigned long this_loop_time) { // to 14324 Next check in 24324 void loop() { |
I found a bug report from February, 2019, that identifies the problem as being inside espressif. The bug report is titled: unsigned long millis() timer does not wrap / ignores top 10 bits / causes regression #2455 The bug is in cores/esp32/esp32-hal-misc.c unsigned long IRAM_ATTR millis() One commenter recommends it should return 1000ULL and the author says he implemented that. However, it doesn't appear that the fix has made it into a release. |
Final note. I did a fresh install of the ESP32 board library and it looks like the fix is in version 1.02. My problem, apparently, was that though Arduino thought it had the newest library, my install was pointing to an old one. So I uninstalled the ESP32 board library and deleted a subdirectory with the driver code from my Arduino directory. Then did a fresh install and re-ran the count up test. |
Thanks for the workaround @abqmichael |
I have been having a problem with events either getting dropped or my code not handling them properly. (I am using a NodeMCU ESP32 board.) In digging through EspMQTTClient, I identified one corner-case problem in how currentMillis and targetMills are compared to determine whether the event should fire. (This is within EspMQTTClient::loop()).
Currently, these are compared using
unsigned long currentMillis = millis();
...
if (mDelayedExecutionList[i].targetMillis <= currentMillis)
My thought is to imagine that the time returned by millis as it nears 2^32 or 4294967296 (49 days). First I imagined that the time is 2^32 - 1000 and a new delay got set 10000 ms (10 sec) into the future. At this point, the target would be computed as 9000 ms. As targetMillis <= currentMillis, the event would fire right away.
Now imagine the reverse, an event gets set with targetMillis to 2^32 - 100. Then imagine that the user's work took 200 milliseconds to complete so that currentMillis rolled around to a small number. At this point, targetMillis would be a big number while currentMillis would be small and we would be unlikely to see that event get fired for another 49 days.
Sadly, I can't figure my way out of this conundrum. Comparing the numbers isn't right. Subtracting the two times will always return a positive number (unsigned numbers are like that). It might be useful to not generate events that land within some small delta of the epoch. Or it might be just good to document the limiting condition and let the user decide what to do.
The text was updated successfully, but these errors were encountered: