Losing communication with the mqtt broker causes a delay in processing gpio events. #5075
Comments
Unfortunately, using hardware locks is difficult in my case. Is it possible to come up with something to reduce the impact of mqtt server unavailability on latency? |
The timeout as set in the controller configuration is used as the timeout when attempting a connection. But I do strongly advise you to look into some failsafe hardware solution to turn off the motor. Having some (emergency) limit switch which will turn off the power to the motor is always good to have. Not saying it isn't a problem in ESPEasy, or that I will not look into it, but there is no guaranteed minimum response time to act on the switch. So better add something extra for the emergency stop. |
The system uses an industrial frequency-regulator motor controller, controlled via an RS485 interface, with keep-alive events from the ESP32; if these stop arriving, the drive performs an emergency stop. We could connect hardware limit switches to the drive, but right now that is difficult for me, for reasons ranging from disassembling the installation casings to correctly programming the frequency regulator inputs. We need at least a temporary solution, because the equipment cannot be taken out of operation. With the onset of hot weather, the time until the drive has to be shut down decreased and began to coincide with the next attempt to connect to the broker. That is, the scripts detect a break in communication with the broker and initiate opening of the gate, and when the limit switch event should fire, it is delayed by network operations. As a result, every minor network failure leaves the gate tightly jammed. There are about 1,500 passages through the gate per day. Could the interval between broker connection attempts be increased, for example to 90 seconds? |
Is the ESP also processing relatively high-frequency pulse inputs? Can you draw schematically what the ESP has as inputs and what as outputs? |
There are no high-frequency signals. The beam speed is 0.25 m/s. The inductive Hall sensor detects whether the beam extends beyond the working area. The travel reserve before a collision is about 0.1 m, or 0.4 seconds, including the full time for processing the event. |
But what does the ESP have to do to stop the motor? |
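The 0.4 second figure follows from dividing the remaining travel by the beam speed. A minimal sketch of that arithmetic, using hypothetical helper names and integer millimetres and milliseconds (these helpers are not part of ESPEasy):

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
// Integer millimetres and milliseconds avoid floating-point rounding.

// Time available before a collision: remaining distance divided by speed.
constexpr uint32_t stopBudgetMs(uint32_t reserveMm, uint32_t speedMmPerSec) {
  return reserveMm * 1000u / speedMmPerSec;
}

// Distance the beam travels while an event waits to be processed.
constexpr uint32_t travelDuringDelayMm(uint32_t delayMs, uint32_t speedMmPerSec) {
  return delayMs * speedMmPerSec / 1000u;
}
```

With the values given here, `stopBudgetMs(100, 250)` yields 400 ms, while `travelDuringDelayMm(1000, 250)` yields 250 mm, so a processing delay of one second already exceeds the 100 mm reserve.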
|
Comments should be prefixed with //, not #, that might confuse the rules parser... 🤔 Not sure where you get support for that |
Are you sure this is about the MQTT reconnect? And if the WiFi connection is so bad at that location, why not consider using Ethernet? |
There are no problems with this command. The problem is that the GPIO event begins to be processed with a delay of up to 5 seconds while the unit is trying to reconnect to MQTT. sp_whex is a short alias for "serialproxy_writehex"
|
I have attached the ESP32 network layout diagram. The ESP32 is located in the village.
All data sources are synchronized via NTP. Based on the collected observations, the problem occurs when the MQTT server becomes unavailable while WiFi and internet work stably. Previously, in colder weather, the motor rotated a little slower and the attempt to restore communication with MQTT did not overlap the time interval at the end of the movement. Now, in summer, the motor runs faster and the two began to coincide. |
What is your RSSI? Do you have an external WiFi antenna? |
RSSI is low, no external antenna, poor environmental conditions. Build: 20230224 - Mega32. The sp_whex command was removed from the latest ESPEasy version; see the message above. |
As you use custom builds you can define timeouts for your builds |
So the unit got disconnected from/by the MQTT broker, but the ESP had not lost its WiFi connection to the AP? Given you have some unpredictable hops between the ESP and the MQTT broker, I think you should increase (!!) the timeout set in the controller. Just to be sure, I will now look into the code to see whether the timeout is handled correctly. Also could you try to make a build based on the current code and let some test node run with the same MQTT broker settings at that location? N.B.2 Just as a precaution, please make sure "Use Last Connected AP from RTC" and "Restart WiFi Lost Conn" are both checked on the tools->Advanced page. When you experience low RSSI values, you might also want to consider checking "Force WiFi B/G" as this allows for a few dBm less signal strength while still maintaining a stable connection. |
The problem with this command is that it was introduced in a pull request (#3980) that was never merged, but closed. The intent of that PR was implemented differently, but the command alias was never included in ESPEasy. This implies that you are running a self-built Custom version of ESPEasy, and most likely not using the latest source code, making it very hard to provide support. Stating the fact that you are using a custom build in the initial message would have been quite helpful. Please update your source code to the latest available, as providing support on older releases is really hard because of the way software in general is evolving (updated Arduino framework versions, for instance) and other/many improvements in the ESPEasy code. As an alternative you can use a standard Collection C build (not sure what plugins you need besides P087) or a MAX build (including all plugins) on an ESP32 with 16MB Flash memory, ESP32-S3 with 8MB or 16MB Flash, or an ESP32-C6 (preliminary support) with 8MB or 16MB Flash. |
Can you show the timingstats? N.B. the timingstats will be cleared every time you reload the timing stats page. |
By the way, you can also try to change the |
The broker had no reason. There was a short-term (2-3 minutes) disruption of communication via the Internet. This happens to us sometimes. Ping delay via Ethernet connection to the MQTT server is 4 ms. WiFi endpoint: +2ms - 14ms. I have a second ESP32 available in the specified location and conditions and with the same controller settings. It works as a key radio on a pole. version: |
I think I found something that for sure needs to be changed and you can also change it yourself and make a new build based on the same code base. The So as a test you can try to lower this to 1 second, while I will look into a more suitable fix. |
I don't remember getting the source tree any other way than just "git clone" head. I can't use another esp32 model because I'm using a factory controller. |
Yes, this was my issue request where the Ethernet issue on this board was resolved. |
I need some time to assemble a test stand with a blinking LED. |
By trying "another board" I meant literally any ESP32 board. By the way, I'm thinking of changing the
boolean PubSubClient::readByte(uint8_t * result) {
  if (_client == nullptr) {
    return false;
  }
  uint32_t previousMillis = millis();
  while(!_client->available()) {
    delay(1); // Prevent watchdog crashes
    if((int32_t)(millis() - previousMillis) >= static_cast<int32_t>(_client->getTimeout())){
      return false;
    }
  }
  *result = _client->read();
  return true;
}
Still needs some testing, but I guess this may fix quite a lot of 'odd' MQTT related issues as the current code appears to be completely blocking for up to 15 sec. |
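The elapsed-time check in the snippet above relies on subtracting two millisecond counter values. A minimal, platform-independent sketch of why that subtraction stays correct across a counter wrap-around follows. `timeoutReached` is a hypothetical helper, not part of PubSubClient or ESPEasy. It compares unsigned values rather than casting to `int32_t`, and it takes the current time as a parameter so it can be exercised without `millis()`:

```cpp
#include <cstdint>

// Hypothetical helper (not part of PubSubClient or ESPEasy).
// Unsigned subtraction yields the elapsed time modulo 2^32, so the result
// is still correct after the millisecond counter wraps around
// (roughly every 49.7 days for a 32-bit counter).
constexpr bool timeoutReached(uint32_t startMs, uint32_t nowMs, uint32_t timeoutMs) {
  return static_cast<uint32_t>(nowMs - startMs) >= timeoutMs;
}
```

For example, with a start value of `0xFFFFFF00` and a current value of `0x00000100` the elapsed time is 512 ms, even though the current value is numerically smaller than the start value.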
Hmm, the more I start digging, the more smelly it gets....
// ESP32_CONNECTION_TIMEOUT : Specific case for ESP32, we need to manually provide timeout as default (-1) leads to WDT reset (after 5 seconds).
// By default (4500 milliseconds) 4,5 seconds to avoid reaching 5s default watchdog reset time.
// This is multiplied in WiFiClient.cpp (part of arduino-esp32) by 1000 inside [WiFiClient::connect] method.
// ESP8266 Arduino framework in contrast has fixed 5000ms timeout. No need to define it manually here.
// See: https://github.com/knolleary/pubsubclient/pull/842
#ifndef ESP32_CONNECTION_TIMEOUT
#define ESP32_CONNECTION_TIMEOUT 4500
#endif
I have not looked closely at this code since 2018... |
Well, judging by the quote, this default only applies if ESP32_CONNECTION_TIMEOUT was not defined anywhere before. |
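The `#ifndef` guard in the quoted code only supplies a default. A minimal sketch of how an earlier definition takes precedence: the value 1000 is an arbitrary illustration, and in a real build the earlier definition would more likely come from a compiler flag such as `-DESP32_CONNECTION_TIMEOUT=1000` than from the same file.

```cpp
// Earlier definition, e.g. from a build flag or a custom header.
// The value 1000 is illustrative only.
#define ESP32_CONNECTION_TIMEOUT 1000

// Guard as quoted from the library: it only takes effect when
// nothing defined the macro before this point.
#ifndef ESP32_CONNECTION_TIMEOUT
#define ESP32_CONNECTION_TIMEOUT 4500
#endif

static_assert(ESP32_CONNECTION_TIMEOUT == 1000,
              "the earlier definition wins over the library default");
```

Whether a shorter value is appropriate depends on the network path to the broker, so this only shows the mechanism, not a recommended setting.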
I do set the timeout in the |
Can you test the changes I made in this PR: #5076 ? |
@TD-er so it looks like our core controller was buggy. Maybe some forks of this lib were already patched for more bugs? |
I have looked at a few forks not that long ago. Here is one of the more active forks: https://github.com/hmueller01/pubsubclient/tree/dev-fixes |
Here, I'm not sure whether we use MQTT_MAX_TRANSFER_SIZE, but uint8_t could be too small: ESPEasy/lib/pubsubclient/src/PubSubClient.cpp Lines 576 to 581 in 8fb409c
|
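The concern about `uint8_t` can be illustrated in isolation. The sketch below is a hypothetical example, not the code from PubSubClient.cpp: it only shows what happens when a size larger than 255 is narrowed to an 8-bit type.

```cpp
#include <cstdint>

// Hypothetical illustration, not taken from PubSubClient.cpp.
// Narrowing a size to uint8_t keeps only the low 8 bits,
// so any value above 255 silently wraps.
constexpr uint8_t asByteCount(uint16_t requested) {
  return static_cast<uint8_t>(requested);
}
```

Here `asByteCount(200)` stays 200, but `asByteCount(256)` becomes 0 and `asByteCount(300)` becomes 44, which is why a transfer size configured above 255 would need a wider type.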
OK, just looked a bit through the most recent commits of that one and no way we're going to use it.
Look at this commit: hmueller01/pubsubclient3@f064788 When using inheritance, you should absolutely not call member functions in the constructor, and here they try to call the class's own constructor from other constructors. If you want to clean up the constructors by using default values, assign them when declaring the members in the .h file. |
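The suggestion to assign defaults where the members are declared can be sketched as follows. `ClientSettings` and its values are hypothetical and not taken from any of the forks discussed here:

```cpp
#include <cstdint>

// Hypothetical class for illustration; names and values are assumptions.
class ClientSettings {
public:
  constexpr ClientSettings() = default;
  constexpr explicit ClientSettings(uint16_t port) : port_(port) {}

  constexpr uint16_t port() const { return port_; }
  constexpr uint16_t keepAliveSec() const { return keepAliveSec_; }

private:
  // Defaults live next to the declarations, so every constructor starts
  // from them and none has to call another constructor or a member function.
  uint16_t port_         = 1883;
  uint16_t keepAliveSec_ = 15;
};
```

A constructor that sets only the port, such as `ClientSettings(8883)`, still ends up with the default keep-alive value, without any constructor calling another one.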
We don't use it but indeed it might be a bit small. Edit: |
Please tell me what decision was ultimately made in the main branch? The Ethernet port itself and the nearest router do not turn off, but when my connection to the MQTT server is lost, everything freezes for a few seconds. |
Which build are you using? |
Build: ESP_Easy_mega_20240706_custom_ESP32_4M316k_LittleFS_ETH Jul 6 2024 |
Hmm that's so like "5 minutes ago"... ;) Right now I have no idea what may be the problem here. |
Maybe we can try to connect to your MQTT server with another username/password and check whether the behavior is independent of ESPEasy? |
My ESPEasy board scripts control the gate drive, including processing stops based on limit switch signals.
By analyzing the logs, I discovered that somewhere between the messages Broker C005 connection failed (1/0) and Broker C005 connection failed (3/0), the GPIO limit switch event is processed with significant delays (more than 1 sec).
On the "Advanced" settings page, the "Enable RTOS Multitasking" option is grayed out (blocked from being enabled).
What can be done? The motor jams because the limit switch signal is processed with an unacceptable delay.
Is it possible to set a small TCP connection timeout of around 30-40 ms?