Periodical loss of connection on devices (+- 24H) #292
Comments
Hello @maxstefaniv, thank you for raising the issue. I have a few questions:
We have seen such cases in the past in the CI under very specific conditions. Depending on the answers above, I would suggest we have a quick chat to troubleshoot further.
How old is the IoT Edge installation? Is it brand new? Do you have multiple gateways? Would you by any chance have the edge hub and agent logs? I am available to chat whenever possible to troubleshoot this issue.
Thank you for the prompt reply.
Please let me know if that makes it better. In case you want to IM, my Skype handle is mandurlevrai.
@maxstefaniv does this happen only after deploying a new decoder, or every time after 24h? It makes sense to first try what @Mandur points out just above, with ENABLE_GATEWAY set to false.
@maxstefaniv When I look at your docker inspect logs, I cannot see the environment variable ENABLE_GATEWAY set to false. May I ask you to quickly double-check? You can run
I have a device running to try to get a repro, but at the moment I am failing to reproduce the problem. I think it might be beneficial to have a call to look at the problem in more depth.
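For anyone who wants to script the check described above, here is a minimal sketch that reads the JSON printed by `docker inspect` and looks up one environment variable. It relies only on the standard `Config.Env` list of `KEY=VALUE` strings. The helper name `get_container_env` and the sample data are hypothetical, made up for illustration, and are not part of the LoRaWAN starter kit.

```python
import json


def get_container_env(inspect_json, name):
    """Return the value of env var `name` from `docker inspect` JSON, or None."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    for container in containers:
        # Config.Env is a list of "KEY=VALUE" strings (it may be null).
        for entry in container.get("Config", {}).get("Env") or []:
            key, _, value = entry.partition("=")
            if key == name:
                return value
    return None


# Hypothetical, trimmed-down inspect output used only for illustration.
sample = json.dumps(
    [{"Config": {"Env": ["ENABLE_GATEWAY=false", "OTHER_SETTING=1"]}}]
)
print(get_container_env(sample, "ENABLE_GATEWAY"))
```

On a real device the input could come from something like `docker inspect LoRaWanNetworkSrvModule`, assuming the container carries the module name. A result of `None` means the variable is not set on the container at all.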
@Mandur Sure, I executed it. When are you available? I can add you to my Teams account and we can have a call.
Sure @maxstefaniv, just add me, we can sync ad hoc!
Following the discussion, we are testing a new version with updated client libraries to see if that fixes the issue.
Applied the LoRaWanNetworkSrvModule update from 1.0.5 to 1.0.5.1 on the test edge, as suggested by @Mandur, and it seems to help. If it works on the Prod edge as well, I will close the issue.
As it seems our changes solved the issue, we merged this into dev. This fix will be included in the next lorawan release. Thank you for raising this!
Effectively, I was monitoring the situation through this weekend on both devices and there were no errors. All devices reauthenticate with gateways and there are no drops. This solution works. Thank you very much @Mandur.
Expected Behavior
Devices will report telemetry non stop, reaching IoT Hub with no issues.
Current Behavior
After setup, devices reported properly for around 24 hours, after which they started displaying:
could not send message to IoTHub/Edge with error: The operation timed out.
Unless iotEdge is rebooted, all devices will report the same message.
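To spot this condition without reading the logs by hand, a minimal sketch like the following could count the timeout messages quoted in this issue. The function name `count_timeouts` and the sample log lines (including their device prefixes) are hypothetical; only the two error strings come from the report itself.

```python
import re

# Matches the two timeout messages quoted in this issue.
TIMEOUT_PATTERN = re.compile(
    r"could not (?:send message to IoTHub/Edge|retrieve device twin) "
    r"with error: The operation timed out"
)


def count_timeouts(lines):
    """Count log lines that contain one of the timeout messages."""
    return sum(1 for line in lines if TIMEOUT_PATTERN.search(line))


# Hypothetical log lines; the prefixes are made up for illustration.
sample_log = [
    "device-01: could not send message to IoTHub/Edge with error: The operation timed out.",
    "device-01: message sent",
    "device-02: could not retrieve device twin with error: The operation timed out.",
]
print(count_timeouts(sample_log))
```

The function accepts any iterable of strings, so it can be fed a saved log file or lines piped in from the module logs.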
Steps to Reproduce
Context (Environment)
Device (Host) Operating System
Ubuntu 16.04
Architecture
amd64
LoRaWAN Module Version
Logs
Attached support_bundle.zip
Story description:
Starting with the LoRaWAN Gateway (AAEON AIOT-ILRA01), we deployed default modules from Microsoft Git and modified one additional module “DecoderValueSensor” adding logic to decode telemetry and deployed it on the Edge Device.
Afterwards, we enrolled a total of 24 multi-sensors, making sure they are sending telemetry that is reaching the IoT Hub, and stored messages are correctly decoded. Everything was fine.
We left the setup connected both to a network and to a power source. The following day, we connected to the Edge Device over SSH and noticed a series of error messages:
could not retrieve device twin with error: The operation timed out.
In response to this error, we restarted IoT Edge on the Edge Device:
sudo systemctl restart iotedge
After this, all modules restarted and, judging by the logs, the error messages no longer appeared, so everything went back to normal.