Periodical loss of connection on devices (+- 24H) #292

maxstefaniv · 2021-05-10T10:50:41Z

Expected Behavior

Devices will report telemetry non stop, reaching IoT Hub with no issues.

Current Behavior

After setup, the devices reported properly for around 24 hours, then started displaying:
could not send message to IoTHub/Edge with error: The operation timed out.
Unless IoT Edge is restarted, all devices keep reporting the same message.

Steps to Reproduce

Deploy the project as is (gateway device: AIOT-ILRA01; devices: ELSYS CO2 and Netvox R311W)
Add an additional decoder (this should not be an issue, as the module only decodes the messages coming from the devices as telemetry)
Wait 24 hours

Context (Environment)

Device (Host) Operating System

Ubuntu 16.04

Architecture

amd64

LoRaWAN Module Version

LoRaWanNetworkSrvModule  running          Up 2 hours       loraedge/lorawannetworksrvmodule:1.0.5
LoRaWanPktFwdModule      running          Up 2 hours       loraedge/lorawanpktfwdmodule:1.0.5
edgeAgent                running          Up 2 hours       mcr.microsoft.com/azureiotedge-agent:1.0.9.5
edgeHub                  running          Up 2 hours       mcr.microsoft.com/azureiotedge-hub:1.0.9.5

Logs

Attached support_bundle.zip
support_bundle.zip

Story description:

Starting with the LoRaWAN gateway (AAEON AIOT-ILRA01), we deployed the default modules from the Microsoft Git repository, modified one additional module, “DecoderValueSensor”, by adding logic to decode telemetry, and deployed it on the Edge device.
Afterwards, we enrolled a total of 24 multi-sensors and made sure that their telemetry was reaching the IoT Hub and that the stored messages were correctly decoded. Everything was fine.
We left the setup connected to both the network and a power source. The following day, we connected to the Edge device over SSH and noticed a series of error messages:
could not retrieve device twin with error: The operation timed out.
In response to this error, we restarted IoT Edge with sudo systemctl restart iotedge. All modules restarted and, according to the logs, the error messages no longer appeared, so everything went back to normal.

Can you propose any other solutions that will allow us to detect this anomaly ASAP?
Is this behavior normal? Have you encountered it before?
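
One possible way to detect this anomaly early is to scan the recent module logs for the two timeout messages quoted above and raise a flag once they pile up. The following is only a minimal sketch under stated assumptions: the function names and the threshold of 5 are hypothetical, the container name is taken from the module listing above, and the logs are assumed to be readable through the docker CLI on the gateway.

```python
import subprocess

# Error texts quoted in this thread; adjust them if your module logs differ.
TIMEOUT_MARKERS = (
    "could not send message to IoTHub/Edge with error: The operation timed out",
    "could not retrieve device twin with error: The operation timed out",
)


def count_timeouts(log_lines):
    """Return how many log lines contain one of the known timeout errors."""
    return sum(
        1 for line in log_lines
        if any(marker in line for marker in TIMEOUT_MARKERS)
    )


def looks_stuck(log_lines, threshold=5):
    """Flag the gateway as stuck once the timeout count reaches the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return count_timeouts(log_lines) >= threshold


def read_recent_logs(module="LoRaWanNetworkSrvModule", since="10m"):
    """Fetch recent container logs via the docker CLI (stdout and stderr)."""
    result = subprocess.run(
        ["docker", "logs", "--since", since, module],
        capture_output=True, text=True, check=False,
    )
    return (result.stdout + result.stderr).splitlines()
```

A scheduled job (cron, for example) could call looks_stuck(read_recent_logs()) every few minutes and send an alert, or trigger the restart described above, when it returns True.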


Mandur · 2021-05-11T08:34:48Z

Hello @maxstefaniv, thank you for raising the issue. I have a few questions:

How old was the IoT Edge installation? Is it brand new?
Do you have multiple gateways?
Would you by any chance have the edge hub and agent logs?

We have seen such cases in the past in the CI under very specific conditions. Depending on your answers above, I would suggest we have a quick chat to troubleshoot further.

maxstefaniv · 2021-05-11T11:11:09Z

How old was the IoT Edge installation? Is it brand new?
The resource group with the IoT Hub was created on 27 April, and the Edge device was given its connection string on 29 April. On Friday 30 April I enrolled 24 devices, and the first error was detected on 3 May (it occurred over the weekend). I would say it is brand new.

Do you have multiple gateways?
I have one gateway for each resource group:
the first (Raspberry Pi), for testing purposes, with only 2 devices
the second (AIOT-ILRA01), deployed with all devices (this is the one I am most interested in fixing)

Would you by any chance have the edge hub and agent logs?
I have them from the first time it happened:
edgeHub partial log.txt
edgeAgent logs.txt

I am available to chat whenever it is possible to troubleshoot this issue.

Mandur · 2021-05-11T12:53:26Z

Thank you for the prompt reply.
I assume this is similar to the issue I am currently troubleshooting. As a fix for the moment, I recommend setting the environment variable ENABLE_GATEWAY to false on the LoRaWAN network server (LoRaWanNetworkSrvModule).
You can read more about the setting in the reply here, but basically it skips the edge queue and interacts directly with IoT Hub. As a result, you lose the following:

Routing messages through IoT Edge to other edge modules (decoders are not affected because they are invoked over HTTP and do not rely on message routing)
Web proxy support
Local queuing in case of an intermittent internet connection
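
As an illustration of where such a setting usually lives, the sketch below builds the env fragment for the network server module as it would appear in an IoT Edge deployment manifest, where each environment variable maps to an object with a value field. The helper name network_server_env is hypothetical, and the rest of the manifest (image, create options, routes) is left out.

```python
import json


def network_server_env(enable_gateway):
    """Build the env section for the network server module.

    Deployment manifests store environment values as strings,
    hence "true"/"false" rather than booleans.
    """
    return {
        "env": {
            "ENABLE_GATEWAY": {"value": "true" if enable_gateway else "false"}
        }
    }


# Direct-to-IoT-Hub mode, as recommended above.
fragment = network_server_env(enable_gateway=False)
print(json.dumps(fragment, indent=2))
```

The printed fragment would be merged into the LoRaWanNetworkSrvModule entry of the deployment and redeployed for the change to take effect.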

Please let me know if that makes it better. In case you want to IM, my Skype handle is mandurlevrai.

ronniesa · 2021-05-11T13:12:29Z

@maxstefaniv does this happen only after deploying a new decoder, or every time after 24h?

It makes sense to first try what @Mandur points out just above: setting ENABLE_GATEWAY to false.

maxstefaniv · 2021-05-11T19:29:50Z

@Mandur Hello, yes, ENABLE_GATEWAY was already set to false; otherwise I was getting an error. That was a fix I had already found in existing issues.

@ronniesa It happens every time after 24 hours.

Mandur · 2021-05-12T06:24:40Z

@maxstefaniv When I look at your docker inspect logs, I cannot see the environment variable ENABLE_GATEWAY set to false. May I ask you to quickly double-check? You can run:
docker exec LoRaWanNetworkSrvModule bash -c printenv |grep ENABLE_GATEWAY
(I am just double checking as it solved the issue for me)
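
If the check needs to be repeated across several gateways, the output of the command above can also be evaluated in a script. This is a minimal sketch, assuming the printenv output has been captured as text; the function names are hypothetical. It distinguishes a variable that is unset from one that is explicitly set to false, which is the case being verified here.

```python
def gateway_setting(printenv_output):
    """Return the ENABLE_GATEWAY value from printenv output, or None if unset."""
    for line in printenv_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "ENABLE_GATEWAY":
            return value.strip()
    return None


def direct_mode_enabled(printenv_output):
    """True only when ENABLE_GATEWAY is explicitly set to false (any casing)."""
    value = gateway_setting(printenv_output)
    return value is not None and value.lower() == "false"
```

An empty result from grep corresponds to gateway_setting returning None, meaning the variable is not set on the container at all.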

I have a device running to try to get a repro, but at the moment I am failing to reproduce the problem. I think it might be beneficial to have a call to look at the problem in more depth.

maxstefaniv · 2021-05-12T07:04:12Z

@Mandur Sure, I executed it.

When are you available? I can add you to my Teams account and we can have a call.

Mandur · 2021-05-12T07:34:01Z

Sure @maxstefaniv, just add me and we can sync ad hoc!

Mandur · 2021-05-12T18:50:17Z

Following the discussion, we are testing a new version with updated client libraries to see if that fixes the issue.

maxstefaniv · 2021-05-13T13:07:45Z

Applied the LoRaWanNetworkSrvModule update from 1.0.5 to 1.0.5.1 on the test edge device, as suggested by @Mandur, and it seems to help. If it works on the prod edge device as well, I will close the issue.


Mandur · 2021-05-16T12:38:49Z

As it seems our changes solved the issue, we merged this into dev. This fix will be included in the next LoRaWAN release.
Feel free to reopen if there are any more issues you face with it.

Thank you for raising this!

maxstefaniv · 2021-05-16T23:40:20Z

Indeed, I monitored the situation over the weekend on both devices and there were no errors. All devices reauthenticate with the gateways and there are no drops. This solution works. Thank you very much, @Mandur.

maxstefaniv added the bug label May 10, 2021

Mandur mentioned this issue May 13, 2021

Fix #293 #292 #294

Merged

Mandur closed this as completed in #294 May 16, 2021

Mandur added a commit that referenced this issue May 16, 2021

Fix #293 #292 (#294)

83caedb

* Upgrade device sdk versions
* Fix Mac issues
* Fix indentation
* Correct test
