Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMQPWS/AMQP Connection Does Not Detect Disconnection in a Timely Manner #1229

Open
jdavmosafefleet opened this issue Feb 5, 2025 · 0 comments
Labels

Comments

@jdavmosafefleet
Copy link

We are migrating our device connections from MQTT to AMQP over WebSockets (AMQPWS) to leverage features such as message abandon and completion. The devices are vehicles, and we have been conducting tests to evaluate this transition.

During testing, we observed that when a device experiences a temporary disconnection (e.g., passing through a tunnel) that lasts for approximately 5 minutes, the Azure IoT SDK does not immediately detect the network loss. Instead, the disconnection event is only triggered approximately 18 minutes after the initial loss of internet access.

We tested both AMQP and AMQPWS, and both exhibit the same behavior. This issue prevents us from releasing this feature, as the delayed disconnection detection could lead to significant operational issues.

Steps to Reproduce:

  • Connect a device using AMQPWS to Azure IoT Hub.

  • Simulate a network disconnection lasting at least 2 minutes.

  • Restore the internet connection.

  • Observe that no disconnection event is triggered immediately upon reconnection.

  • Notice that the SDK only detects the disconnection event approximately 18 minutes later.

One test with the retry policy disabled

rhea:events [connection-1] Connection got event: disconnected +18m

One test with the retry policy enabled

rhea:events [connection-1] Connection got event: disconnected +22m

Expected Behavior:

The SDK should promptly detect a disconnection and attempt a reconnection as soon as network access is restored.

Observed Behavior:

The disconnection event is only triggered approximately 18 minutes after the initial network loss.

Test Scenarios:

Lab Environment: Device connected to a regular internet connection, manually disconnecting the network for at least 5 minutes.

Real-World Test: Device operating in a vehicle driving through the city, where natural disconnections occur (e.g., tunnels, poor signal areas). In this case, logs extracted after a full day of operation show the same delayed disconnection detection.

Local Test: Issue is not present when testing locally on a regular laptop.

Environment Details:

  • Architecture: 64-bit ARM

  • Device OS: Ubuntu 18.04.6 LTS (Bionic Beaver)

  • Application Environment: Running in a Docker container

  • Docker Version: 20.10.21, build 20.10.21-0ubuntu1~18.04.3

  • Container Base Image: node:20.10-alpine

Additional Information:

  • We have tested with and without the retry policy enabled, but the issue persists in both cases.
  • The issue occurs regardless of whether the disconnection is intentional (lab test) or natural (real-world test in a vehicle).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant