Create new device for tests that deploy configs #2753

Merged: 25 commits merged into Azure:master from the e2e-device-per-test branch on Apr 10, 2020

Conversation

@damonbarry damonbarry (Member) commented Mar 30, 2020

The new end-to-end tests have become increasingly flaky as more tests (and more complicated tests) have been added. This is largely because most of the tests share the same edge device identity, which leads to at least two problems:

  • As more configurations are deployed, edge hub has to work increasingly hard to route traffic, often to devices/modules from previous tests that have already been removed. Tests that run near the end of the suite end up failing randomly, simply because we aren't waiting long enough for edge hub to work as designed under rapid configuration churn (nor can we afford to wait; the tests need to run reasonably fast).
  • The currently running test has no way to know when edge hub has changed its routes to match the current deployment rather than the deployment from the previous test. This means that when a test's modules start sending messages, those first messages might be delivered to the wrong destination or not delivered at all. Many tests carefully count received messages, so they're sensitive to this problem.

With this change, we create a new edge device identity for any test/fixture that deploys its own configuration. This lets edge hub start up fresh each time, and tests can more readily verify the exact scenario they're testing. And we still get the speed benefit of installing iotedged only once per run (unlike the old end-to-end framework, which installs everything from scratch for every test--reliable, but slow).

Included changes:

  • Add a DeviceId class that can generate unique IDs from a common base. The base ID is used as the prefix for all logs in a single test run (see the first sketch after this list).
  • Remove DeviceId property from the Context class. Early on, it was nice to be able to pass in a device ID instead of having the tests generate one, but it's not really that useful anymore, and the new approach is much simpler.
  • Promote the DeviceId property on EdgeRuntime to public.
  • In code locations that formerly referred to Context.Current.DeviceId, use DeviceId.Generate() to get a new device ID when needed; otherwise, use EdgeRuntime.DeviceId to get the parent edge device's ID.
  • Remove the IdentityLimits class. The DeviceId class now centralizes and encapsulates the creation of device IDs so callers don't have to remember to check length limits after creating their own IDs.
  • Push SAS- and X509-specific code from ManualProvisioningFixture into the appropriate derived classes. ManualProvisioningFixture could probably go away entirely, but that would require more careful thought, so I'm holding off on it for now.
  • In SasManualProvisioningFixture, change the [OneTimeSetUp] method to [SetUp] so that each test under this fixture gets a new edge device identity. The exception is tests under the derived CustomCertificatesFixture (the TransparentGateway tests), which share a single device: the fixture overrides [SetUp] with an empty method and calls the parent logic from [OneTimeSetUp] instead (see the second sketch after this list).
  • In X509ManualProvisioningFixture, remove some unused variables.
  • In the ValidateMetrics test, we were deploying a vanilla agent image first so that the agent would get updated when we later deployed it with special config (see Improve ValidateMetrics test on Windows #2583). Now that we configure iotedged with a fresh device before starting the test, it automatically starts out with the vanilla agent, so I removed the workaround code from the test.
  • In the Module test class, several tests use the temp sensor module, so the TempSensorModule helper class was created to hand out unique module names and avoid edge hub routing conflicts. Now that we restart iotedged with a fresh device ID for each test, the helper class is no longer needed, so I removed it.
  • Refactor some test code to follow style guidelines, e.g., don't explicitly call default base constructor, don't explicitly decorate class members with private, etc.
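For illustration, here is a rough sketch of the DeviceId idea. The class name comes from this PR, but the signature, limit, and internals below are assumptions, not the actual implementation:

```csharp
using System;
using System.Threading;

// Illustrative sketch only; the real DeviceId class in this PR may differ.
public class DeviceId
{
    // IoT Hub caps device IDs at 128 characters. Encapsulating the check here
    // is what lets the PR delete the separate IdentityLimits class.
    const int MaxLength = 128;

    static int counter;

    // Common base shared by every ID (and every log prefix) in a test run.
    readonly string baseId;

    public DeviceId(string baseId) => this.baseId = baseId;

    // Hypothetical signature: yields unique IDs like "<base>-TempSensor-1",
    // truncated so callers never have to check the length limit themselves.
    public string Generate(string testName)
    {
        string id = $"{this.baseId}-{testName}-{Interlocked.Increment(ref counter)}";
        return id.Length <= MaxLength ? id : id.Substring(0, MaxLength);
    }
}
```

Under this sketch, a test that deploys its own configuration would call Generate(...) in its per-test setup, while anything operating on the parent edge device would read EdgeRuntime.DeviceId instead.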
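And a minimal NUnit sketch of the [SetUp]/[OneTimeSetUp] arrangement described above; the method names and bodies are stand-ins for the real provisioning logic:

```csharp
using System.Threading.Tasks;
using NUnit.Framework;

public class SasManualProvisioningFixture
{
    // [SetUp] (rather than [OneTimeSetUp]) runs before *each* test, so every
    // test under this fixture gets its own freshly provisioned edge device.
    [SetUp]
    public virtual Task ProvisionEdgeAsync() => this.CreateFreshDeviceAsync();

    // Stand-in for the real provisioning logic.
    protected Task CreateFreshDeviceAsync() => Task.CompletedTask;
}

public class CustomCertificatesFixture : SasManualProvisioningFixture
{
    // Override the inherited [SetUp] with a no-op so NUnit doesn't provision
    // a new device before each test...
    [SetUp]
    public override Task ProvisionEdgeAsync() => Task.CompletedTask;

    // ...and run the parent logic once instead, so all TransparentGateway
    // tests share a single device.
    [OneTimeSetUp]
    public Task ProvisionSharedEdgeAsync() => this.CreateFreshDeviceAsync();
}
```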

@damonbarry damonbarry changed the title from "E2e device per test" to "Create new device for tests that deploy configs" on Mar 30, 2020
kodiakhq bot pushed a commit that referenced this pull request Apr 2, 2020
Docker images take a _long_ time to download on Raspberry Pi. Since we recently started deleting all images before running the tests (#2736), the first test to run (currently it's `QuickstartCerts`) has to download images from scratch. Also, the `TempFilterFunc` test has to download a rather large image that has nothing in common with our other images. Both of these tests tend to time out before they can download everything, even after the recent timeout increase to 10 min (#2761).

This change expands the Linux arm32v7-specific build setup to:
1. Run `git clean` as sudo - we need to clean up the source code tree before running the tests, especially the previous run's test logs and certificates (the latter are owned by root, so the `checkout` task in `e2e-setup.yaml` can't delete them).
2. Parse `context.json` to get the list of images required for this test run (see the sketch below).
3. `docker pull` the required images _before any old images are removed_, to take advantage of layer caching which speeds up downloads significantly.
4. Remove old containers, images, docker networks, and volumes.

With these changes in place, tests on the Pi are completing in 30 seconds to 2.5 minutes. As a result, this change also reduces the test timeout to 6 minutes so that failed tests don't slow the run down as much as the previous 10-minute timeout did.
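As a rough C# illustration of step 2, one could collect the image references from `context.json` like this, assuming (purely for this sketch, not a documented schema) that images appear as top-level string properties whose names end in `Image`:

```csharp
using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json.Linq;

class ImageList
{
    static void Main()
    {
        // Assumed layout: top-level string properties like "edgeAgentImage"
        // hold the image references needed for the run.
        JObject context = JObject.Parse(File.ReadAllText("context.json"));
        var images = context.Properties()
            .Where(p => p.Name.EndsWith("Image", StringComparison.OrdinalIgnoreCase))
            .Select(p => (string)p.Value);

        // Print one image per line so a script can `docker pull` each image
        // before the old images are removed, preserving layer caching.
        foreach (string image in images)
        {
            Console.WriteLine(image);
        }
    }
}
```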

A few tests that were failing before are still failing and need to be investigated separately:
- `TransparentGateway` for MQTT + CA/self-signed certs
- `TempFilterFunc` and `ValidateMetrics` (see #2753)
- `PriorityQueueModuleToModuleMessages` (see #2762)
@damonbarry damonbarry marked this pull request as ready for review April 10, 2020 16:13
@dylanbronson dylanbronson (Contributor) left a comment

LGTM - Linux went all green with this PR. Windows still has issues, but they seem unrelated to this PR.

@kodiakhq kodiakhq bot merged commit 4744cf6 into Azure:master Apr 10, 2020
@damonbarry damonbarry deleted the e2e-device-per-test branch April 10, 2020 22:58
kodiakhq bot pushed a commit that referenced this pull request Jul 27, 2020
Given recent improvements to the end-to-end test framework (#2635, #2753), the logic that splits deployments into multiple stages (#2248) is no longer needed. This change removes that logic.

In verifying the changes, I also ran into occasional failures in the `ValidateMetrics` test, because the stringized JSON labels recently added to Edge Agent containers failed the comparison. The expected and actual twin reported properties always had the same JSON properties, but sometimes in a different order:

```
systemModules.edgeAgent.settings.createOptions: {"Labels":{"net.azure-devices.edge.create-options":"{}","net.azure-devices.edge.env":"{\"RuntimeLogLevel\":{\"value\":\"debug\"},\"PerformanceMetricsUpdateFrequency\":{\"value\":\"00:00:20\"}}","net.azure-devices.edge.owner":"Microsoft.Azure.Devices.Edge.Agent"}
systemModules.edgeAgent.settings.createOptions: {"Labels":{"net.azure-devices.edge.create-options":"{}","net.azure-devices.edge.env":"{\"PerformanceMetricsUpdateFrequency\":{\"value\":\"00:00:20\"},\"RuntimeLogLevel\":{\"value\":\"debug\"}}","net.azure-devices.edge.owner":"Microsoft.Azure.Devices.Edge.Agent"}
```

The logic that compares these properties is trying to confirm that the configuration has been deployed, and the agent's labels aren't related to the deployment, so I just removed them from the comparison (a sketch of the underlying ordering problem follows).
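To illustrate the ordering problem (this is not the PR's actual comparison code, just a minimal sketch using the `env` label values from the output above): comparing the stringized labels as raw strings fails when property order differs, while parsing them and comparing with Json.NET's order-insensitive JToken.DeepEquals succeeds:

```csharp
using System;
using Newtonsoft.Json.Linq;

class LabelOrdering
{
    static void Main()
    {
        // Same properties, different order - exactly the failure mode seen in
        // the ValidateMetrics test.
        string expected = "{\"RuntimeLogLevel\":{\"value\":\"debug\"},\"PerformanceMetricsUpdateFrequency\":{\"value\":\"00:00:20\"}}";
        string actual   = "{\"PerformanceMetricsUpdateFrequency\":{\"value\":\"00:00:20\"},\"RuntimeLogLevel\":{\"value\":\"debug\"}}";

        // A raw string comparison is order-sensitive, so it fails:
        Console.WriteLine(expected == actual); // False

        // Comparing parsed JSON ignores property order, so it passes:
        Console.WriteLine(JToken.DeepEquals(
            JObject.Parse(expected), JObject.Parse(actual))); // True
    }
}
```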
damonbarry added a commit to damonbarry/iotedge that referenced this pull request Aug 5, 2020
ggjjj pushed a commit to ggjjj/iotedge that referenced this pull request Aug 20, 2020