Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use self-update for initial update #3184

Merged
merged 6 commits into from
Aug 23, 2024
Merged

Conversation

nagworld9
Copy link
Contributor

@nagworld9 nagworld9 commented Aug 13, 2024

Description

We use self-update for initial update due to:-

  • New vms that are enrolled into RSM, they get isVMEnabledForRSMUpgrades as True and isVersionFromRSM as False in first goal state. we don't apply the update if isVersionFromRSM is false. Consequently, new vms remain on pre-installed agent until RSM drives a new version update. In the meantime, agent may process the extensions with the baked version.This can potentially lead to issues due to incompatibility.
  • If current version is N, and we are deploying N+1. We find an issue on N+1 and remove N+1 from PIR. If CRP created the initial goal state for a new vm before the delete, the version in the goal state would be N+1; If the agent starts processing the goal state after the deleting, it won't find N+1 and update will fail and the vm will use baked version.

Also added new test to test scenario. This scenario required custom image and created image manually. Steps added to wiki.
Daily run use this image which defined in test_suite.yml.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

azurelinuxagent/ga/agent_update_handler.py Outdated Show resolved Hide resolved
azurelinuxagent/ga/agent_update_handler.py Outdated Show resolved Hide resolved
@@ -54,6 +54,8 @@ variable:
- publish_hostname
- recover_network_interface
- cgroup_v2_disabled
- agent_wait_for_cloud_init
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no point in adding those to the daily run, is there? the custom image has not changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last time when I updated lisa, I didn't know we have this test to validate against new lisa. I sent pr without checking it. Few later days I figured we have this test, and test was not passing on new lisa since lisa changed schema in templates. So custom template we use in this test is failing. That's why I added this as part of normal run to catch it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. I think adding them to the daily run just adds more overhead/delays to the run.

To address changes in lisa, we could create a new runbook, or a command-line option to the current runbook, to just execute these 2 tests. Then we could execute it every time we take a new drop of lisa and we update our container image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, though

Copy link
Contributor Author

@nagworld9 nagworld9 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whole point I want to make here if we treat them special, people tend to forget since they are not part of regular run. To run them we don't need any special settings, one could run like regular test_suite, pass test_suit list in pipeline or to the runbook now.

But I agree adding to daily run is overhead when custom image has not changed. It's appropriate to add when we build custom-images on every run. I'll add a note and comment it out here.

tests_e2e/orchestrator/lib/agent_test_loader.py Outdated Show resolved Hide resolved
Copy link
Contributor

@maddieford maddieford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes LGTM 👍

if image not in self.images:
raise Exception(f"Invalid image reference in test suite {suite.name}: Can't find {image} in images.yml")
raise Exception(f"Invalid image reference in test suite {suite.name}: Can't find {image} in images.yml or image from a shared gallery")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "or image from a shared gallery", don't we skip this check if image is from a gallery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this msg to let user know that we only accept images from image.yml or gallery image when we get this exception.

@@ -208,7 +224,7 @@ def _load_test_suite(description_file: Path) -> TestSuiteInfo:
rest of the tests in the suite will not be executed). By default, a failure on a test does not stop execution of
the test suite.
* images - A string, or a list of strings, specifying the images on which the test suite must be executed. Each value
can be the name of a single image (e.g."ubuntu_2004"), or the name of an image set (e.g. "endorsed"). The
can be the name of a single image (e.g."ubuntu_2004"), or the name of an image set (e.g. "endorsed") or shared gallery image. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please also add an example of a shared gallery image value here
(e.g. "gallery/wait-cloud-init/1.0.2")


class InitialAgentUpdate(AgentVmTest):
"""
This test verifies that the Agent does initial update one very first goal state before it starts processing extensions for new vms that are enrolled into RSM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
This test verifies that the Agent does initial update one very first goal state before it starts processing extensions for new vms that are enrolled into RSM
This test verifies that the Agent does initial update on very first goal state before it starts processing extensions for new vms that are enrolled into RSM

@@ -54,6 +54,9 @@ variable:
- publish_hostname
- recover_network_interface
- cgroup_v2_disabled
# agent_wait_for_cloud_init and initial_agent_update runs on custom image, as we don't update the custom image daily in the test run, So we are skipping but make sure to run when we added breaking changes like new lisa version, pipeline changes etc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your concern about people forgetting these tests is valid. However, this comment won't help with that.

I'd suggest defining a new test suite ("manual_tests"?) and adding these tests to that suite. Then updating our documentation asking to ensure running those tests after updating LISA and at least once before any release.

Could be done on a separate PR, if you prefer so.

narrieta
narrieta previously approved these changes Aug 21, 2024
@nagworld9 nagworld9 merged commit 4412778 into Azure:develop Aug 23, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants