Use self-update for initial update #3184

Merged: 6 commits, Aug 23, 2024
108 changes: 79 additions & 29 deletions azurelinuxagent/ga/agent_update_handler.py
@@ -33,20 +33,32 @@ def get_agent_update_handler(protocol):
return AgentUpdateHandler(protocol)


RSM_UPDATE_STATE_FILE = "waagent_rsm_update"
INITIAL_UPDATE_STATE_FILE = "waagent_initial_update"


class AgentUpdateHandler(object):
"""
This class handles two types of agent updates. The handler initializes the updater to SelfUpdateVersionUpdater and switches to the appropriate updater based on the conditions below:
RSM update: This is the update requested by RSM. The contract between CRP and agent is we get following properties in the goal state:
RSM update: This update requested by RSM and contract between CRP and agent is we get following properties in the goal state:
version: it will have what version to update
isVersionFromRSM: True if the version is from RSM deployment.
isVMEnabledForRSMUpgrades: True if the VM is enabled for RSM upgrades.
if vm enabled for RSM upgrades, we use RSM update path. But if requested update is not by rsm deployment
if vm enabled for RSM upgrades, we use RSM update path. But if requested update is not by rsm deployment( if isVersionFromRSM:False)
we ignore the update.
Self update: We fallback to this if above is condition not met. This update to the largest version available in the manifest
Self update: We fallback to this if above condition not met. This update to the largest version available in the manifest.
Also, we use self-update for initial update due to [1][2]
Note: Self-update don't support downgrade.

Handler keeps the rsm state of last update is with RSM or not on every new goal state. Once handler decides which updater to use, then
does following steps:
[1] New vms that are enrolled into RSM, they get isVMEnabledForRSMUpgrades as True and isVersionFromRSM as False in first goal state. As per RSM update flow mentioned above,
we don't apply the update if isVersionFromRSM is false. Consequently, new vms remain on pre-installed agent until RSM drives a new version update. In the meantime, agent may process the extensions with the baked version.
This can potentially lead to issues due to incompatibility.
[2] If current version is N, and we are deploying N+1. We find an issue on N+1 and remove N+1 from PIR. If CRP created the initial goal state for a new vm
before the delete, the version in the goal state would be N+1; If the agent starts processing the goal state after the deleting, it won't find N+1 and update will fail and
the vm will use baked version.

Handler updates the state if current update mode is changed from last update mode(RSM or Self-Update) on new goal state. Once handler decides which updater to use, then
updater does following steps:
1. Retrieve the agent version from the goal state.
2. Check if we allowed to update for that version.
3. Log the update message.
@@ -63,8 +75,8 @@ def __init__(self, protocol):
self._daemon_version = self._get_daemon_version_for_update()
self._last_attempted_update_error_msg = ""

# restore the state of rsm update. Default to self-update if last update is not with RSM.
if not self._get_is_last_update_with_rsm():
# Restore the state of rsm update. Default to self-update if last update is not with RSM or if agent doing initial update
if not self._get_is_last_update_with_rsm() or self._is_initial_update():
self._updater = SelfUpdateVersionUpdater(self._gs_id)
else:
self._updater = RSMVersionUpdater(self._gs_id, self._daemon_version)
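
The startup choice in the hunk above can be sketched in isolation. This is a minimal sketch, not the agent's code: `choose_updater_at_startup` and `touch` are hypothetical helpers, the string return values stand in for the two updater classes, and it assumes that "last update was with RSM" is decided purely by the presence of the RSM state file (the body of `_get_is_last_update_with_rsm` is not shown in this diff).

```python
import os
import tempfile

# File names taken from the diff; everything else here is hypothetical.
RSM_UPDATE_STATE_FILE = "waagent_rsm_update"
INITIAL_UPDATE_STATE_FILE = "waagent_initial_update"


def choose_updater_at_startup(lib_dir):
    """Return "self-update" or "rsm" based on the marker files in lib_dir."""
    # Assumption: the RSM marker file exists only if the last update used RSM.
    last_update_was_rsm = os.path.exists(os.path.join(lib_dir, RSM_UPDATE_STATE_FILE))
    # The initial update is still pending while its marker file is absent.
    is_initial_update = not os.path.exists(os.path.join(lib_dir, INITIAL_UPDATE_STATE_FILE))
    if not last_update_was_rsm or is_initial_update:
        return "self-update"
    return "rsm"


def touch(path):
    """Create an empty file, as the state-file helpers in the diff do."""
    with open(path, "w"):
        pass
```

Under these assumptions, a fresh lib dir always yields self-update, and the RSM marker only takes effect once the initial-update marker is also present.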
@@ -78,32 +90,61 @@ def _get_daemon_version_for_update():
# use the min version as 2.2.53 as we started setting the daemon version starting 2.2.53.
return FlexibleVersion("2.2.53")

    @staticmethod
    def _get_initial_update_state_file():
        """
        This file records whether the initial update has been attempted
        """
        return os.path.join(conf.get_lib_dir(), INITIAL_UPDATE_STATE_FILE)

    def _save_initial_update_state_file(self):
        """
        Save the file once the agent has attempted the initial update
        """
        try:
            with open(self._get_initial_update_state_file(), "w"):
                pass
        except Exception as e:
            msg = "Error creating the initial update state file ({0}): {1}".format(self._get_initial_update_state_file(), ustr(e))
            logger.warn(msg)
            add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)

    def _is_initial_update(self):
        """
        Returns True if the state file doesn't exist; the presence of the file means the initial update was already attempted
        """
        return not os.path.exists(self._get_initial_update_state_file())

@staticmethod
def _get_rsm_update_state_file():
"""
This file keeps if last attempted update is rsm or not.
"""
return os.path.join(conf.get_lib_dir(), "rsm_update.json")
return os.path.join(conf.get_lib_dir(), RSM_UPDATE_STATE_FILE)

def _save_rsm_update_state(self):
def _save_rsm_update_state_file(self):
"""
Save the rsm state empty file when we switch to RSM
"""
try:
with open(self._get_rsm_update_state_file(), "w"):
pass
except Exception as e:
logger.warn("Error creating the RSM state ({0}): {1}", self._get_rsm_update_state_file(), ustr(e))
msg = "Error creating the RSM state file ({0}): {1}".format(self._get_rsm_update_state_file(), ustr(e))
logger.warn(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)

def _remove_rsm_update_state(self):
def _remove_rsm_update_state_file(self):
"""
Remove the rsm state file when we switch to self-update
"""
try:
if os.path.exists(self._get_rsm_update_state_file()):
os.remove(self._get_rsm_update_state_file())
except Exception as e:
logger.warn("Error removing the RSM state ({0}): {1}", self._get_rsm_update_state_file(), ustr(e))
msg = "Error removing the RSM state file ({0}): {1}".format(self._get_rsm_update_state_file(), ustr(e))
logger.warn(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)

def _get_is_last_update_with_rsm(self):
"""
@@ -152,25 +193,29 @@ def run(self, goal_state, ext_gs_updated):

agent_family = self._get_agent_family_manifest(goal_state)

# Updater will return True or False if we need to switch the updater
# If self-updater receives RSM update enabled, it will switch to RSM updater
# If RSM updater receives RSM update disabled, it will switch to self-update
# No change in updater if GS not updated
is_rsm_update_enabled = self._updater.is_rsm_update_enabled(agent_family, ext_gs_updated)
# Always agent uses self-update for initial update regardless vm enrolled into RSM or not
# So ignoring the check for updater switch for the initial goal state/update
if not self._is_initial_update():

if not is_rsm_update_enabled and isinstance(self._updater, RSMVersionUpdater):
msg = "VM not enabled for RSM updates, switching to self-update mode"
logger.info(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)
self._updater = SelfUpdateVersionUpdater(self._gs_id)
self._remove_rsm_update_state()
# Updater will return True or False if we need to switch the updater
# If self-updater receives RSM update enabled, it will switch to RSM updater
# If RSM updater receives RSM update disabled, it will switch to self-update
# No change in updater if GS not updated
is_rsm_update_enabled = self._updater.is_rsm_update_enabled(agent_family, ext_gs_updated)

if is_rsm_update_enabled and isinstance(self._updater, SelfUpdateVersionUpdater):
msg = "VM enabled for RSM updates, switching to RSM update mode"
logger.info(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)
self._updater = RSMVersionUpdater(self._gs_id, self._daemon_version)
self._save_rsm_update_state()
if not is_rsm_update_enabled and isinstance(self._updater, RSMVersionUpdater):
msg = "VM not enabled for RSM updates, switching to self-update mode"
logger.info(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)
self._updater = SelfUpdateVersionUpdater(self._gs_id)
self._remove_rsm_update_state_file()

if is_rsm_update_enabled and isinstance(self._updater, SelfUpdateVersionUpdater):
msg = "VM enabled for RSM updates, switching to RSM update mode"
logger.info(msg)
add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False)
self._updater = RSMVersionUpdater(self._gs_id, self._daemon_version)
self._save_rsm_update_state_file()

# If updater is changed in previous step, we allow update as it consider as first attempt. If not, it checks below condition
# RSM checks new goal state; self-update checks manifest download interval
@@ -218,6 +263,11 @@ def run(self, goal_state, ext_gs_updated):
add_event(op=WALAEventOperation.AgentUpgrade, is_success=False, message=error_msg, log_event=False)
self._last_attempted_update_error_msg = error_msg

# save initial update state when agent is doing first update
finally:
if self._is_initial_update():
self._save_initial_update_state_file()

def get_vmagent_update_status(self):
"""
This function gets the VMAgent update status as per the last attempted update.
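
The `finally` block added to `run()` records the initial update attempt whether the attempt succeeds, fails, or exits by raising (the tests in this PR expect `run()` to raise `AgentUpgradeExitException` when an update is found). A minimal sketch of that pattern follows; `InitialUpdateTracker` and `run_update` are hypothetical names used only for illustration, and only the file name comes from the diff.

```python
import os
import tempfile

INITIAL_UPDATE_STATE_FILE = "waagent_initial_update"


class InitialUpdateTracker(object):
    """Hypothetical helper mirroring the marker-file logic in the diff."""

    def __init__(self, lib_dir):
        self._state_file = os.path.join(lib_dir, INITIAL_UPDATE_STATE_FILE)

    def is_initial_update(self):
        # Absence of the file means no update has been attempted yet.
        return not os.path.exists(self._state_file)

    def mark_attempted(self):
        # An empty file is enough; only its existence is checked.
        with open(self._state_file, "w"):
            pass


def run_update(tracker, attempt_update):
    """Run one update attempt, recording the attempt even when it raises."""
    try:
        attempt_update()
    finally:
        if tracker.is_initial_update():
            tracker.mark_attempted()
```

Because the marker is written in `finally`, a failed or interrupted first attempt still counts as the initial update, so the next goal state is free to switch to the RSM updater.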
3 changes: 2 additions & 1 deletion azurelinuxagent/pa/deprovision/default.py
@@ -162,7 +162,8 @@ def del_lib_dir_files(self, warnings, actions): # pylint: disable=W0613
'published_hostname',
'fast_track.json',
'initial_goal_state',
'rsm_update.json'
'waagent_rsm_update',
'waagent_initial_update'
]
known_files_glob = [
'Extensions.*.xml',
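
Listing both state files for deprovisioning means a reimaged VM starts with no markers and therefore performs an initial self-update again. The sketch below illustrates that cleanup under stated assumptions: `remove_known_files` is a hypothetical helper, not the body of `del_lib_dir_files`, and only the two file names come from the diff.

```python
import os
import tempfile

# File names taken from the diff; the helper itself is hypothetical.
KNOWN_FILES = ("waagent_rsm_update", "waagent_initial_update")


def remove_known_files(lib_dir, known_files=KNOWN_FILES):
    """Delete each known state file present in lib_dir; return the removed paths."""
    removed = []
    for name in known_files:
        path = os.path.join(lib_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```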
43 changes: 40 additions & 3 deletions tests/ga/test_agent_update_handler.py
@@ -10,7 +10,8 @@

from azurelinuxagent.common.protocol.util import ProtocolUtil
from azurelinuxagent.common.version import CURRENT_VERSION, AGENT_NAME
from azurelinuxagent.ga.agent_update_handler import get_agent_update_handler
from azurelinuxagent.ga.agent_update_handler import get_agent_update_handler, INITIAL_UPDATE_STATE_FILE, \
RSM_UPDATE_STATE_FILE
from azurelinuxagent.ga.guestagent import GuestAgent
from tests.ga.test_update import UpdateTestCase
from tests.lib.http_request_predicates import HttpRequestPredicates
@@ -28,7 +29,7 @@ def setUp(self):
clear_singleton_instances(ProtocolUtil)

@contextlib.contextmanager
def _get_agent_update_handler(self, test_data=None, autoupdate_frequency=0.001, autoupdate_enabled=True, protocol_get_error=False, mock_get_header=None, mock_put_header=None):
def _get_agent_update_handler(self, test_data=None, autoupdate_frequency=0.001, autoupdate_enabled=True, initial_update_attempted=True, protocol_get_error=False, mock_get_header=None, mock_put_header=None):
# Default to DATA_FILE of test_data parameter raises the pylint warning
# W0102: Dangerous default value DATA_FILE (builtins.dict) as argument (dangerous-default-value)
test_data = DATA_FILE if test_data is None else test_data
@@ -57,6 +58,9 @@ def put_handler(url, *args, **_):

protocol.set_http_handlers(http_get_handler=http_get_handler, http_put_handler=http_put_handler)

if initial_update_attempted:
open(os.path.join(conf.get_lib_dir(), INITIAL_UPDATE_STATE_FILE), "a").close()

with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=autoupdate_enabled):
with patch("azurelinuxagent.common.conf.get_autoupdate_frequency", return_value=autoupdate_frequency):
with patch("azurelinuxagent.common.conf.get_autoupdate_gafamily", return_value="Prod"):
@@ -452,7 +456,7 @@ def test_it_should_save_rsm_state_of_the_most_recent_goal_state(self):
with self.assertRaises(AgentUpgradeExitException):
agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True)

state_file = os.path.join(conf.get_lib_dir(), "rsm_update.json")
state_file = os.path.join(conf.get_lib_dir(), RSM_UPDATE_STATE_FILE)
self.assertTrue(os.path.exists(state_file), "The rsm state file was not saved (can't find {0})".format(state_file))

# check if state gets updated if most recent goal state has different values
@@ -535,3 +539,36 @@ def http_get_handler(uri, *_, **__):
self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if
"Downloaded agent package: WALinuxAgent-9.9.9.10 is missing agent handler manifest file" in kwarg['message'] and kwarg[
'op'] == WALAEventOperation.AgentUpgrade]), "Agent update should fail")

    def test_it_should_use_self_update_for_first_update_always(self):
        self.prepare_agents(count=1)

        # mock the goal state as vm enrolled into RSM
        data_file = DATA_FILE.copy()
        data_file['ext_conf'] = "wire/ext_conf_rsm_version.xml"
        with self._get_agent_update_handler(test_data=data_file, initial_update_attempted=False) as (agent_update_handler, mock_telemetry):
            with self.assertRaises(AgentUpgradeExitException) as context:
                agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True)
            # Verifying agent used self-update for initial update
            self._assert_update_discovered_from_agent_manifest(mock_telemetry, version="99999.0.0.0")
            self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), "99999.0.0.0"])
            self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason))

            state_file = os.path.join(conf.get_lib_dir(), INITIAL_UPDATE_STATE_FILE)
            self.assertTrue(os.path.exists(state_file),
                            "The first update state file was not saved (can't find {0})".format(state_file))

    def test_it_should_honor_any_update_type_after_first_update(self):
        self.prepare_agents(count=1)

        data_file = DATA_FILE.copy()
        data_file['ext_conf'] = "wire/ext_conf_rsm_version.xml"
        # mocking initial update attempt as true
        with self._get_agent_update_handler(test_data=data_file, initial_update_attempted=True) as (agent_update_handler, mock_telemetry):
            with self.assertRaises(AgentUpgradeExitException) as context:
                agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True)

            # Verifying agent honored RSM update
            self._assert_agent_rsm_version_in_goal_state(mock_telemetry, version="9.9.9.10")
            self._assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10", str(CURRENT_VERSION)])
            self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason))
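
Both test helpers in this PR default `initial_update_attempted` to `True` and pre-create the marker file, so existing tests keep exercising the pre-change behavior and only the new test opts out. A minimal sketch of that fixture idea, independent of the agent's test harness, might look like this; `lib_dir_fixture` is a hypothetical name and a temporary directory stands in for `conf.get_lib_dir()`.

```python
import contextlib
import os
import shutil
import tempfile

INITIAL_UPDATE_STATE_FILE = "waagent_initial_update"


@contextlib.contextmanager
def lib_dir_fixture(initial_update_attempted=True):
    """Yield a temporary lib dir, optionally pre-seeded with the marker file."""
    lib_dir = tempfile.mkdtemp()
    try:
        if initial_update_attempted:
            # Same idiom as the tests in the diff: open in append mode, then close.
            open(os.path.join(lib_dir, INITIAL_UPDATE_STATE_FILE), "a").close()
        yield lib_dir
    finally:
        # Clean up so one test's marker cannot leak into the next.
        shutil.rmtree(lib_dir, ignore_errors=True)
```

Defaulting to the "already attempted" state keeps the opt-out explicit: a test has to pass `initial_update_attempted=False` to exercise the first-update path.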
10 changes: 9 additions & 1 deletion tests/ga/test_update.py
@@ -20,6 +20,8 @@

from datetime import datetime, timedelta
from threading import current_thread

from azurelinuxagent.ga.agent_update_handler import INITIAL_UPDATE_STATE_FILE
from azurelinuxagent.ga.guestagent import GuestAgent, GuestAgentError, \
AGENT_ERROR_FILE
from tests.common.osutil.test_default import TestOSUtil
@@ -1282,6 +1284,9 @@ def update_goal_state_and_run_handler(autoupdate_enabled=True):

protocol.set_http_handlers(http_get_handler=get_handler, http_put_handler=put_handler)

# mocking first agent update attempted
open(os.path.join(conf.get_lib_dir(), INITIAL_UPDATE_STATE_FILE), "a").close()

# Case 1: rsm version missing in GS when vm opt-in for rsm upgrades; report missing rsm version error
protocol.mock_wire_data.set_extension_config("wire/ext_conf_version_missing_in_agent_family.xml")
update_goal_state_and_run_handler()
@@ -1481,7 +1486,10 @@ def create_conf_mocks(self, autoupdate_frequency, hotfix_frequency, normal_frequ

@contextlib.contextmanager
def __get_update_handler(self, iterations=1, test_data=None,
reload_conf=None, autoupdate_frequency=0.001, hotfix_frequency=1.0, normal_frequency=2.0):
reload_conf=None, autoupdate_frequency=0.001, hotfix_frequency=1.0, normal_frequency=2.0, initial_update_attempted=True):

if initial_update_attempted:
open(os.path.join(conf.get_lib_dir(), INITIAL_UPDATE_STATE_FILE), "a").close()

test_data = DATA_FILE if test_data is None else test_data
# In _get_update_handler() contextmanager, yield is used inside an if-else block and that's creating a false positive pylint warning