Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log collector should not fetch full goal state with extensions #2713

Merged
merged 29 commits into from
Jan 9, 2023
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
da72c37
Update version to dummy 1.0.0.0'
maddieford Nov 8, 2022
59dbd22
Revert version change
maddieford Nov 8, 2022
633a826
Merge remote-tracking branch 'upstream/develop' into develop
maddieford Nov 21, 2022
14a743f
Merge remote-tracking branch 'upstream/develop' into develop
maddieford Dec 8, 2022
55e261a
Update goalstate to take list of properties to process
maddieford Dec 8, 2022
908985e
Update protocol to not process extensions in goal state update
maddieford Dec 8, 2022
8fc6471
Update logcollector to not process extensions when updating goal state
maddieford Dec 8, 2022
3d370d0
Remove comments
maddieford Dec 8, 2022
d5bb016
Remove import enum
maddieford Dec 8, 2022
2633959
Update parameter name to goalstate_properties
maddieford Dec 8, 2022
b8d98a5
Add default value for goalstate_properties
maddieford Dec 8, 2022
e578129
Use integers and bitwise operations for goal state properties
maddieford Dec 9, 2022
c642900
Initialize protocol for logcollector with only necessary goal state p…
maddieford Dec 12, 2022
041dee3
Separate update into reset and made goal state properties a private m…
maddieford Dec 15, 2022
cbfb6ad
Remove goal_state_properties param from _detect_protocol
maddieford Dec 15, 2022
05f4b79
Update tests to remove force_update
maddieford Dec 15, 2022
dc2fe69
Remove unused import
maddieford Dec 15, 2022
a8c5a7f
Remove update_goal_State from wire protocol
maddieford Dec 15, 2022
011fbef
Remove update_goal_State from wire protocol
maddieford Dec 15, 2022
b833735
Separate extensionsconfig and certificates property
maddieford Dec 15, 2022
d762a03
Address PR comments
maddieford Dec 20, 2022
11f9c3a
Add flag to determine if goal state should be init
maddieford Dec 20, 2022
212a72b
Add test case for goal_state_properties
maddieford Dec 21, 2022
7805dc1
Correct pylint errors
maddieford Dec 21, 2022
b1d21ea
Add reset_goal_state test with goal_state_properties
maddieford Dec 21, 2022
c61a067
Add reset test cases
maddieford Dec 21, 2022
dd9d8da
Remove Certificates property in GoalState
maddieford Dec 22, 2022
7711ae2
Revert certs change
maddieford Dec 22, 2022
3926b35
Merge branch 'develop' into update_protocol_init
maddieford Jan 9, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion azurelinuxagent/common/logcollector.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@

# Please note: be careful when adding agent dependencies in this module.
# This module uses its own logger and logs to its own file, not to the agent log.
from azurelinuxagent.common.protocol.goal_state import GoalStateProperties
from azurelinuxagent.common.protocol.util import get_protocol_util

_EXTENSION_LOG_DIR = get_ext_log_dir()
Expand Down Expand Up @@ -118,7 +119,7 @@ def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path):
@staticmethod
def _initialize_telemetry():
protocol = get_protocol_util().get_protocol()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like get_protocal() still initializing goal state with all properties?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - we probably need to add a flag to get_protocol to not initialize the shared goal state (then the call to reset in the next line would initialize it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added flag to get_protocol()

protocol.client.update_goal_state(force_update=True)
protocol.client.reset_goal_state(goalstate_properties=GoalStateProperties.RoleConfig | GoalStateProperties.HostingEnv)
# Initialize the common parameters for telemetry events
initialize_event_logger_vminfo_common_parameters(protocol)

Expand Down
121 changes: 84 additions & 37 deletions azurelinuxagent/common/protocol/goal_state.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,19 @@
_GET_GOAL_STATE_MAX_ATTEMPTS = 6


class GoalStateProperties(object):
"""
Enum for defining the properties that we fetch in the goal state
"""
RoleConfig = 0x1
HostingEnv = 0x2
SharedConfig = 0x4
ExtensionsConfig = 0x8
Certificates = 0x10
RemoteAccessInfo = 0x20
All = RoleConfig | HostingEnv | SharedConfig | ExtensionsConfig | Certificates | RemoteAccessInfo


class GoalStateInconsistentError(ProtocolError):
"""
Indicates an inconsistency in the goal state (e.g. missing tenant certificate)
Expand All @@ -57,7 +70,7 @@ def __init__(self, msg, inner=None):


class GoalState(object):
def __init__(self, wire_client, silent=False):
def __init__(self, wire_client, goal_state_properties=GoalStateProperties.All, silent=False):
"""
Fetches the goal state using the given wire client.

Expand All @@ -72,6 +85,7 @@ def __init__(self, wire_client, silent=False):
self._wire_client = wire_client
self._history = None
self._extensions_goal_state = None # populated from vmSettings or extensionsConfig
self._goal_state_properties = goal_state_properties
self.logger = logger.Logger(logger.DEFAULT_LOGGER)
self.logger.silent = silent

Expand Down Expand Up @@ -99,35 +113,59 @@ def incarnation(self):

@property
def container_id(self):
return self._container_id
if not self._goal_state_properties & GoalStateProperties.RoleConfig:
raise ProtocolError("RoleConfig is not in goal state properties")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container id

else:
return self._container_id

@property
def role_instance_id(self):
return self._role_instance_id
if not self._goal_state_properties & GoalStateProperties.RoleConfig:
raise ProtocolError("RoleConfig is not in goal state properties")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

role instance id

else:
return self._role_instance_id

@property
def role_config_name(self):
return self._role_config_name
if not self._goal_state_properties & GoalStateProperties.RoleConfig:
raise ProtocolError("RoleConfig is not in goal state properties")
else:
return self._role_config_name

@property
def extensions_goal_state(self):
return self._extensions_goal_state
if not self._goal_state_properties & GoalStateProperties.ExtensionsConfig:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i did not notice this before. the extensions goal state may come from vmSettings or ExtensionsConfig. The latter is for fabric and the former for fast track.

Let's rename GoalStateProperties.ExtensionsConfig to GoalStateProperties.ExtensionsGoalState

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in _update(), we need to retrieve the vmSettings only when requesting GoalStateProperties.ExtensionsGoalState

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to add this check

raise ProtocolError("ExtensionsConfig is not in goal state properties")
else:
return self._extensions_goal_state

@property
def certs(self):
return self._certs
if not self._goal_state_properties & GoalStateProperties.Certificates:
raise ProtocolError("Certificates is not in goal state properties")
else:
return self._certs

@property
def hosting_env(self):
return self._hosting_env
if not self._goal_state_properties & GoalStateProperties.HostingEnv:
raise ProtocolError("HostingEnvironment is not in goal state properties")
else:
return self._hosting_env

@property
def shared_conf(self):
return self._shared_conf
if not self._goal_state_properties & GoalStateProperties.SharedConfig:
raise ProtocolError("SharedConfig is not in goal state properties")
else:
return self._shared_conf

@property
def remote_access(self):
return self._remote_access
if not self._goal_state_properties & GoalStateProperties.RemoteAccessInfo:
raise ProtocolError("RemoteAccessInfo is not in goal state properties")
else:
return self._remote_access

def fetch_agent_manifest(self, family_name, uris):
"""
Expand Down Expand Up @@ -356,40 +394,48 @@ def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc):
self.logger.info(message)
add_event(op=WALAEventOperation.GoalState, message=message)

role_instance = find(xml_doc, "RoleInstance")
role_instance_id = findtext(role_instance, "InstanceId")
role_config = find(role_instance, "Configuration")
role_config_name = findtext(role_config, "ConfigName")
container = find(xml_doc, "Container")
container_id = findtext(container, "ContainerId")
role_instance_id = None
role_config_name = None
container_id = None
if GoalStateProperties.RoleConfig & self._goal_state_properties:
role_instance = find(xml_doc, "RoleInstance")
role_instance_id = findtext(role_instance, "InstanceId")
role_config = find(role_instance, "Configuration")
role_config_name = findtext(role_config, "ConfigName")
container = find(xml_doc, "Container")
container_id = findtext(container, "ContainerId")

extensions_config_uri = findtext(xml_doc, "ExtensionsConfig")
if extensions_config_uri is None:
if not (GoalStateProperties.ExtensionsConfig & self._goal_state_properties) or extensions_config_uri is None:
extensions_config = ExtensionsGoalStateFactory.create_empty(incarnation)
else:
xml_text = self._wire_client.fetch_config(extensions_config_uri, self._wire_client.get_header())
extensions_config = ExtensionsGoalStateFactory.create_from_extensions_config(incarnation, xml_text, self._wire_client)
self._history.save_extensions_config(extensions_config.get_redacted_text())

hosting_env_uri = findtext(xml_doc, "HostingEnvironmentConfig")
xml_text = self._wire_client.fetch_config(hosting_env_uri, self._wire_client.get_header())
hosting_env = HostingEnv(xml_text)
self._history.save_hosting_env(xml_text)

shared_conf_uri = findtext(xml_doc, "SharedConfig")
xml_text = self._wire_client.fetch_config(shared_conf_uri, self._wire_client.get_header())
shared_config = SharedConfig(xml_text)
self._history.save_shared_conf(xml_text)
# SharedConfig.xml is used by other components (Azsec and Singularity/HPC Infiniband), so save it to the agent's root directory as well
shared_config_file = os.path.join(conf.get_lib_dir(), SHARED_CONF_FILE_NAME)
try:
fileutil.write_file(shared_config_file, xml_text)
except Exception as e:
logger.warn("Failed to save {0}: {1}".format(shared_config, e))
hosting_env = None
if GoalStateProperties.HostingEnv & self._goal_state_properties:
hosting_env_uri = findtext(xml_doc, "HostingEnvironmentConfig")
xml_text = self._wire_client.fetch_config(hosting_env_uri, self._wire_client.get_header())
hosting_env = HostingEnv(xml_text)
self._history.save_hosting_env(xml_text)

shared_config = None
if GoalStateProperties.SharedConfig & self._goal_state_properties:
shared_conf_uri = findtext(xml_doc, "SharedConfig")
xml_text = self._wire_client.fetch_config(shared_conf_uri, self._wire_client.get_header())
shared_config = SharedConfig(xml_text)
self._history.save_shared_conf(xml_text)
# SharedConfig.xml is used by other components (Azsec and Singularity/HPC Infiniband), so save it to the agent's root directory as well
shared_config_file = os.path.join(conf.get_lib_dir(), SHARED_CONF_FILE_NAME)
try:
fileutil.write_file(shared_config_file, xml_text)
except Exception as e:
logger.warn("Failed to save {0}: {1}".format(shared_config, e))

certs = EmptyCertificates()
certs_uri = findtext(xml_doc, "Certificates")
if certs_uri is not None:
if (GoalStateProperties.Certificates & self._goal_state_properties) and certs_uri is not None:
xml_text = self._wire_client.fetch_config(certs_uri, self._wire_client.get_header_for_cert())
certs = Certificates(xml_text, self.logger)
# Log and save the certificates summary (i.e. the thumbprint but not the certificate itself) to the goal state history
Expand All @@ -403,11 +449,12 @@ def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc):
self._history.save_certificates(json.dumps(certs.summary))

remote_access = None
remote_access_uri = findtext(container, "RemoteAccessInfo")
if remote_access_uri is not None:
xml_text = self._wire_client.fetch_config(remote_access_uri, self._wire_client.get_header_for_cert())
remote_access = RemoteAccess(xml_text)
self._history.save_remote_access(xml_text)
if GoalStateProperties.RemoteAccessInfo & self._goal_state_properties:
remote_access_uri = findtext(container, "RemoteAccessInfo")
if remote_access_uri is not None:
xml_text = self._wire_client.fetch_config(remote_access_uri, self._wire_client.get_header_for_cert())
remote_access = RemoteAccess(xml_text)
self._history.save_remote_access(xml_text)

self._incarnation = incarnation
self._role_instance_id = role_instance_id
Expand Down
36 changes: 23 additions & 13 deletions azurelinuxagent/common/protocol/wire.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@
from azurelinuxagent.common.exception import ProtocolNotFoundError, \
ResourceGoneError, ExtensionDownloadError, InvalidContainerError, ProtocolError, HttpError, ExtensionErrorCodes
from azurelinuxagent.common.future import httpclient, bytebuffer, ustr
from azurelinuxagent.common.protocol.goal_state import GoalState, TRANSPORT_CERT_FILE_NAME, TRANSPORT_PRV_FILE_NAME
from azurelinuxagent.common.protocol.goal_state import GoalState, TRANSPORT_CERT_FILE_NAME, TRANSPORT_PRV_FILE_NAME, \
GoalStateProperties
from azurelinuxagent.common.protocol.hostplugin import HostPluginProtocol
from azurelinuxagent.common.protocol.restapi import DataContract, ProvisionStatus, VMInfo, VMStatus
from azurelinuxagent.common.telemetryevent import GuestAgentExtensionEventsSchema
Expand Down Expand Up @@ -84,10 +85,7 @@ def detect(self):

# Initialize the goal state, including all the inner properties
logger.info('Initializing goal state during protocol detection')
self.client.update_goal_state(force_update=True)

def update_goal_state(self, silent=False):
self.client.update_goal_state(silent=silent)
self.client.reset_goal_state()

def update_host_plugin_from_goal_state(self):
self.client.update_host_plugin_from_goal_state()
Expand Down Expand Up @@ -778,18 +776,30 @@ def update_host_plugin(self, container_id, role_config_name):
self._host_plugin.update_container_id(container_id)
self._host_plugin.update_role_config_name(role_config_name)

def update_goal_state(self, force_update=False, silent=False):
def update_goal_state(self, silent=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get the update_goal_state vs reset_goal_state. When do we use each of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're removing the force_update parameter and using reset_goal_state in all cases where force_update was True.

Reset will initialize the goal state instead of updating

"""
Updates the goal state if the incarnation or etag changed or if 'force_update' is True
Updates the goal state if the incarnation or etag changed
"""
try:
if force_update and not silent:
if not silent:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this message since the force flag was removed

logger.info("Forcing an update of the goal state.")

if self._goal_state is None or force_update:
self._goal_state = GoalState(self, silent=silent)
else:
self._goal_state.update(silent=silent)
self._goal_state.update(silent=silent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep a call to reset here (minus the check for the force flag). This is needed by the initialization of the goal state on line 430 that you pointed out

            if self._goal_state is None:
                self._goal_state = GoalState(self, silent=silent)
            else:
                self._goal_state.update(silent=silent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the None check back


except ProtocolError:
raise
except Exception as exception:
raise ProtocolError("Error fetching goal state: {0}".format(ustr(exception)))

def reset_goal_state(self, goal_state_properties=GoalStateProperties.All, silent=False):
"""
Resets the goal state
"""
try:
if not silent:
logger.info("Forcing an update of the goal state.")

self._goal_state = GoalState(self, goal_state_properties=goal_state_properties, silent=silent)

except ProtocolError:
raise
Expand Down Expand Up @@ -925,7 +935,7 @@ def upload_status_blob(self):

if extensions_goal_state.status_upload_blob is None:
# the status upload blob is in ExtensionsConfig so force a full goal state refresh
self.update_goal_state(force_update=True, silent=True)
self.reset_goal_state(silent=True)
extensions_goal_state = self.get_goal_state().extensions_goal_state

if extensions_goal_state.status_upload_blob is None:
Expand Down
5 changes: 3 additions & 2 deletions azurelinuxagent/daemon/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
from azurelinuxagent.common.event import add_event, WALAEventOperation, initialize_event_logger_vminfo_common_parameters
from azurelinuxagent.common.future import ustr
from azurelinuxagent.common.osutil import get_osutil
from azurelinuxagent.common.protocol.goal_state import GoalState, GoalStateProperties
from azurelinuxagent.common.protocol.util import get_protocol_util
from azurelinuxagent.common.rdma import setup_rdma_device
from azurelinuxagent.common.utils import textutil
Expand Down Expand Up @@ -160,9 +161,9 @@ def daemon(self, child_args=None):
# current values.
protocol = self.protocol_util.get_protocol()

protocol.client.update_goal_state(force_update=True)
goal_state = GoalState(protocol, goal_state_properties=GoalStateProperties.SharedConfig)

setup_rdma_device(nd_version, protocol.client.get_shared_conf())
setup_rdma_device(nd_version, goal_state.shared_conf)
except Exception as e:
logger.error("Error setting up rdma device: %s" % e)
else:
Expand Down
2 changes: 1 addition & 1 deletion azurelinuxagent/ga/update.py
Original file line number Diff line number Diff line change
Expand Up @@ -485,7 +485,7 @@ def _try_update_goal_state(self, protocol):
try:
max_errors_to_log = 3

protocol.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narrieta On line 430 in update.py we discussed changing
self._try_update_goal_state(protocol)
to
self._reset_goal_state()

It looks like we sleep until try_update_goal_state returns True for success.. Should we implement similar logic for reset_goal_state?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. i added a comment on _try_update_goal_state


self._goal_state = protocol.get_goal_state()

Expand Down
4 changes: 2 additions & 2 deletions tests/common/test_event.py
Original file line number Diff line number Diff line change
Expand Up @@ -161,12 +161,12 @@ def create_event_and_return_container_id(): # pylint: disable=inconsistent-retu
self.assertEqual(contained_id, 'c6d5526c-5ac2-4200-b6e2-56f2b70c5ab2', "Incorrect container ID")

protocol.mock_wire_data.set_container_id('AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE')
protocol.update_goal_state()
protocol.client.update_goal_state()
contained_id = create_event_and_return_container_id()
self.assertEqual(contained_id, 'AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE', "Incorrect container ID")

protocol.mock_wire_data.set_container_id('11111111-2222-3333-4444-555555555555')
protocol.update_goal_state()
protocol.client.update_goal_state()
contained_id = create_event_and_return_container_id()
self.assertEqual(contained_id, '11111111-2222-3333-4444-555555555555', "Incorrect container ID")

Expand Down
Loading