Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log collector should not fetch full goal state with extensions #2713

Merged
merged 29 commits into from
Jan 9, 2023

Conversation

maddieford
Copy link
Contributor

Description

The Log Collector should not fetch the full goal state with extensions and certificates. This can cause a race condition between the log collector and extension handler when downloading certificates.

This PR adds a GoalStateProperties enum which defines the properties that can be fetched in the goal state and updates fetch_full_wire_server_goal_state to take a list of properties from the goal state to populate. This supports fetching different combinations of propreties in the goal state.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Dec 12, 2022

Codecov Report

Merging #2713 (3926b35) into develop (1fe3de8) will increase coverage by 0.06%.
The diff coverage is 88.23%.

@@             Coverage Diff             @@
##           develop    #2713      +/-   ##
===========================================
+ Coverage    71.96%   72.02%   +0.06%     
===========================================
  Files          104      104              
  Lines        15765    15807      +42     
  Branches      2244     2259      +15     
===========================================
+ Hits         11345    11385      +40     
+ Misses        3909     3905       -4     
- Partials       511      517       +6     
Impacted Files Coverage Δ
azurelinuxagent/common/logcollector.py 88.35% <33.33%> (+0.04%) ⬆️
azurelinuxagent/daemon/main.py 71.42% <33.33%> (+0.29%) ⬆️
azurelinuxagent/common/protocol/wire.py 77.37% <64.70%> (-0.88%) ⬇️
azurelinuxagent/common/protocol/goal_state.py 95.59% <97.29%> (+0.83%) ⬆️
azurelinuxagent/common/protocol/util.py 80.32% <100.00%> (ø)
azurelinuxagent/ga/update.py 90.70% <100.00%> (ø)
azurelinuxagent/common/cgroupconfigurator.py 71.93% <0.00%> (-0.32%) ⬇️
...inuxagent/common/protocol/extensions_goal_state.py 77.30% <0.00%> (+2.83%) ⬆️
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@maddieford maddieford marked this pull request as ready for review December 12, 2022 22:36
RoleConfig = 1
HostingEnv = 2
SharedConfig = 4
ExtensionsConfig_Certs = 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the extensionconfig and certs are two separate properties even though both needs to be fetched and parsed inherently for extension run. So, separating those makes things clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to separate the two

@@ -117,8 +118,11 @@ def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path):

@staticmethod
def _initialize_telemetry():
protocol = get_protocol_util().get_protocol()
protocol.client.update_goal_state(force_update=True)
goalstate_properties = GoalStateProperties.RoleConfig | GoalStateProperties.HostingEnv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the naked eye, it looks like either of those but logically it's all of them :)

@@ -343,7 +359,7 @@ def _fetch_vm_settings(wire_client, force_update=False):

return vm_settings, vm_settings_updated

def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc):
def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc, goalstate_properties=GoalStateProperties.default_properties()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method looks like internal to Goalstate and only called in _update. So I don't see the need for default there as parameter needs to come from parent methods.

I see samething few other places too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed goal_state_properties parameter for this method and created private member for the goal state class

@@ -188,7 +189,7 @@ def _clear_wireserver_endpoint(self):
return
logger.error("Failed to clear wiresever endpoint: {0}", e)

def _detect_protocol(self):
def _detect_protocol(self, goalstate_properties=GoalStateProperties.default_properties()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing on default paramter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed goal_state_properties from detect

RemoteAccessInfo = 16

@staticmethod
def default_properties():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, though "default" in the name doesn't quite fit.

An alternative is to add an "All" item to the enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to add All item

HostingEnv = 2
SharedConfig = 4
ExtensionsConfig_Certs = 8
RemoteAccessInfo = 16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: using hex instead of decimal can help visualize exactly which bit is being used for the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -160,20 +176,20 @@ def update_host_plugin_headers(wire_client):
# Fetching the goal state updates the HostGAPlugin so simply trigger the request
GoalState._fetch_goal_state(wire_client)

def update(self, silent=False):
def update(self, goalstate_properties=GoalStateProperties.default_properties(), silent=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update method should probably not take the properties as parameter, since it opens the possibility of passing a value different to the one passed to init() and that would lead to an inconsistent goal state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been removed

@@ -72,7 +73,7 @@ def __init__(self, endpoint):
raise ProtocolError("WireProtocol endpoint is None")
self.client = WireClient(endpoint)

def detect(self):
def detect(self, goalstate_properties=GoalStateProperties.default_properties()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to expose the properties in this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been removed

@@ -485,7 +485,7 @@ def _try_update_goal_state(self, protocol):
try:
max_errors_to_log = 3

protocol.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narrieta On line 430 in update.py we discussed changing
self._try_update_goal_state(protocol)
to
self._reset_goal_state()

It looks like we sleep until try_update_goal_state returns True for success.. Should we implement similar logic for reset_goal_state?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. i added a comment on _try_update_goal_state

@@ -118,7 +119,7 @@ def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path):
@staticmethod
def _initialize_telemetry():
protocol = get_protocol_util().get_protocol()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like get_protocal() still initializing goal state with all properties?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - we probably need to add a flag to get_protocol to not initialize the shared goal state (then the call to reset in the next line would initialize it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added flag to get_protocol()

@@ -778,18 +776,30 @@ def update_host_plugin(self, container_id, role_config_name):
self._host_plugin.update_container_id(container_id)
self._host_plugin.update_role_config_name(role_config_name)

def update_goal_state(self, force_update=False, silent=False):
def update_goal_state(self, silent=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get the update_goal_state vs reset_goal_state. When do we use each of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're removing the force_update parameter and using reset_goal_state in all cases where force_update was True.

Reset will initialize the goal state instead of updating

Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on goal_state.py, I'm reviewing the usages of the goal state next

@@ -99,35 +113,59 @@ def incarnation(self):

@property
def container_id(self):
return self._container_id
if not self._goal_state_properties & GoalStateProperties.RoleConfig:
raise ProtocolError("RoleConfig is not in goal state properties")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container id


@property
def role_instance_id(self):
return self._role_instance_id
if not self._goal_state_properties & GoalStateProperties.RoleConfig:
raise ProtocolError("RoleConfig is not in goal state properties")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

role instance id


@property
def extensions_goal_state(self):
return self._extensions_goal_state
if not self._goal_state_properties & GoalStateProperties.ExtensionsConfig:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i did not notice this before. the extensions goal state may come from vmSettings or ExtensionsConfig. The latter is for fabric and the former for fast track.

Let's rename GoalStateProperties.ExtensionsConfig to GoalStateProperties.ExtensionsGoalState

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in _update(), we need to retrieve the vmSettings only when requesting GoalStateProperties.ExtensionsGoalState

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to add this check

"""
try:
if force_update and not silent:
if not silent:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this message since the force flag was removed

self._goal_state = GoalState(self, silent=silent)
else:
self._goal_state.update(silent=silent)
self._goal_state.update(silent=silent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep a call to reset here (minus the check for the force flag). This is needed by the initialization of the goal state on line 430 that you pointed out

            if self._goal_state is None:
                self._goal_state = GoalState(self, silent=silent)
            else:
                self._goal_state.update(silent=silent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the None check back

@@ -485,7 +485,7 @@ def _try_update_goal_state(self, protocol):
try:
max_errors_to_log = 3

protocol.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. i added a comment on _try_update_goal_state

@maddieford maddieford merged commit e9f495d into Azure:develop Jan 9, 2023
@maddieford maddieford deleted the update_protocol_init branch April 7, 2023 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants