Log collector should not fetch full goal state with extensions #2713

maddieford · 2022-12-08T02:53:43Z

Description

The Log Collector should not fetch the full goal state with extensions and certificates. This can cause a race condition between the log collector and extension handler when downloading certificates.

This PR adds a GoalStateProperties enum that defines the properties that can be fetched in the goal state, and updates fetch_full_wire_server_goal_state to take a list of goal state properties to populate. This supports fetching different combinations of properties in the goal state.
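As a rough illustration of the idea in the description, the sketch below selects goal state sections from a bit mask. The flag names and values are taken from the diff hunks quoted later in this thread; the `sections_to_fetch` helper is hypothetical and is not part of the agent.

```python
class GoalStateProperties(object):
    """Bit flags naming the parts of the goal state that can be fetched."""
    RoleConfig = 1
    HostingEnv = 2
    SharedConfig = 4
    ExtensionsConfig_Certs = 8


def sections_to_fetch(properties):
    """Hypothetical helper: list the section names selected by the bit mask."""
    names = ["RoleConfig", "HostingEnv", "SharedConfig", "ExtensionsConfig_Certs"]
    return [name for name in names if properties & getattr(GoalStateProperties, name)]


# The log collector asks only for role and hosting-environment data, so the
# extensions/certificates section is never requested.
log_collector_properties = GoalStateProperties.RoleConfig | GoalStateProperties.HostingEnv
print(sections_to_fetch(log_collector_properties))
```

With this shape, a caller that never asks for the extensions and certificates section cannot race with the extension handler over the certificate download.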

Issue #

PR information

The title of the PR is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
If applicable, the PR references the bug/issue that it fixes in the description.
New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

I have read the contribution guidelines.

codecov · 2022-12-12T21:55:53Z

Codecov Report

Merging #2713 (3926b35) into develop (1fe3de8) will increase coverage by 0.06%.
The diff coverage is 88.23%.

@@             Coverage Diff             @@
##           develop    #2713      +/-   ##
===========================================
+ Coverage    71.96%   72.02%   +0.06%     
===========================================
  Files          104      104              
  Lines        15765    15807      +42     
  Branches      2244     2259      +15     
===========================================
+ Hits         11345    11385      +40     
+ Misses        3909     3905       -4     
- Partials       511      517       +6

Impacted Files	Coverage Δ
azurelinuxagent/common/logcollector.py	`88.35% <33.33%> (+0.04%)`	⬆️
azurelinuxagent/daemon/main.py	`71.42% <33.33%> (+0.29%)`	⬆️
azurelinuxagent/common/protocol/wire.py	`77.37% <64.70%> (-0.88%)`	⬇️
azurelinuxagent/common/protocol/goal_state.py	`95.59% <97.29%> (+0.83%)`	⬆️
azurelinuxagent/common/protocol/util.py	`80.32% <100.00%> (ø)`
azurelinuxagent/ga/update.py	`90.70% <100.00%> (ø)`
azurelinuxagent/common/cgroupconfigurator.py	`71.93% <0.00%> (-0.32%)`	⬇️
...inuxagent/common/protocol/extensions_goal_state.py	`77.30% <0.00%> (+2.83%)`	⬆️
... and 1 more

nagworld9 · 2022-12-13T21:34:12Z

azurelinuxagent/common/protocol/goal_state.py

+    RoleConfig = 1
+    HostingEnv = 2
+    SharedConfig = 4
+    ExtensionsConfig_Certs = 8


Technically, the extensions config and the certs are two separate properties, even though both need to be fetched and parsed for an extension run. Separating them would make things clearer.

Updated to separate the two

nagworld9 · 2022-12-13T21:37:20Z

azurelinuxagent/common/logcollector.py

@@ -117,8 +118,11 @@ def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path):

    @staticmethod
    def _initialize_telemetry():
-        protocol = get_protocol_util().get_protocol()
-        protocol.client.update_goal_state(force_update=True)
+        goalstate_properties = GoalStateProperties.RoleConfig | GoalStateProperties.HostingEnv


To the naked eye, it looks like either of those, but logically it's all of them :)
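The point about `|` can be made concrete with a small sketch. The constants below are illustrative stand-ins for the enum members, and `is_requested` is a hypothetical helper: bitwise OR builds the union of the flags, and bitwise AND tests whether a given flag is in that union.

```python
ROLE_CONFIG = 1    # binary 001
HOSTING_ENV = 2    # binary 010
SHARED_CONFIG = 4  # binary 100

# "RoleConfig | HostingEnv" sets both bits, so it means "both", not "either".
requested = ROLE_CONFIG | HOSTING_ENV


def is_requested(mask, flag):
    """Return True when every bit of `flag` is set in `mask`."""
    return (mask & flag) == flag


print(bin(requested))
print(is_requested(requested, ROLE_CONFIG), is_requested(requested, SHARED_CONFIG))
```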

nagworld9 · 2022-12-13T21:40:20Z

azurelinuxagent/common/protocol/goal_state.py

@@ -343,7 +359,7 @@ def _fetch_vm_settings(wire_client, force_update=False):

        return vm_settings, vm_settings_updated

-    def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc):
+    def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc, goalstate_properties=GoalStateProperties.default_properties()):


This method looks internal to GoalState and is only called in _update, so I don't see the need for a default there; the parameter needs to come from the parent methods.

I see the same thing in a few other places too.

Removed goal_state_properties parameter for this method and created private member for the goal state class

nagworld9 · 2022-12-13T21:41:52Z

azurelinuxagent/common/protocol/util.py

@@ -188,7 +189,7 @@ def _clear_wireserver_endpoint(self):
                return
            logger.error("Failed to clear wiresever endpoint: {0}", e)

-    def _detect_protocol(self):
+    def _detect_protocol(self, goalstate_properties=GoalStateProperties.default_properties()):


Same thing on the default parameter

Removed goal_state_properties from detect

narrieta · 2022-12-14T18:13:55Z

azurelinuxagent/common/protocol/goal_state.py

+    RemoteAccessInfo = 16
+
+    @staticmethod
+    def default_properties():


This is OK, though "default" in the name doesn't quite fit.

An alternative is to add an "All" item to the enum.

Updated to add All item
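A minimal sketch of the "All" suggestion, using the member names visible in the diff at this point in the review. Whether the merged code defines `All` exactly this way is an assumption.

```python
class GoalStateProperties(object):
    RoleConfig = 1
    HostingEnv = 2
    SharedConfig = 4
    ExtensionsConfig_Certs = 8
    RemoteAccessInfo = 16
    # Union of every flag above; callers use this instead of a default_properties() helper
    All = RoleConfig | HostingEnv | SharedConfig | ExtensionsConfig_Certs | RemoteAccessInfo


print(GoalStateProperties.All)
```

Defining `All` as the OR of the members keeps it correct by construction as long as each new member is added to the expression.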

narrieta · 2022-12-14T18:16:20Z

azurelinuxagent/common/protocol/goal_state.py

+    HostingEnv = 2
+    SharedConfig = 4
+    ExtensionsConfig_Certs = 8
+    RemoteAccessInfo = 16


Minor: using hex instead of decimal can help visualize exactly which bit is being used for the value.
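For illustration, the same values written as hex literals are shown below. The `FLAGS` dictionary and `bit_index` helper are hypothetical and exist only to show which bit each value occupies.

```python
FLAGS = {
    "RoleConfig": 0x01,
    "HostingEnv": 0x02,
    "SharedConfig": 0x04,
    "ExtensionsConfig_Certs": 0x08,
    "RemoteAccessInfo": 0x10,  # same value as decimal 16
}


def bit_index(value):
    """Return the position of the highest set bit in `value`."""
    return value.bit_length() - 1


for name, value in sorted(FLAGS.items(), key=lambda item: item[1]):
    print("{0}: bit {1}".format(name, bit_index(value)))
```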

narrieta · 2022-12-14T18:27:08Z

azurelinuxagent/common/protocol/goal_state.py

@@ -160,20 +176,20 @@ def update_host_plugin_headers(wire_client):
        # Fetching the goal state updates the HostGAPlugin so simply trigger the request
        GoalState._fetch_goal_state(wire_client)

-    def update(self, silent=False):
+    def update(self, goalstate_properties=GoalStateProperties.default_properties(), silent=False):


The update method should probably not take the properties as parameter, since it opens the possibility of passing a value different to the one passed to init() and that would lead to an inconsistent goal state.

This has been removed
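One way to read this suggestion is that the properties are fixed when the object is created and every later refresh reuses them. The class below is a simplified stand-in, not the agent's GoalState, and the `fetch` callable is a hypothetical injection point used to keep the sketch self-contained.

```python
class GoalStateSketch(object):
    def __init__(self, fetch, properties):
        self._fetch = fetch            # hypothetical callable: properties -> dict
        self._properties = properties  # fixed for the lifetime of the object
        self._data = {}
        self.update()

    def update(self, silent=False):
        # No properties argument here: each refresh reuses the mask given at
        # construction, so the object cannot become internally inconsistent.
        # `silent` is accepted only to mirror the signature in the diff.
        self._data = self._fetch(self._properties)


calls = []


def fake_fetch(properties):
    calls.append(properties)
    return {"properties": properties}


goal_state = GoalStateSketch(fake_fetch, properties=3)
goal_state.update()
print(calls)
```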

narrieta · 2022-12-14T18:32:47Z

azurelinuxagent/common/protocol/wire.py

@@ -72,7 +73,7 @@ def __init__(self, endpoint):
            raise ProtocolError("WireProtocol endpoint is None")
        self.client = WireClient(endpoint)

-    def detect(self):
+    def detect(self, goalstate_properties=GoalStateProperties.default_properties()):


No need to expose the properties in this method

This has been removed

…ember

maddieford · 2022-12-15T03:40:35Z

azurelinuxagent/ga/update.py

@@ -485,7 +485,7 @@ def _try_update_goal_state(self, protocol):
        try:
            max_errors_to_log = 3

-            protocol.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
+            protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)


@narrieta On line 430 in update.py we discussed changing
self._try_update_goal_state(protocol)
to
self._reset_goal_state()

It looks like we sleep until try_update_goal_state returns True for success. Should we implement similar logic for reset_goal_state?

good point. i added a comment on _try_update_goal_state
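The "sleep until it succeeds" pattern mentioned here can be sketched generically. The function name, the `max_attempts` guard, and the injectable `sleep` are all assumptions made for illustration; the thread does not show how update.py implements its loop.

```python
import time


def wait_until_success(attempt, period=1.0, sleep=time.sleep, max_attempts=None):
    """Call `attempt` until it returns True, sleeping `period` seconds between tries.

    Returns the number of attempts made.
    """
    tries = 0
    while True:
        tries += 1
        if attempt():
            return tries
        if max_attempts is not None and tries >= max_attempts:
            raise RuntimeError("gave up after {0} attempts".format(tries))
        sleep(period)


# Usage: an operation that fails twice before succeeding.
results = iter([False, False, True])
slept = []
print(wait_until_success(lambda: next(results), sleep=slept.append))
```

The same wrapper could be applied to either an update or a reset call, which is the question raised in the comment.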

nagworld9 · 2022-12-15T20:03:33Z

azurelinuxagent/common/logcollector.py

@@ -118,7 +119,7 @@ def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path):
    @staticmethod
    def _initialize_telemetry():
        protocol = get_protocol_util().get_protocol()


Looks like get_protocol() is still initializing the goal state with all properties?

+1 - we probably need to add a flag to get_protocol to not initialize the shared goal state (then the call to reset in the next line would initialize it)

Added flag to get_protocol()
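A sketch of how such a flag might behave. The parameter name `init_goal_state`, the `ClientSketch` class, and the mask values are hypothetical; the thread only states that a flag was added to get_protocol().

```python
ALL_PROPERTIES = 0x1F    # assumed: five flags, all set
ROLE_AND_HOSTING = 0x03  # RoleConfig | HostingEnv


class ClientSketch(object):
    def __init__(self):
        self.goal_state_properties = None  # None means "not initialized yet"

    def reset_goal_state(self, properties):
        self.goal_state_properties = properties


def get_protocol(init_goal_state=True):
    """Return a client, optionally skipping the full goal state initialization."""
    client = ClientSketch()
    if init_goal_state:
        client.reset_goal_state(ALL_PROPERTIES)
    return client


# The log collector skips the full initialization and then requests only what it needs.
client = get_protocol(init_goal_state=False)
client.reset_goal_state(ROLE_AND_HOSTING)
print(client.goal_state_properties)
```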

nagworld9 · 2022-12-15T20:17:02Z

azurelinuxagent/common/protocol/wire.py

@@ -778,18 +776,30 @@ def update_host_plugin(self, container_id, role_config_name):
            self._host_plugin.update_container_id(container_id)
            self._host_plugin.update_role_config_name(role_config_name)

-    def update_goal_state(self, force_update=False, silent=False):
+    def update_goal_state(self, silent=False):


I don't quite get the update_goal_state vs reset_goal_state. When do we use each of these?

We're removing the force_update parameter and using reset_goal_state in all cases where force_update was True.

Reset will initialize the goal state instead of updating

narrieta

A few comments on goal_state.py, I'm reviewing the usages of the goal state next

narrieta · 2022-12-20T16:18:01Z

azurelinuxagent/common/protocol/goal_state.py

@@ -99,35 +113,59 @@ def incarnation(self):

    @property
    def container_id(self):
-        return self._container_id
+        if not self._goal_state_properties & GoalStateProperties.RoleConfig:
+            raise ProtocolError("RoleConfig is not in goal state properties")


container id

narrieta · 2022-12-20T16:18:16Z

azurelinuxagent/common/protocol/goal_state.py


    @property
    def role_instance_id(self):
-        return self._role_instance_id
+        if not self._goal_state_properties & GoalStateProperties.RoleConfig:
+            raise ProtocolError("RoleConfig is not in goal state properties")


role instance id
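The guard pattern in these two hunks can be shown in isolation. The classes below are stand-ins written for this sketch (including a local `ProtocolError`), so only the shape of the check is taken from the diff.

```python
class ProtocolError(Exception):
    """Stand-in for the agent's ProtocolError."""


ROLE_CONFIG = 0x1


class GoalStateSketch(object):
    def __init__(self, properties, container_id=None):
        self._goal_state_properties = properties
        self._container_id = container_id

    @property
    def container_id(self):
        # Fail loudly instead of returning a value that was never fetched.
        if not self._goal_state_properties & ROLE_CONFIG:
            raise ProtocolError("RoleConfig is not in goal state properties")
        return self._container_id


print(GoalStateSketch(ROLE_CONFIG, "container-1").container_id)
```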

narrieta · 2022-12-20T16:20:38Z

azurelinuxagent/common/protocol/goal_state.py


    @property
    def extensions_goal_state(self):
-        return self._extensions_goal_state
+        if not self._goal_state_properties & GoalStateProperties.ExtensionsConfig:


Ah, I did not notice this before. The extensions goal state may come from vmSettings or ExtensionsConfig. The latter is for Fabric and the former for Fast Track.

Let's rename GoalStateProperties.ExtensionsConfig to GoalStateProperties.ExtensionsGoalState

Also, in _update(), we need to retrieve the vmSettings only when requesting GoalStateProperties.ExtensionsGoalState

Updated to add this check
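A deliberately simplified sketch of the check requested here: vmSettings is fetched only when the extensions goal state is among the requested properties. The flag value and function names are assumptions, and the real _update() also has to choose between vmSettings and ExtensionsConfig, which is not modeled.

```python
EXTENSIONS_GOAL_STATE = 0x8  # hypothetical value for GoalStateProperties.ExtensionsGoalState


def update_sketch(properties, fetch_vm_settings):
    """Fetch vmSettings only when the extensions goal state was requested."""
    if properties & EXTENSIONS_GOAL_STATE:
        return fetch_vm_settings()
    return None


fetches = []


def fake_fetch_vm_settings():
    fetches.append("vmSettings")
    return {"source": "vmSettings"}


print(update_sketch(0x3, fake_fetch_vm_settings))  # log collector mask: no fetch
print(update_sketch(0x8, fake_fetch_vm_settings))  # extensions requested: fetch
```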

narrieta · 2022-12-20T16:41:33Z

azurelinuxagent/common/protocol/wire.py

        """
        try:
-            if force_update and not silent:
+            if not silent:


we can remove this message since the force flag was removed

narrieta · 2022-12-20T17:22:22Z

azurelinuxagent/common/protocol/wire.py

-                self._goal_state = GoalState(self, silent=silent)
-            else:
-                self._goal_state.update(silent=silent)
+            self._goal_state.update(silent=silent)


I think we need to keep a call to reset here (minus the check for the force flag). This is needed by the initialization of the goal state on line 430 that you pointed out

    if self._goal_state is None:
        self._goal_state = GoalState(self, silent=silent)
    else:
        self._goal_state.update(silent=silent)

I added the None check back
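To make the difference between reset and update concrete, here is a small self-contained sketch. Both classes are stand-ins for the agent's GoalState and WireClient; only the None check mirrors the snippet quoted in the review.

```python
class GoalStateSketch(object):
    def __init__(self, silent=False):
        self.updates = 0

    def update(self, silent=False):
        self.updates += 1


class WireClientSketch(object):
    def __init__(self):
        self._goal_state = None

    def reset_goal_state(self, silent=False):
        # Reset always builds a fresh goal state object.
        self._goal_state = GoalStateSketch(silent=silent)

    def update_goal_state(self, silent=False):
        # Update refreshes the existing object, creating it on first use.
        if self._goal_state is None:
            self._goal_state = GoalStateSketch(silent=silent)
        else:
            self._goal_state.update(silent=silent)


client = WireClientSketch()
client.update_goal_state()  # first call initializes
client.update_goal_state()  # second call refreshes in place
print(client._goal_state.updates)
```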

narrieta · 2022-12-20T17:23:00Z

azurelinuxagent/ga/update.py

@@ -485,7 +485,7 @@ def _try_update_goal_state(self, protocol):
        try:
            max_errors_to_log = 3

-            protocol.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)
+            protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log)


good point. i added a comment on _try_update_goal_state

maddieford added 13 commits November 8, 2022 15:09

Update version to dummy 1.0.0.0'

da72c37

Revert version change

59dbd22

Merge remote-tracking branch 'upstream/develop' into develop

633a826

Merge remote-tracking branch 'upstream/develop' into develop

14a743f

Update goalstate to take list of properties to process

55e261a

Update protocol to not process extensions in goal state update

908985e

Update logcollector to not process extensions when updating goal state

8fc6471

Remove comments

3d370d0

Remove import enum

d5bb016

Update parameter name to goalstate_properties

2633959

Add default value for goalstate_properties

b8d98a5

Use integers and bitwise operations for goal state properties

e578129

Initialize protocol for logcollector with only necessary goal state properties

c642900

maddieford marked this pull request as ready for review December 12, 2022 22:36

maddieford requested review from narrieta, ZhidongPeng and nagworld9 as code owners December 12, 2022 22:36

nagworld9 reviewed Dec 13, 2022

narrieta reviewed Dec 14, 2022

maddieford added 7 commits December 14, 2022 18:13

Separate update into reset and made goal state properties a private member

041dee3

Remove goal_state_properties param from _detect_protocol

cbfb6ad

Update tests to remove force_update

05f4b79

Remove unused import

dc2fe69

Remove update_goal_State from wire protocol

a8c5a7f

Remove update_goal_State from wire protocol

011fbef

Separate extensionsconfig and certificates property

b833735

maddieford commented Dec 15, 2022

nagworld9 reviewed Dec 15, 2022

narrieta reviewed Dec 20, 2022

maddieford added 8 commits December 20, 2022 14:00

Address PR comments

d762a03

Add flag to determine if goal state should be init

11f9c3a

Add test case for goal_state_properties

212a72b

Correct pylint errors

7805dc1

Add reset_goal_state test with goal_state_properties

b1d21ea

Add reset test cases

c61a067

Remove Certificates property in GoalState

dd9d8da

Revert certs change

7711ae2

narrieta approved these changes Jan 5, 2023

nagworld9 approved these changes Jan 6, 2023

Merge branch 'develop' into update_protocol_init

3926b35

maddieford merged commit e9f495d into Azure:develop Jan 9, 2023

maddieford deleted the update_protocol_init branch April 7, 2023 19:35
