Reorganize the history directory #2520

narrieta · 2022-02-25T06:55:40Z

The changes I did to the history directory did not capture the goal state the way I intended to. I wanted to capture all the HTTP requests for needed to process a single goal state within the same directory. Some goal states were producing more than 1 directory, plus I was capturing the waagent_status.json in a separate directory. All of that made the history hard to read.

This PR addresses those issues. There is a single directory per goal state, be it Fabric or FastTrack, and the last status for a given goal state is captured in the directory for that goal state.

This is a sample directory, and the contents of 2 sample subdirectories (the "_*" suffix indicates the incarnation for a Fabric goal state, or the eTag for a FastTrack goal state.

root@edp90bdcp2:/var/lib/waagent# ls -l history/
total 128
-rw-r--r-- 1 root root 3777 Feb 25 05:21 2022-02-25T05:21:51.351437_1.zip
-rw-r--r-- 1 root root 4214 Feb 25 05:22 2022-02-25T05:21:52.941740_1.zip
-rw-r--r-- 1 root root 5671 Feb 25 05:23 2022-02-25T05:22:48.440602_2.zip
-rw-r--r-- 1 root root 5859 Feb 25 05:24 2022-02-25T05:23:22.879699_3.zip
-rw-r--r-- 1 root root 5676 Feb 25 05:24 2022-02-25T05:24:01.136465_4.zip
-rw-r--r-- 1 root root 5953 Feb 25 05:25 2022-02-25T05:24:35.446556_5.zip
-rw-r--r-- 1 root root 2605 Feb 25 05:25 2022-02-25T05:25:03.847029_15811112770847678291.zip
-rw-r--r-- 1 root root 2527 Feb 25 05:26 2022-02-25T05:25:44.087642_18000139041792514682.zip
-rw-r--r-- 1 root root 2407 Feb 25 05:26 2022-02-25T05:26:18.309032_4750493307514613305.zip
-rw-r--r-- 1 root root 2553 Feb 25 05:27 2022-02-25T05:26:52.771520_3125115514822901175.zip
-rw-r--r-- 1 root root 2644 Feb 25 05:28 2022-02-25T05:27:27.017290_11215055166409093376.zip
-rw-r--r-- 1 root root 3519 Feb 25 05:28 2022-02-25T05:28:03.292724_16004112586371732150.zip
-rw-r--r-- 1 root root 3628 Feb 25 05:29 2022-02-25T05:28:37.963574_4568237672134507183.zip
-rw-r--r-- 1 root root 3660 Feb 25 05:30 2022-02-25T05:29:16.265539_7975497350458587842.zip
-rw-r--r-- 1 root root 3715 Feb 25 05:30 2022-02-25T05:29:50.576691_2740615665526774226.zip
-rw-r--r-- 1 root root 3730 Feb 25 05:31 2022-02-25T05:30:24.879073_3193126257072759267.zip
-rw-r--r-- 1 root root 3598 Feb 25 05:31 2022-02-25T05:30:59.204375_6507518737698568822.zip
-rw-r--r-- 1 root root 2709 Feb 25 05:32 2022-02-25T05:31:27.550604_13599842578989748385.zip
-rw-r--r-- 1 root root 2629 Feb 25 05:32 2022-02-25T05:31:57.770524_14353420009637935747.zip
-rw-r--r-- 1 root root 2523 Feb 25 05:33 2022-02-25T05:32:31.989915_558826791117675680.zip
-rw-r--r-- 1 root root 2249 Feb 25 05:33 2022-02-25T05:33:00.218760_2856318285684983312.zip
-rw-r--r-- 1 root root 2555 Feb 25 05:34 2022-02-25T05:33:34.758289_2370205063594795044.zip
-rw-r--r-- 1 root root 2619 Feb 25 05:34 2022-02-25T05:34:15.040888_6966483418219710538.zip
-rw-r--r-- 1 root root 2522 Feb 25 05:35 2022-02-25T05:34:43.235101_8780014925603559582.zip
-rw-r--r-- 1 root root 2371 Feb 25 05:35 2022-02-25T05:35:17.449951_16011384681609892725.zip
-rw-r--r-- 1 root root 2502 Feb 25 05:57 2022-02-25T05:35:51.971207_9901895646845909273.zip
drwx------ 2 root root 4096 Feb 25 05:57 2022-02-25T05:57:13.108043_7729999217724852232

root@edp90bdcp2:/var/lib/waagent# unzip -l history/2022-02-25T05:24:35.446556_5.zip
Archive:  history/2022-02-25T05:24:35.446556_5.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     3044  2022-02-25 05:24   ExtensionsConfig.xml
     1721  2022-02-25 05:24   HostingEnvironmentConfig.xml
     2299  2022-02-25 05:24   VmSettings.json
    13020  2022-02-25 05:24   Microsoft.OSTCExtensions.VMAccessForLinux.manifest.xml
     2099  2022-02-25 05:24   GoalState.xml
      899  2022-02-25 05:24   SharedConfig.xml
     1020  2022-02-25 05:24   waagent_status.json
---------                     -------
    24102                     7 files

root@edp90bdcp2:/var/lib/waagent# unzip -l history/2022-02-25T05:25:03.847029_15811112770847678291.zip
Archive:  history/2022-02-25T05:25:03.847029_15811112770847678291.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     2173  2022-02-25 05:25   VmSettings.json
    13020  2022-02-25 05:25   Microsoft.OSTCExtensions.VMAccessForLinux.manifest.xml
     1632  2022-02-25 05:24   waagent_status.json
---------                     -------
    16825                     3 files

narrieta · 2022-02-25T06:57:33Z

azurelinuxagent/common/utils/archive.py

@@ -38,7 +38,7 @@

 ARCHIVE_DIRECTORY_NAME = 'history'

-_MAX_ARCHIVED_STATES = 100
+_MAX_ARCHIVED_STATES = 50


Previously I increased this value from 50 to 100 because I was saving waagent_status.json to a separate directory. Since now I am saving it to the same directory as the goal state, I am reverting the value to the original 50

codecov · 2022-02-25T06:59:14Z

Codecov Report

Merging #2520 (68275b9) into develop (bb9a337) will decrease coverage by 0.03%.
The diff coverage is 90.47%.

@@             Coverage Diff             @@
##           develop    #2520      +/-   ##
===========================================
- Coverage    71.94%   71.91%   -0.04%     
===========================================
  Files          101      102       +1     
  Lines        15147    15135      -12     
  Branches      2401     2400       -1     
===========================================
- Hits         10898    10884      -14     
+ Misses        3771     3769       -2     
- Partials       478      482       +4

Impacted Files	Coverage Δ
azurelinuxagent/common/utils/archive.py	`87.96% <75.00%> (-0.77%)`	⬇️
azurelinuxagent/ga/update.py	`88.39% <76.47%> (-0.16%)`	⬇️
azurelinuxagent/ga/exthandlers.py	`85.71% <88.88%> (-0.20%)`	⬇️
...protocol/extensions_goal_state_from_vm_settings.py	`80.91% <100.00%> (+0.22%)`	⬆️
azurelinuxagent/common/protocol/goal_state.py	`96.78% <100.00%> (+1.42%)`	⬆️
azurelinuxagent/common/utils/timeutil.py	`100.00% <100.00%> (ø)`
azurelinuxagent/ga/collect_telemetry_events.py	`90.41% <0.00%> (-1.03%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb9a337...68275b9. Read the comment docs.

narrieta · 2022-02-25T20:42:30Z

azurelinuxagent/common/protocol/goal_state.py

+            timestamp = create_timestamp()
+            xml_text, xml_doc, incarnation = GoalState._fetch_goal_state(self._wire_client)
+            self._history = GoalStateHistory(timestamp, incarnation)
+


the main change here is intantiating the history during init; the rest of the changes are just moving the code around to ensure all properties are defined before invoking any method

narrieta · 2022-02-25T20:44:02Z

azurelinuxagent/common/protocol/goal_state.py

                raise
            except ResourceGoneError:
                # retry after refreshing the HostGAPlugin
                GoalState.update_host_plugin_headers(self._wire_client)
                vm_settings, vm_settings_updated = self._wire_client.get_host_plugin().fetch_vm_settings(force_update=force_update)

-            if vm_settings_updated:


now the vmSettings are added to the history by the caller

narrieta · 2022-02-25T20:45:07Z

azurelinuxagent/ga/exthandlers.py

-}}
-'''.format(status_blob_text, debug_info)
-
-        fileutil.write_file(status_file, status_file_text)


now the file is created by the caller, and this code is exposed as get_ext_handlers_status_debug_info

nagworld9 · 2022-03-01T00:00:25Z

azurelinuxagent/common/protocol/goal_state.py

-            # else just ensure the extensions are using the latest vm_settings
-            if vm_settings is not None:
+            # else ensure the extensions are using the latest vm_settings
+            timestamp = create_timestamp()


why do we need different timestamp? can't we use L#135

L#136 calls _fetch_goal_state, which does a network call, which can have delays. we need a new timestamp here

nagworld9 · 2022-03-01T00:26:06Z

azurelinuxagent/common/protocol/goal_state.py

-
-        vm_settings = self._fetch_vm_settings(force_update=force_update)
+        timestamp = create_timestamp()
+        xml_text, xml_doc, incarnation = GoalState._fetch_goal_state(self._wire_client)


where/when is this full goal state saved to history folder?

it's be saved as part of _fetch_extended_goal_state (called below @line 142) or after retrieving the vmsettings (below @line 150)

larohra

some very minor comments, else LGTM

larohra · 2022-03-01T00:57:58Z

azurelinuxagent/common/protocol/goal_state.py


-    def _fetch_extended_goal_state(self, xml_text, xml_doc, vm_settings):
+    def _fetch_extended_goal_state(self, xml_text, xml_doc, force_vm_settings_update=False):


NIT: maybe rename to fetch_complete_goal_state? Extended seemed a bit confusing to me personally

in init, I split the member variables in 2 blocks, which I called "basic" and "extended", then initialize_basic_properties initializes the former and fetch_extended_goal_state the latter (I did not name fetch_extended_goal_state "initialize_extended_properties" because there is a fetch operation in it and I want that to be explicit)

larohra · 2022-03-01T01:04:12Z

azurelinuxagent/common/protocol/goal_state.py

@@ -175,16 +170,17 @@ def _fetch_goal_state(wire_client):

        # In some environments a few goal state requests return a missing RoleInstance; these retries are used to work around that issue
        # TODO: Consider retrying on 410 (ResourceGone) as well
+        incarnation = "unknown"


Will this ever be "unknown"? It should always get updated right?

the linter believes it can be used before initialization at line #184. that would happen, i believe, only if _GET_GOAL_STATE_MAX_ATTEMPTS is 0, but I decided to go ahead an initialize it anyways

larohra · 2022-03-01T01:08:09Z

azurelinuxagent/common/utils/archive.py

        except Exception as e:
            if not self._errors:  # report only 1 error per directory
                self._errors = True
-                logger.warn("Failed to save goal state file {0}: {1} [no additional errors saving the goal state will be reported]".format(target_name, e))
+                logger.warn("Failed to save goal state file {0}: {1} [no additional errors saving the goal state will be reported]".format(file_name, e))


Maybe this error message needs updating? Failed to save {file_name} for incarnation {id} [no additional errors saving the goal state will be reported]: {error}

i see... by "goal state file" i did not mean GoalState.xml, but a file related to the goal state... i'll update this to "Failed to save {0} to the goal state history: {1}" (the incarnation or etag are already logged 1 or 2 lines above this message)

larohra · 2022-03-01T01:10:36Z

azurelinuxagent/ga/exthandlers.py


-        return json.dumps(debug_info)
+        return '''{{
+    "__comment__": "The __status__ property is the actual status reported to CRP",


NIT: Indent missing for better visuals

I did not want extra indentation in the actual file, so I added only 4 leading spaces

larohra · 2022-03-01T01:14:02Z

azurelinuxagent/ga/update.py

+            self._report_extensions_summary(vm_status)
+            if self._goal_state is not None:
+                agent_status = exthandlers_handler.get_ext_handlers_status_debug_info(vm_status)
+                self._goal_state.save_to_history(agent_status, AGENT_STATUS_FILE)


maybe create an abstraction for this too? i.e. self._goal_state.save_reported_status() or something?

currently i'm using save_to_history to save the manifests as well, i'm planning on getting rid of it in another round of refactoring

narrieta added 3 commits February 24, 2022 20:57

Reorganize the history directory

02d7f45

pylint

35b3f5a

python2

7f7d128

narrieta requested review from larohra, kevinclark19a, ZhidongPeng, dhivyaganesan and nagworld9 as code owners February 25, 2022 06:55

narrieta assigned larohra and nagworld9 Feb 25, 2022

narrieta commented Feb 25, 2022

View reviewed changes

write agent status directly to history

5b8ce71

narrieta commented Feb 25, 2022

View reviewed changes

nagworld9 reviewed Mar 1, 2022

View reviewed changes

larohra previously approved these changes Mar 1, 2022

View reviewed changes

update message

7faa811

narrieta dismissed larohra’s stale review via 7faa811 March 1, 2022 04:35

nagworld9 approved these changes Mar 1, 2022

View reviewed changes

larohra approved these changes Mar 1, 2022

View reviewed changes

Merge branch 'develop' into fast-track

68275b9

narrieta merged commit 6445aaa into Azure:develop Mar 1, 2022

narrieta deleted the fast-track branch March 1, 2022 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganize the history directory #2520

Reorganize the history directory #2520

narrieta commented Feb 25, 2022 •

edited

Loading

narrieta Feb 25, 2022

codecov bot commented Feb 25, 2022 •

edited

Loading

narrieta Feb 25, 2022

narrieta Feb 25, 2022

narrieta Feb 25, 2022

nagworld9 Mar 1, 2022

narrieta Mar 1, 2022

nagworld9 Mar 1, 2022

narrieta Mar 1, 2022

larohra left a comment

larohra Mar 1, 2022

narrieta Mar 1, 2022

larohra Mar 1, 2022

narrieta Mar 1, 2022

larohra Mar 1, 2022

narrieta Mar 1, 2022

larohra Mar 1, 2022

narrieta Mar 1, 2022

larohra Mar 1, 2022

narrieta Mar 1, 2022


		def _fetch_extended_goal_state(self, xml_text, xml_doc, vm_settings):
		def _fetch_extended_goal_state(self, xml_text, xml_doc, force_vm_settings_update=False):

Reorganize the history directory #2520

Reorganize the history directory #2520

Conversation

narrieta commented Feb 25, 2022 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Feb 25, 2022 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

larohra left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

narrieta commented Feb 25, 2022 •

edited

Loading

codecov bot commented Feb 25, 2022 •

edited

Loading