refactor cgroup controllers #3135

maddieford · 2024-05-31T21:55:20Z

Description

The current code for cgroups is overcomplicated because it keeps track of each controller path separately. Tracking additional controllers in the future with the existing code would require updating every block of code that references controller paths. In cgroup v2, there is a unified cgroup path, which makes it unnecessary to track a different path for each controller.
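To illustrate the difference described above, here is a minimal sketch of how a cgroup's relative path maps to filesystem paths in v1 versus v2. The mount points are the ones commonly seen on systemd distros and are assumptions here, as are the helper names `v1_controller_paths` and `v2_cgroup_path`; neither helper is part of the agent code.

```python
import posixpath

# Assumed mount points (typical for systemd-based distros, not read from the system).
V1_MOUNTS = {
    "cpu": "/sys/fs/cgroup/cpu,cpuacct",
    "memory": "/sys/fs/cgroup/memory",
}
V2_ROOT = "/sys/fs/cgroup"


def v1_controller_paths(relative_path, mounts=None):
    """v1: each controller has its own hierarchy, so one path per controller."""
    mounts = V1_MOUNTS if mounts is None else mounts
    return {
        controller: posixpath.join(mount, relative_path)
        for controller, mount in mounts.items()
    }


def v2_cgroup_path(relative_path, root=V2_ROOT):
    """v2: a single unified hierarchy, so one path covers every controller."""
    return posixpath.join(root, relative_path)
```

With v1, adding a controller means adding another entry (and another path to carry around); with v2 the single path is enough regardless of how many controllers are tracked.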

The changes in this PR simplify the code by creating an abstraction for a cgroup that handles any operation requiring cgroup paths. The CGroup/CpuCGroup/MemoryCGroup classes are renamed to ControllerMetrics/CpuMetrics/MemoryMetrics, and new Cgroup/CgroupV1/CgroupV2 classes are introduced in the cgroupapi to represent a cgroup. The cgroupapi now returns instances of Cgroup instead of tuples of controller paths.
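As a rough sketch of the shape of this abstraction (not the code in this PR), the classes below keep the paths private so that callers ask the cgroup for what they need instead of handling path tuples. The constructor arguments, the `_procs_files` helper, and the `get_processes` method name are assumptions made for illustration; only the class names and the `cgroup.procs` file name come from the PR and the kernel interface.

```python
import os


class Cgroup(object):
    """Hypothetical base class: owns the paths, exposes path-based operations."""

    def _procs_files(self):
        # Subclasses return the cgroup.procs file(s) that describe this cgroup.
        raise NotImplementedError()

    def get_processes(self):
        """Return the sorted, de-duplicated PIDs listed in cgroup.procs."""
        pids = set()
        for procs_file in self._procs_files():
            if os.path.exists(procs_file):
                with open(procs_file) as f:
                    pids.update(int(line) for line in f if line.strip())
        return sorted(pids)


class CgroupV1(Cgroup):
    def __init__(self, controller_paths):
        # Assumed shape: one path per controller, e.g. {"cpu": ..., "memory": ...}
        self._controller_paths = controller_paths

    def _procs_files(self):
        return [
            os.path.join(path, "cgroup.procs")
            for path in self._controller_paths.values()
        ]


class CgroupV2(Cgroup):
    def __init__(self, cgroup_path):
        # Unified hierarchy: a single path is enough.
        self._cgroup_path = cgroup_path

    def _procs_files(self):
        return [os.path.join(self._cgroup_path, "cgroup.procs")]
```

Code that receives a `Cgroup` can call `get_processes()` without knowing which cgroup version is in use, which is the point of returning an object rather than a tuple of paths.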

Note this PR does not yet add logic to track cpu/memory metrics in v2. Those changes will be included in a future PR.

Test run of all cgroup scenarios: https://dev.azure.com/cplatruntime/WALinuxAgent/_build/results?buildId=9290&view=results

Issue #

PR information

The title of the PR is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
If applicable, the PR references the bug/issue that it fixes in the description.
New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

I have read the contribution guidelines.

* Refactor Cgroup, CpuCgroup, MemoryCgroup to ControllerMetrics, CpuMetrics, MemoryMetrics
* Create methods to get unit/process cgroup representation
* Refactoring changes
* Refactoring changes
* Fix e2e test
* Fix unintentional comment change
* Remove unneeded comments
* Clean up comments and make code more readable
* Simplify get controller metrics
* Clean up cgroupapi
* Cleanup cgroup -> controllermetrics changes
* Clean up cgroup configurator
* Fix unit tests for agent.py
* Fix cgroupapi tests
* Fix cgroupconfigurator and tests
* Rename controller metrics tests
* Ignore pylint issues
* Improve test coverage for cgroupapi
* Rename cgroup to metrics
* Update cgroup.procs to accurately represent file

maddieford · 2024-05-31T21:56:50Z

azurelinuxagent/ga/cgroupapi.py

@@ -279,11 +259,6 @@ def _is_systemd_failure(scope_name, stderr):
        unit_not_found = "Unit {0} not found.".format(scope_name)
        return unit_not_found in stderr or scope_name not in stderr

-    @staticmethod
-    def get_processes_in_cgroup(cgroup_path):


Moved to Cgroup class

maddieford · 2024-05-31T21:57:14Z

azurelinuxagent/ga/cgroupapi.py

@@ -202,7 +202,6 @@ class _SystemdCgroupApi(object):
    Cgroup interface via systemd. Contains common api implementations between cgroup v1 and v2.
    """
    def __init__(self):
-        self._agent_unit_name = None


This was an unused property.

maddieford · 2024-05-31T22:01:00Z

azurelinuxagent/ga/cgroupapi.py

+
+        return True
+
+    def get_controller_metrics(self, expected_relative_path=None):


Next step for cgroup v2 support will be adding logic to track metrics for cpu and memory in v2

maddieford · 2024-05-31T22:02:39Z