Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start sending additional memory information per process #1729

Merged
merged 22 commits into from
Dec 16, 2019

Conversation

vrdmr
Copy link
Member

@vrdmr vrdmr commented Dec 6, 2019

Description

Adding proc_statm check to validate the memory values we get from CGroup.

Sample event would look like

{
    "SchemaVersion": 1.0,
    "PerfMetrics": {
        "walinuxagent.service": {
            "cpu": {
                "cur_cpu": [
                    0.5708333333333334, 0.531, 0.648, 0.5625, 6, "2019-12-11 00:05:03.961587", "2019-12-11 00:30:05.881054"
                ]
            },
            "memory": {
                "cur_mem": [
                    42011306.666666664, 41652224, 43409408, 41750528.0, 6, "2019-12-11 00:05:03.961690", "2019-12-11 00:30:05.881153"
                ],
                "max_mem": [
                    338530304.0, 338530304, 338530304, 338530304.0, 6, "2019-12-11 00:05:03.961728", "2019-12-11 00:30:05.881193"
                ]
            },
            "proc_statm_memory": {
                "1350 | python3 | /usr/bin/python3 -u /usr/sbin/waagent -daemon": [
                    23064576.0, 23064576, 23064576, 23064576.0, 6, "2019-12-11 00:05:03.961908", "2019-12-11 00:30:05.881369"
                ],
                "1511 | python3 | python3 -u /usr/sbin/waagent -run-exthandlers": [
                    27376981.333333332, 27312128, 27668480, 27318272.0, 6, "2019-12-11 00:05:03.962010", "2019-12-11 00:30:05.881471"
                ]
}}}}

PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines


This change is Reviewable

@vrdmr vrdmr added this to the v2.2.46 milestone Dec 6, 2019
@codecov
Copy link

codecov bot commented Dec 6, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@cf3bce5). Click here to learn what that means.
The diff coverage is 83.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop    #1729   +/-   ##
==========================================
  Coverage           ?   68.39%           
==========================================
  Files              ?       81           
  Lines              ?    11741           
  Branches           ?     1648           
==========================================
  Hits               ?     8030           
  Misses             ?     3369           
  Partials           ?      342
Impacted Files Coverage Δ
azurelinuxagent/common/exception.py 98.87% <ø> (ø)
azurelinuxagent/common/resourceusage.py 100% <100%> (ø)
azurelinuxagent/ga/monitor.py 87.3% <56.52%> (ø)
azurelinuxagent/common/cgroup.py 90.15% <69.23%> (ø)
azurelinuxagent/common/cgroupstelemetry.py 92.03% <83.58%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf3bce5...5e35ec8. Read the comment docs.

vrdmr and others added 4 commits December 10, 2019 14:43
Adding the default process_name, process_commandline
- Making it more explicit for Memory cases as well if IOError came.
- Adding a filter to make sure not adding entries into Metrics which are marked_for_delete
- Tests changes
Did some tweaks to make it mergable in test_cgroups and test_cgroupstelemetry.
azurelinuxagent/common/cgroup.py Show resolved Hide resolved
azurelinuxagent/common/cgroup.py Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Outdated Show resolved Hide resolved
azurelinuxagent/ga/monitor.py Show resolved Hide resolved
azurelinuxagent/ga/monitor.py Show resolved Hide resolved
tests/common/test_cgroups.py Show resolved Hide resolved
tests/ga/test_monitor.py Outdated Show resolved Hide resolved
tests/ga/test_monitor.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroup.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroup.py Show resolved Hide resolved
azurelinuxagent/common/resourceusage.py Outdated Show resolved Hide resolved
azurelinuxagent/common/resourceusage.py Show resolved Hide resolved
azurelinuxagent/common/resourceusage.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Show resolved Hide resolved
azurelinuxagent/common/exception.py Outdated Show resolved Hide resolved
azurelinuxagent/ga/monitor.py Outdated Show resolved Hide resolved
tests/common/test_cgroupstelemetry.py Outdated Show resolved Hide resolved
vrdmr and others added 4 commits December 12, 2019 18:51
- cgroups.py - cleaner returns and raises
- cgroupstelemetry.py - exception handling when proc_statm for pid fails.
- exception.py - cgroup comments fixed
- resourceusage.py - raising exception when failing to get memory info.
- monitor.py - reverted the reset_logger changes
- test_cgroupstelemetry.py - stray print and mock removed.
- test_resourceusage.py - newline and end
- test_monitor.py - removed reset_logger changes, and count of metrics fixed.
- resourceusage.py - Propogate IOError(Errno2) above, and for other exceptions, raise ProcessInfoException.
- test_resourceusage.py - Add more asserts to check the raising of IOError, ProcessInfoException.
- cgroupstelemetry.py - handling of exception thrown by get_proc_*. Also some variable name refactoring.
- resourceusage.py - Bubbling up exceptions in get_proc_*
- test_cgroupstelemetry.py - Refactoring variable name.
- test_resourceusage.py - Changes in test to test exception bubbling up.
@vrdmr vrdmr requested a review from narrieta December 13, 2019 23:13
azurelinuxagent/common/resourceusage.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroup.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Outdated Show resolved Hide resolved
azurelinuxagent/common/cgroupstelemetry.py Outdated Show resolved Hide resolved
azurelinuxagent/common/resourceusage.py Outdated Show resolved Hide resolved
- test_cgroups.py - Simple refactoring of class setup and asserts fixed for get_tracked_process
- resourceusage.py - Comments fixed.
- cgroupstelemetry.py - Refactored the strings into class for easy usage.
- cgroup.py - Refactoring the controller names strings into class. Also changed get_tracked_processes's return behavior.
Copy link
Contributor

@larohra larohra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment else LGTM

azurelinuxagent/common/cgroupstelemetry.py Show resolved Hide resolved
@vrdmr vrdmr merged commit 8194bbc into Azure:develop Dec 16, 2019
@vrdmr vrdmr deleted the vameru/send-addl-memory-information branch December 16, 2019 23:52
rjschwei pushed a commit to rjschwei/WALinuxAgent that referenced this pull request Mar 24, 2020
* Adding ResourceUsage Class and Adding ProcessIds in Telemetry

* Adding process tests to fetch the processes in cgroup

* Sending Resource Metrics for Memory Usage telemetry.

* Sending Memory Data for Tracked Processes

* Fix proc_statm collection and dictionary formatting

* Updating the process_name pattern when sending memory telemetry

* Adding the default process_name, process_commandline

* Handling inactive cgroups; not send empty values in ExtensionMetricsData

- Making it more explicit for Memory cases as well if IOError came.
- Adding a filter to make sure not adding entries into Metrics which are marked_for_delete
- Tests changes

* Fixing the missed test - Changing the memory exceptions.

* Review Comments addressed, and clearer exception handling.

- cgroups.py - cleaner returns and raises
- cgroupstelemetry.py - exception handling when proc_statm for PID fails.
- exception.py - cgroup comments fixed
- resourceusage.py - raising an exception when failing to get memory info.
- monitor.py - reverted the reset_logger changes
- test_cgroupstelemetry.py - stray print and mock removed.
- test_resourceusage.py - newline and end
- test_monitor.py - removed reset_logger changes, and count of metrics fixed.

* Making IOError explicit for get_memory_usage_proc_statm & test fixes

- resourceusage.py - Propogate IOError(Errno2) above, and for other exceptions, raise ProcessInfoException.
- test_resourceusage.py - Add more asserts to check the raising of IOError, ProcessInfoException.

* Addressing review comments.

- cgroupstelemetry.py - handling of exception thrown by get_proc_*. Also, some variable name refactoring.
- resourceusage.py - Bubbling up exceptions in get_proc_*
- test_cgroupstelemetry.py - Refactoring variable name.
- test_resourceusage.py - Changes in test to test exception bubbling up.

* Initializing a new logger for each test here to not conflict with others

* Review comments addressed and some refactoring.

- test_cgroups.py - Simple refactoring of class setup and asserts fixed for get_tracked_process
- resourceusage.py - Comments fixed.
- cgroupstelemetry.py - Refactored the strings into class for easy usage.
- cgroup.py - Refactoring the controller names strings into class. Also changed get_tracked_processes' return behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants