refactor logcollector memory usage #2637

Merged: 4 commits, Aug 4, 2022

nagworld9
Contributor

Description

Problem: The log collector process runs with a 30 MB cgroup memory limit. Some VMs reported OOM kills of the log collector process when it reached the limit.

Solution: The agent monitors memory usage every 2 seconds in a separate thread and gracefully exits the log collector process when usage reaches the 30 MB limit. This avoids forced kills by the OOM killer.


2022-08-01T02:55:25.357959Z INFO LogCollectorMonitorHandler LogCollector Log collector memory limit 31457280 bytes exceeded. The max reported usage is 32534528 bytes.
2022-08-01T02:55:25.546377Z INFO CollectLogsHandler ExtHandler Disabling periodic log collection until service restart due to exceeded process memory limit.
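
The approach described above (poll memory usage from a separate thread, then signal a graceful exit once the limit is exceeded) could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `MemoryMonitor`, `read_usage`, and the event names are hypothetical, and the real agent reads usage from cgroup metrics.

```python
import threading

LIMIT_BYTES = 30 * 1024 * 1024  # 31457280 bytes, the limit shown in the log output
POLL_SECONDS = 2                # polling period taken from the description


class MemoryMonitor(threading.Thread):
    """Hypothetical monitor: polls a usage reader and flags when the limit is exceeded."""

    def __init__(self, read_usage, limit=LIMIT_BYTES, period=POLL_SECONDS):
        super().__init__(daemon=True)
        self._read_usage = read_usage            # callable returning usage in bytes (assumption)
        self._limit = limit
        self._period = period
        self._stop_requested = threading.Event()
        self.limit_exceeded = threading.Event()  # the owner watches this and exits gracefully
        self.max_usage = 0

    def run(self):
        while not self._stop_requested.is_set():
            # Track the highest usage seen so far and compare it with the limit.
            self.max_usage = max(self.max_usage, self._read_usage())
            if self.max_usage > self._limit:
                self.limit_exceeded.set()
                return
            # Sleep for one period, waking early if stop() is called.
            self._stop_requested.wait(self._period)

    def stop(self):
        self._stop_requested.set()
```

The owning process would wait on `limit_exceeded` (or check it periodically) and shut down on its own terms, instead of being killed by the OOM killer.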

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

azurelinuxagent/ga/collect_logs.py (outdated)
self.join()
try:
self.join()
except RuntimeError:
Contributor Author

A RuntimeError is raised when the agent tries to stop the thread in order to exit gracefully. This seems to be an expected error, so it is ignored.

  File "bin/WALinuxAgent-9.9.9.9-py3.8.egg/azurelinuxagent/ga/collect_logs.py", line 123, in join
    self.event_thread.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
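
For context, `threading.Thread.join` raises this `RuntimeError` whenever a thread attempts to join itself. The sketch below reproduces the error and shows one possible guard that checks `threading.current_thread()` first. The guard is an alternative to the `try`/`except` used in the diff, not what this PR does, and the names `safe_join` and `demo` are hypothetical.

```python
import threading


def safe_join(thread, timeout=None):
    """Join `thread` unless it is the calling thread; return True if a join was attempted."""
    if thread is threading.current_thread():
        # Joining here would raise RuntimeError, so skip it.
        return False
    thread.join(timeout)
    return True


def demo():
    result = {}

    def worker():
        me = threading.current_thread()
        try:
            me.join()  # a thread joining itself raises RuntimeError
        except RuntimeError as e:
            result["error"] = str(e)
        result["safe"] = safe_join(me)  # the guard skips the self-join

    t = threading.Thread(target=worker)
    t.start()
    result["joined_from_main"] = safe_join(t, timeout=5)
    return result
```

Catching `RuntimeError`, as the diff does, is shorter; the explicit check avoids swallowing an unrelated `RuntimeError` raised for another reason.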

return LogCollectorMonitorHandler(cgroups)


class LogCollectorMonitorHandler(ThreadHandlerInterface):
Contributor Author

Monitor thread

max_usage = metric.value

current_max = max(current_usage, max_usage)
if current_max > LOGCOLLECTOR_MEMORY_LIMIT:
Contributor Author

This block checks the memory limit and triggers the exit when the limit is exceeded.
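
As a rough, self-contained illustration of the check in the excerpt above: the function name `check_memory_limit` and its return shape are hypothetical, and the value of `LOGCOLLECTOR_MEMORY_LIMIT` is assumed from the 31457280 bytes reported in the PR's log output. The message format mirrors that log line.

```python
# Assumed value: 30 MB, matching the 31457280 bytes in the log output.
LOGCOLLECTOR_MEMORY_LIMIT = 30 * 1024 * 1024


def check_memory_limit(current_usage, max_usage, limit=LOGCOLLECTOR_MEMORY_LIMIT):
    """Return (exceeded, message); message is None when usage is within the limit."""
    # Use the larger of the current reading and the reported maximum.
    current_max = max(current_usage, max_usage)
    if current_max > limit:
        message = ("Log collector memory limit {0} bytes exceeded. "
                   "The max reported usage is {1} bytes.".format(limit, current_max))
        return True, message
    return False, None
```

The comparison is strict, as in the excerpt, so a reading exactly equal to the limit does not trigger the exit.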

codecov bot commented Aug 1, 2022

Codecov Report

Merging #2637 (53bae61) into develop (aa9b0e1) will increase coverage by 0.11%.
The diff coverage is 81.74%.

@@             Coverage Diff             @@
##           develop    #2637      +/-   ##
===========================================
+ Coverage    71.91%   72.02%   +0.11%     
===========================================
  Files          103      103              
  Lines        15654    15745      +91     
  Branches      2494     2501       +7     
===========================================
+ Hits         11258    11341      +83     
- Misses        3880     3887       +7     
- Partials       516      517       +1     
Impacted Files                                  Coverage Δ
azurelinuxagent/ga/collect_logs.py              81.51% <75.90%> (+1.11%) ⬆️
azurelinuxagent/common/logcollector.py          88.30% <86.36%> (-0.30%) ⬇️
azurelinuxagent/agent.py                        58.63% <100.00%> (+1.19%) ⬆️
azurelinuxagent/common/cgroup.py                87.92% <100.00%> (+0.05%) ⬆️
azurelinuxagent/common/cgroupconfigurator.py    73.47% <100.00%> (ø)
azurelinuxagent/common/cgroupapi.py             83.79% <0.00%> (+2.79%) ⬆️

Member

narrieta left a comment

Overall LGTM, just a few minor comments/questions.

azurelinuxagent/agent.py
azurelinuxagent/ga/collect_logs.py
azurelinuxagent/ga/collect_logs.py (outdated)