-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve logic of unexpected processes check started by agent #2522
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2522 +/- ##
===========================================
- Coverage 71.93% 71.92% -0.02%
===========================================
Files 102 102
Lines 15166 15188 +22
Branches 2400 2403 +3
===========================================
+ Hits 10910 10924 +14
- Misses 3775 3778 +3
- Partials 481 486 +5
Continue to review full report at Codecov.
|
@@ -605,7 +605,8 @@ def _check_processes_in_agent_cgroup(self): | |||
current = process | |||
while current != 0 and current not in agent_commands: | |||
current = self._get_parent(current) | |||
if current == 0: | |||
# Process started by agent will have a marker and check if that marker found in process environment. | |||
if current == 0 and not self.__is_process_env_flag_found(process): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added as alternative condition If all the above checks unsuccessful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably due the env variable check at the beginning and skip the rest of the logic if that check succeeds.
The current logic is too complicated and it would be good if we can drop it. You can add some sample telemetry to see if there are any cases where the env check is False, but the current logic is True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we have special cases like demon, systemd-run checks and we need those even now. I think only place we can improve when we check parent process. We can get rid of this and add env variable check there.
current = process
while current != 0 and current not in agent_commands:
current = self._get_parent(current)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further look I realized we need this parent check since few places like logcollector we do popen directly not going through defined code. In those scenarios, this parent pid check would catch it. So, I would say either we need all checks or update direct popen calls to shellutil. Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not quite what i meant, but we can sync offline... for now you can leave it as is
|
||
# Set the marker before process start | ||
env[PARENT_PROCESS_NAME] = AZURE_GUEST_AGENT | ||
kwargs['env'] = env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If caller sends env
arg, use the same flags otherwise extend os.enviorn
along with new marker flag
@@ -605,7 +605,8 @@ def _check_processes_in_agent_cgroup(self): | |||
current = process | |||
while current != 0 and current not in agent_commands: | |||
current = self._get_parent(current) | |||
if current == 0: | |||
# Process started by agent will have a marker and check if that marker found in process environment. | |||
if current == 0 and not self.__is_process_env_flag_found(process): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably due the env variable check at the beginning and skip the rest of the logic if that check succeeds.
The current logic is too complicated and it would be good if we can drop it. You can add some sample telemetry to see if there are any cases where the env check is False, but the current logic is True.
@@ -640,6 +641,24 @@ def __format_process(pid): | |||
pass | |||
return "[PID: {0}] UNKNOWN".format(pid) | |||
|
|||
@staticmethod | |||
def __is_process_env_flag_found(pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some comments about how that variable is used, and probably rename the function to something that indicates its purpose (check that the process is a descendant of the agent) rather than its implementation (check for the variable).
environ = env_file.read() | ||
if environ and environ[-1] == '\x00': | ||
environ = environ[:-1] | ||
match = re.search(shellutil.PARENT_PROCESS_NAME + "=" + shellutil.AZURE_GUEST_AGENT, environ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we do not need a regex for this check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how sample output looks. I feel regex would simplify the parsing.
[root@edpm26zyjw azureuser]# cat /proc/1156/environ
OLDPWD=/etc/sysconfig/network-scripts_AZURE_GUEST_AGENT_DAEMON_VERSION_=9.9.9.9PATH=/sbin:/usr/sbin:/bin:/usr/binPWD=/etc/sysconfig/network-scriptsLANG=en_US.UTF-8PARENT_PROCESS_NAME=AZURE_GUEST_AGENTSHLVL=2_=/sbin/dhclient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use "{0}={1}".format(shellutil.PARENT_PROCESS_NAME, shellutil.AZURE_GUEST_AGENT) in environ
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinclark19a yes, that is what i was trying to point out, sorry for not being clear... I did not see any pattern in the re.search call, so this is basically a "contains".
Now @nagworld9 if you want to be more strict than simple "contains", you should match whatever separators the file uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see your point and fixed it, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor comment about the use of re.search, otherwise LGTM (please look at the comment)
Description
The current logic of finding parent and if parent is agent, then we don't consider it as unexpected process and don't disable cgroups. That is broken if the process gets orphaned on certain scenarios after it's started by agent.
Design:
we should explicitly add an env variable to the processes we create and change the cgroupconfigurator to look for that var.
PR information
Quality of Code and Contribution Guidelines