-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit gProfiler memory & CPU usage and --log-usage support in exe mode. #564
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
help="Limit on the memory used by gProfiler." | ||
) | ||
|
||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I specific no CPU limit? i.e --limit-cpu none
should be a possible parameter, not necessarily this way, but you should be able to limit ONLY the memory or CPU.
gprofiler/main.py
Outdated
default=0.5, # 500m, same as in the k8s DaemonSet | ||
dest="cpu_limit", | ||
type=float, | ||
help="Limit on the cpu used by gProfiler." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
help="Limit on the cpu used by gProfiler." | |
help="Limit on the cpu used by gProfiler. Units are cores and the default is %(default)s" |
same for memory (with appropriate units)
gprofiler/main.py
Outdated
# Set limits and return path of the cgroup. | ||
def set_limits(cpu: float, memory: int) -> str: | ||
cgroups = {} | ||
logger.debug("Check if cgroup version is supported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log is not needed IMO. As I suggested in https://github.com/Granulate/gprofiler/pull/564/files#r1007993619, log only the negative case.
assert controller in cgroup_v1_hierarchies | ||
return f"{cgroup_v1_hierarchies[controller]}{cgroup}" | ||
else: | ||
return f"{find_v2_hierarchy()}/{controller}{cgroup}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work now - because cgroups v2 files are different. For example - there is no cpuacct.usage
file, but cpu.stat
file which contains a usage_usec
field.
I created a ticket for cgroups v2 support. Until that's done, I suggest you raise an exception here if v2 is in use.
Also I see that granulate-utils does this check: len(get_cgroups(os.getpid())) == 1
(in _verify_preconditions
). It's subtly different from what you did here (checks the controllers available for this processes instead of the mounted controllers). I think it'll produce the same results, but let's be consistent and use the same check here (you can export it to a function in granulate-utils - generally, if you need to make changes in granulate-utils, you can opne a PR there as well, and in the gProfiler repo you just update the revision of the submodule to point to your PR revision)
@Jongy To be honest I added "[WIP]" to the title to show you that I'm working on that. There will be plenty of force pushes there. Many things will look differently. I wasn't guessing that you'll review that already. ✌️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished reviewing.
Let's test:
- cgroup limitting when a container - gprofiler should not move / make changes.
- cgroup limitting when a privileged executable - gprofiler should move & make changes
- cgroup limitting when a non-privileged executable - gprofiler should try to move, fail, and continue running okay (this can be tested by running as root but with dropped capabilities.. happens in AWS Fargate, Databricks and a few other platforms)
I have a few more ideas, will add
Oh sorry :sweat_smiley: I saw that it's not a draft so I started. Nevermind, no problem |
By the way - we eventually use squash-and-merge so there is no need to force push, you can just add commits on top of it. But up to you |
@Jongy Ahhhh it is my mistake, nevertheless, the code will be better reviewed :) |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
The tests check if gProfiler behaved as predicted, but I prefer if we add true "end-to-end" tests, i.e - the test will actually check the cgroup limit imposed on the gProfiler process and see that it matches what we expected. You can get the gProfiler PID, get its cgroup, verify that it actually moved to a new cgroup named |
Description
TBA / WIP
Related Issue
#529 #227
Motivation and Context
TBA / WIP
How Has This Been Tested?
TBA / WIP
Screenshots
Checklist: