Add initial support for rsvd accounting hugetlb cgroup #2360
The previous non-rsvd max/limit_in_bytes does not account for reserved huge page memory. This makes it possible for a process to reserve all the huge page memory without being able to allocate it (due to cgroup restrictions).

In practice, this makes it possible to successfully mmap more huge page memory than the cgroup settings allow, but once the process uses the memory it gets a SIGBUS and crashes. This is bad for applications that mmap at startup: the mmap succeeds, but the program crashes when it starts using the memory. For example, postgres does this by default.

This also keeps writing to the old max/limit_in_bytes, so that applications reading those files do not see a wrong value.

More info can be found here: https://lkml.org/lkml/2020/2/3/1153

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Updated from d8fe1b1 to 5c84b1a.
```go
func (s *HugetlbGroup) Set(path string, cgroup *configs.Cgroup) error {
	supportsReservationAccounting := s.HasReservationAccountingSupport(path)
```
Not sure if this is the best way to check, or should we try to "cache" the value like we do with `HugePageSizes`?
```diff
 for _, pagesize := range hugePageSizes {
-	usage := strings.Join([]string{"hugetlb", pagesize, "current"}, ".")
+	filenamePrefix := strings.Join([]string{"hugetlb", pagesize}, ".")
```
nit: maybe it would be better to write it as `filenamePrefix := "hugetlb." + pagesize` (for readability).
```go
	filenamePrefix += ".rsvd"
}

usage := fmt.Sprintf("%s.current", filenamePrefix)
value, err := fscommon.GetCgroupParamUint(dirPath, usage)
if err != nil {
	return errors.Wrapf(err, "failed to parse hugetlb.%s.current file", pagesize)
```
The error message from `GetCgroupParamUint` already contains the file name, so you can return the error as-is; there is no need to wrap it.
Also, the error now reports the wrong file name when `supportsReservationAccounting` is set.
Fixing this should be done as a separate first patch I think.
```diff
 value, err := fscommon.GetCgroupParamUint(dirPath, usage)
 if err != nil {
 	return errors.Wrapf(err, "failed to parse hugetlb.%s.current file", pagesize)
 }
 hugetlbStats.Usage = value
 
-fileName := strings.Join([]string{"hugetlb", pagesize, "events"}, ".")
+fileName := fmt.Sprintf("%s.events", filenamePrefix)
```
nit: using `fileName := filenamePrefix + ".events"` would be faster, but either way is fine.
```go
// is supported. This is supported from linux 5.7
// https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html
func HasReservationAccountingSupport(dirPath string) bool {
	hugePageSizes, err := cgroups.GetHugePageSize()
```
Yes, I think it makes sense to do this check once, using `sync.Once`.
or not... since different cgroups can have different controls I guess...
```diff
@@ -65,6 +70,58 @@ func TestHugetlbSetHugetlb(t *testing.T) {
 	}
 }
 
+func TestHugetlbSetHugetlbWithReservedAccounting(t *testing.T) {
+	helper := NewCgroupTestUtil("hugetlb", t)
```
Shouldn't this test be skipped if `!HasReservationAccountingSupport()`?
```go
if len(HugePageSizes) == 0 {
	return false
}
_, err := fscommon.ReadFile(path, strings.Join([]string{"hugetlb", HugePageSizes[0], "rsvd", "limit_in_bytes"}, "."))
```
Use `cgroups.PathExists` here.
```go
if err != nil || len(hugePageSizes) == 0 {
	return false
}
_, err = fscommon.ReadFile(dirPath, strings.Join([]string{"hugetlb", hugePageSizes[0], "rsvd", "max"}, "."))
```
Use `cgroups.PathExists()`.
I'm afraid yes. Reservation and use are two different properties, and we should not mix them together.
So, @odinuge, I think this should start with a PR to https://github.com/opencontainers/runtime-spec. Once merged, we can open a PR here (and most of the comments that I left reviewing this are still valid).
I'm working on reviving this PR now, once the spec is merged.
Do we have to edit the runtime-spec in order to do this?
Also, this will fix patroni/patroni#1393 (ref. the postgres part at the top ^)