-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Linux memory allocations int64 not uint64 #876
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this PR fits into the current v1.0.0 freeze. I'd like to see it landed in 1.0, but I'm generally in favor of polishing vs. pushing 1.0 out quickly.
config-linux.md
Outdated
@@ -270,13 +270,16 @@ For more information, see the kernel cgroups documentation about [memory][cgroup | |||
**`memory`** (object, OPTIONAL) represents the cgroup subsystem `memory` and it's used to set limits on the container's memory usage. | |||
For more information, see the kernel cgroups documentation about [memory][cgroup-v1-memory]. | |||
|
|||
Values specify the limit in bytes, or `-1` for unlimited memory. For `swappiness` the values are from `0` (almost never swap) to `100` | |||
(very likely to be swapped). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your newline is in the middle of the second sentence, but should be after the first period to match our policy.
I'd rather have the swappiness
semantics documented in the swappiness
entry below (or dropped entirely if you feel the current “sysctl's vm.swappiness” punt is sufficient). Then you can use:
For all parameters except
swappiness
, values specify…
And in the entries themselves, drop the now-redundant “bytes”. For example:
limit
(int64, OPTIONAL) - limit memory usage
reservation
(int64, OPTIONAL) - soft limit of memory usage
…
config-linux.md
Outdated
* **`kernel`** *(uint64, OPTIONAL)* - sets hard limit for kernel memory | ||
* **`kernelTCP`** *(uint64, OPTIONAL)* - sets hard limit in bytes for kernel TCP buffer memory | ||
* **`limit`** *(int64, OPTIONAL)* - sets limit of memory usage in bytes | ||
* **`reservation`** *(int64, OPTIONAL)* - sets soft limit of memory usage in bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Soft limits have some interesting caveats, so I'd be happier with a direct reference to memory.soft_limit_in_bytes
. But that's orthogonal to this PR.
6990fda
to
97c466a
Compare
@wking adjusted the wording and line breaks as suggested, and fixed the example which had |
On Fri, Jun 23, 2017 at 03:20:57PM -0700, Justin Cormack wrote:
… and fixed the example which had `0` limits rather than unlimited
or unspecified.
You'll also want to make that fix in [1,2].
[1]: https://github.com/opencontainers/runtime-spec/blame/c1a98486c417a03b56277a63a58d3456459e128c/config.md#L692-L693
[2]: https://github.com/opencontainers/runtime-spec/blob/c1a98486c417a03b56277a63a58d3456459e128c/schema/test/config/good/spec-example.json#L242-L243
|
97c466a
to
5de3024
Compare
@wking amended |
config-linux.md
Outdated
* **`kernel`** *(int64, OPTIONAL)* - sets hard limit for kernel memory | ||
* **`kernelTCP`** *(int64, OPTIONAL)* - sets hard limit for kernel TCP buffer memory | ||
|
||
For `swappiness` the values are from `0` (almost never swap) to `100` (very likely to be swapped). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 0 really “almost never”? From the cgroup docs:
Please note that unlike during the global reclaim, limit reclaim enforces that 0 swappiness really prevents from any swapping even if there is a swap storage available.
which seems stronger than the the sysctl docs:
A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.
And neither one of those mentions a max at 100. I don't doubt that 100 is the current max, but I'd rather punt to kernel docs for the swappiness
semantics instead of inlining them with this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded to paraphrase the comment in the code https://github.com/torvalds/linux/blob/master/mm/vmscan.c#L149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a user facing ABI, so the limit of 100 will not change without adding a new parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded to paraphrase the comment in the code…
But is exposing kernel comments in the spec a good idea? What if they change the semantics, MUST runtimes translate from the 0–100 range to whatever the local kernel uses?
What's important about this value is that it get written to memory.swappiness
. I'd rather leave the config author and host kernel to sort out the memory.swappiness
semantics between themselves, and have the runtime (and this spec) just blindly pass the opaque value through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you believe the important thing is what is written to the memory.swappiness
we should not be discussing this PR at all, we should change all these values to strings, which is what is actually written to the files.
Percentages, like bytes, are pretty common in human facing interfaces, just like some of the other values are 0
or 1
which we describe as booleans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you believe the important thing is what is written to the
memory.swappiness
we should not be discussing this PR at all, we should change all these values to strings, which is what is actually written to the files.
Yeah, and a cgroup config like:
"<controller>": {
"<filename-suffix>": "<string to write>"
}
For example:
"memory": {
"swappiness": "60",
"soft_limit_in_bytes": "-1",
}
would be as expressive as what we have now and a lot easier to document. And it extends to other controllers and properties without needing any spec changes.
The counter-argument is that that approach degenerates into an array-of-syscalls config (e.g. if you set capabilities as integers instead of CAP_*
strings. Drawing the line somewhere is useful to keep configs human-readable, but "swappiness": 0
and "swappiness": "0"
are equally legible to me.
But overhauling the cgroup-config API is a big change with a 1.0 freeze in place, so in the short term I'd rather just avoid baking more kernel-defined semantics into the spec in situations where the runtime implementation can ignore those semantics.
The kernel ABI to these values is a string, which accepts the value `-1` to mean "unlimited" or an integer up to 2^63 for an amount of memory in bytes. While the internal representation in the kernel is unsigned, this is not exposed in any ABI directly. Because of the user-kernel memory split, values over 2^63 are not really useful; indeed that much memory is not supported, as physical memory is limited to 52 bits in the forthcoming switch to five level page tables. So it is much more natural to support the value `-1` for unlimited, especially as the actual number needed to represent the maximum has varied in different kernel versions, and across 32 and 64 bit architectures, so determining the value to use is not possible, so it is necessary to write the string `-1` to the cgroup files. See also discussion in - opencontainers/runc#1494 - opencontainers/runc#1492 - opencontainers/runc#1375 - opencontainers/runc#1421 Signed-off-by: Justin Cormack <justin.cormack@docker.com>
5de3024
to
e73cd70
Compare
1 similar comment
replace opencontainers#1492 opencontainers#1494 fix opencontainers#1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
See my comments in opencontainers/runc#1492 (comment) , the only benefit we can gain from "uint64 -> int64" is that it makes runc code easier, all other reasons are not making But since there's no real world issue for using LGTM |
replace opencontainers#1492 opencontainers#1494 fix opencontainers#1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
replace opencontainers#1492 opencontainers#1494 fix opencontainers#1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com> (cherry picked from commit 3d9074e) Signed-off-by: Tibor Vass <tibor@docker.com>
replace opencontainers#1492 opencontainers#1494 fix opencontainers#1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com> (cherry picked from commit 3d9074e) Signed-off-by: Tibor Vass <tibor@docker.com>
Catch the JSON Schema up with e73cd70 (Make Linux memory allocations int64 not uint64, 2017-06-23, opencontainers#876), which carried the kernel's 0 to 100 range into the spec. Signed-off-by: W. Trevor King <wking@tremily.us>
replace opencontainers#1492 opencontainers#1494 fix opencontainers#1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
replace #1492 #1494 fix #1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
replace #1492 #1494 fix #1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
replace #1492 #1494 fix #1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
replace #1492 #1494 fix #1422 Since opencontainers/runtime-spec#876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>
The kernel ABI to these values is a string, which accepts the value
-1
to mean "unlimited" or an integer up to 2^63 for an amount of memory in
bytes.
While the internal representation in the kernel is unsigned, this is not
exposed in any ABI directly. Because of the user-kernel memory split, values
over 2^63 are not really useful; indeed that much memory is not supported,
as physical memory is limited to 52 bits in the forthcoming switch to five
level page tables. So it is much more natural to support the value
-1
forunlimited, especially as the actual number needed to represent the maximum
has varied in different kernel versions, and across 32 and 64 bit architectures,
so determining the value to use is not possible, so it is necessary to write
the string
-1
to the cgroup files.See also discussion in
Signed-off-by: Justin Cormack justin.cormack@docker.com