No SWAP space available for containers on 4.7 #434

Closed
webdock-io opened this issue Oct 26, 2020 · 15 comments · Fixed by #435
@webdock-io

Focal
LXD 4.7

I'm reporting this despite seeing some recent activity on it in #418 and at https://discuss.linuxcontainers.org/t/invalid-swaptotal-in-proc-meminfo-swaptotal-0/8231/16, because I haven't seen any confirmation that it is still a problem.

In any case, I set up a vanilla Focal system as usual, with nothing unusual going on, and the available swap space is reported as 0K inside the container.

# grep SwapTotal /proc/meminfo
SwapTotal:      73692156 kB
# lxc launch ubuntu:focal testswap
Creating testswap
Starting testswap
# lxc exec testswap bash
root@testswap:~# # grep SwapTotal /proc/meminfo
root@testswap:~# uname -r
5.4.0-52-generic

Note, I am not trying to limit swap for the container. Nothing has been added to the default profile.

The system has not been put into production yet - at least until we figure this out - so I can provide access if need be.

@stgraber
Member

cat /proc/cmdline

@stgraber
Member

Yeah, I'm also seeing 0K here in an unrestricted container with swap accounting enabled on the host.

@brauner looks like there's more swap weirdness to look at...

@webdock-io
Author

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-52-generic root=UUID=aee554fd-3017-46d1-a8d5-9fb95d482489 ro init_on_alloc=0 maybe-ubiquity

This is an unrestricted container over here as well. Containers with memory restrictions like:

  limits.memory: 2GB
  limits.memory.enforce: hard

also show 0K. (I thought I'd check that, as I saw a comment suggesting that memory limits might have some effect.)

@stgraber
Member

Your kernel is missing swap accounting, so even with a working lxcfs, you wouldn't see anything.

Do you have /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes on your host? If not, then swap accounting is indeed not enabled and you'll need to boot with swapaccount=1.
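The two host-side checks mentioned here can be scripted. This is a minimal sketch, assuming a cgroup v1 host with the memory controller mounted at /sys/fs/cgroup/memory (the path quoted above). The function names are hypothetical. The command-line check is only a hint, since some kernels enable swap accounting without the boot option; the presence of the memsw file is the more direct signal.

```python
import os

# Assumption: cgroup v1 layout with the memory controller mounted here.
MEMSW_FILE = "/sys/fs/cgroup/memory/memory.memsw.usage_in_bytes"


def swap_accounting_enabled(memsw_path: str = MEMSW_FILE) -> bool:
    """Swap accounting is active if the kernel exposes the memsw counters."""
    return os.path.exists(memsw_path)


def cmdline_requests_swapaccount(cmdline: str) -> bool:
    """Return True if 'swapaccount=1' is one of the kernel boot parameters.

    `cmdline` is the contents of /proc/cmdline (space-separated parameters).
    """
    return "swapaccount=1" in cmdline.split()
```

On a host, pass the text of /proc/cmdline to `cmdline_requests_swapaccount()` and call `swap_accounting_enabled()` with no arguments.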

@webdock-io
Author

webdock-io commented Oct 26, 2020

That's odd - is this something that has changed since 4.2? I have a lot of systems on 4.2 where swap works without this command-line argument. I assumed it was only needed if you wanted to limit swap for containers.

Anyway, there was no /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes on my host, and after adding swapaccount=1 and a reboot I now get:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-52-generic root=UUID=aee554fd-3017-46d1-a8d5-9fb95d482489 ro init_on_alloc=0 cgroup_enable=memory swapaccount=1 maybe-ubiquity
# cat /sys/fs/cgroup/memory/memory.memsw.usage_in_bytes
7598370816
# grep SwapTotal /proc/meminfo
SwapTotal:      73692156 kB
# lxc exec testswap -- grep SwapTotal /proc/meminfo
SwapTotal:             0 kB

Still no swap goodness for the container :(
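The comparison above comes down to reading the SwapTotal line from /proc/meminfo on the host and inside the container (where LXCFS provides the file). A small parser, as a sketch; the function name is hypothetical and it assumes the usual `Name:   value kB` line format shown in the transcript.

```python
def parse_meminfo_kb(text: str, field: str = "SwapTotal") -> int:
    """Return the value of a /proc/meminfo field, in kB.

    Lines look like 'SwapTotal:      73692156 kB'.
    Raises KeyError if the field is not present.
    """
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == field:
            # First token after the colon is the number; the unit (kB) follows.
            return int(rest.split()[0])
    raise KeyError(field)
```

Running it on the host's /proc/meminfo and on the container's copy (for example via `lxc exec testswap -- cat /proc/meminfo`) gives the two numbers being compared.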

@stgraber
Member

Okay, so still an issue for @brauner to look into (again...).
swapaccount=1 is required for both limiting and accounting of swap space.

Anyway, in cases where no memory limits are in place, we certainly should be showing the entirety of the swap.
For cases where memory limits are in place and swap accounting is disabled, then I guess showing no swap is probably close enough to reality.

Then, for cases where swap accounting is enabled and a limit has been set, we can reliably compute the amount of memory and swap being used (memory.usage_in_bytes for memory, memory.memsw.usage_in_bytes - memory.usage_in_bytes for swap). However, we can't exactly control how much of each you can use: we can set how much RAM you're allowed to use and how much RAM+SWAP you're allowed to use, but not how much SWAP alone you're allowed to use, which makes rendering the swap disk size quite tricky.
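The usage half of this is simple arithmetic on two cgroup v1 counters: memory.usage_in_bytes counts RAM, and memory.memsw.usage_in_bytes counts RAM plus swap, so swap in use is memsw minus memory (in that order). A minimal sketch, assuming cgroup v1 and that `cgroup_dir` points at the container's memory cgroup directory (its exact path depends on the host and is not given in this thread); all function names are hypothetical. It only computes usage; as noted above, there is no swap-only limit to derive a "size" from.

```python
import os
from typing import Tuple


def read_counter(cgroup_dir: str, name: str) -> int:
    """Read one integer counter file from a cgroup v1 memory directory."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return int(f.read().strip())


def memory_and_swap_usage(mem_usage: int, memsw_usage: int) -> Tuple[int, int]:
    """Split cgroup v1 counters into (RAM bytes, swap bytes).

    memsw counts RAM + swap, so swap is the difference. The two files are
    read separately, so clamp at 0 in case RAM grew between the reads.
    """
    swap = max(memsw_usage - mem_usage, 0)
    return mem_usage, swap


def cgroup_usage(cgroup_dir: str) -> Tuple[int, int]:
    """Hypothetical helper: (RAM, swap) usage for one cgroup directory."""
    return memory_and_swap_usage(
        read_counter(cgroup_dir, "memory.usage_in_bytes"),
        read_counter(cgroup_dir, "memory.memsw.usage_in_bytes"),
    )
```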

@webdock-io
Author

@stgraber While we wait for @brauner to give this a look - I'd like you to clarify something for me:

Today I thought, maybe naively, that if I switched back to LXD 4.0/stable I could get back to a version where SWAP worked as it did before, where containers simply had all of the system's SWAP available to them.

However, even after rolling back and setting everything up anew, I'm still getting no swap. Does this mean that, despite rolling back LXD, LXCFS is still a newer version than it used to be?

I am a bit confused that you say swapaccount=1 is required, when it clearly was not in the past: I have about 16 systems running v4.2 (an early build of 4.2) which do not have swapaccount enabled, and they all show swap available.

I just want to be clear on what has happened here - it's totally cool if things have changed and some regression has happened - it seems clear to me that @brauner has been battling a multitude of issues related to this.

In the past I really wanted SWAP, and shared memory in general, to be limited, and I made a few posts to that effect:

https://github.com/lxc/lxd/issues/6168 and https://github.com/lxc/lxd/issues/7279

So I am all for it being implemented in various ways (I realize the tmpfs issue is not directly related to this and needs some other magic/shenanigans)

But I have made my peace with this and we are working around it in other ways, so really, I'd be totally fine with things working "as they did before". Right now I'm stuck with a system I am hesitant to put into production, because clients would have 0K swap available, and if a fix arrives later it would probably require an LXD upgrade and a subsequent restart, which means downtime for clients.

So yeah ... I guess what I am asking here is twofold:

  1. Why the change in the first place?
  2. Why can I not / Any way I can roll back to the previous status-quo in a sane manner?
  • Or should I just chill out, be patient, and wait for a fix? :O)

Thank you for your time and attention as always.

@stgraber
Member

stgraber commented Nov 3, 2020

LXCFS is now the same version in 4.0 and latest.
This is a bug so we really just need @brauner to sort it out and then we'll push it everywhere.

@webdock-io
Author

@stgraber OK I see - thank you for clearing that up.

I will wait a while and see what happens with this, in that case :O)

@brauner
Member

brauner commented Nov 3, 2020

(Quoting @stgraber's comment above.)

Every time this is brought up, people come around with "we just have to do x to calculate swap usage reliably", to which I always give the same reply: "no, you can't". We've tried for years and it can't be done in a way that is fully correct, especially not for all use cases. If we change this again, someone else will come along and complain about swap values being wrong when they're not shown as 0. I'll see what I can do, but I expect to be back here with another issue in a few months.

brauner pushed a commit to brauner/lxcfs that referenced this issue Nov 3, 2020
Closes: lxc#434
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
stgraber pushed a commit that referenced this issue Nov 3, 2020
Closes: #434
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
@webdock-io
Author

webdock-io commented Nov 4, 2020

@brauner @stgraber I appreciate the effort here, and I understand this must be annoying, especially since I've been able to dig up posts from people complaining about various aspects of this for 2+ years.

However, I'd like to urge you to consider being a bit more pragmatic with the latest changes here, all things considered.

I have done a number of tests and read through the proc_fuse.c commits, and although the optimizations and (honestly very nice) changes that have been made are greatly appreciated, the fact remains that you are currently leaving both us legacy greybeard users and new users of LXD with some confusion and a basic problem.

Please consider the following:

  • A new user/adopter of LXD fires it up on a minty-fresh Ubuntu system only to find that there is 0K swap. She checks the docs at https://github.com/lxc/lxd/blob/master/doc/instances.md, sees that limits.memory.swap has a default of true, and starts scratching her head. She starts Googling only to find that she needs to set swapaccount=1 and reboot. It is also unclear whether she needs to add cgroup_enable=memory to the kernel command line, as there are differing opinions on that out there as well.

After the reboot, with the latest change from @brauner, she will now see the entire host swap allowance available to the container. Awesome. If she wants to disable swap entirely, she now just needs to set limits.memory.swap=false. If she wants to limit the container to some specific swap allowance now that she has that sweet, sweet swapaccount=1 enabled, she needs to Google further, and if she is lucky she will stumble on this post and find that she needs to do something like:

lxc config set testswap raw.lxc="lxc.cgroup.memory.memsw.limit_in_bytes = 4G"

Here the value is the memory allowance plus the swap allowance.
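Because cgroup v1 only offers a RAM limit and a RAM+swap limit, a swap allowance has to be expressed as the sum of the two. A minimal sketch that builds the raw.lxc line from a memory allowance and a swap allowance, assuming binary units (K/M/G = powers of 1024); the helper names are hypothetical. It writes a plain byte count rather than the "4G" form used above, and it refuses a negative swap allowance because the kernel will not accept a RAM+swap limit smaller than the RAM limit.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> int:
    """Parse sizes like '2G', '512M', or a plain byte count (binary units)."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def memsw_limit_line(memory: str, swap: str) -> str:
    """Return a raw.lxc line capping RAM+swap at memory + swap allowance."""
    mem_bytes, swap_bytes = to_bytes(memory), to_bytes(swap)
    if swap_bytes < 0:
        raise ValueError("swap allowance cannot be negative")
    total = mem_bytes + swap_bytes
    return f"lxc.cgroup.memory.memsw.limit_in_bytes = {total}"
```

For a 2G memory limit plus 2G of swap this produces the same 4G total as the command above, expressed in bytes.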

At a bare minimum, the docs need to be improved to mention these use cases so new users know what's going on and how to deal with swap.

  • The other case is a legacy user, like myself, who has been happily using LXD since way back when swap limiting didn't really work in any conceivable way. There was never any incentive to set swapaccount=1, as containers just "inherited" the host swap space by default. This legacy user now has a large number of systems in production where downtime is a sensitive issue. For some older host machines a reboot may result in 20 minutes of downtime, while a snap refresh lxd would typically only take a minute or two.

This user would really love to use the new swap limiting features, but will probably only do so on new systems where they can set this up beforehand, as they are totally aware that this requires a reboot.

The problem now becomes that this user is locked-in on old LXD versions until some "excuse" comes up in the future where they can afford to do a full reboot of the current production systems.

You are now requiring us to do a system reboot in order to get back to a long-standing status quo for a fundamental system resource.

I hope this resonates with you. If you have any love at all for us "using LXD in real-life production environments" users, I would urge you to reconsider the behavior here.

If you again report swap space = host swap space when swapaccount=0, you solve two problems:

  1. We legacy users are less boned and can upgrade our systems without resorting to a full system reboot (if we want any swap available for containers, that is)

  2. New users will not hit the 0K swap issue immediately (although, that can be remedied somewhat by expanding on the docs, as mentioned above)

Edit: This may not be the solution you want - so an alternative could maybe be to add a (somewhat hidden?) config parameter that allows for swap space in containers (equal to host swap) for us legacy users? Anyway, just a thought. Something like that would help us out in any case :)

It may not show, but I really tried to keep this post short and sweet - haha.

@stgraber
Member

stgraber commented Nov 5, 2020

@webdock-io https://github.com/lxc/lxcfs/pull/436/files is the policy I'm proposing for SWAP in LXCFS. With that clearly explained we'll add tests in our CI and will stop flip/flopping the way we handle things every time something breaks somewhere.

Swap accounting and limits just plain suck, that's unfortunately how things are on Linux and every time we try to fix things for someone we break them for someone else, so hopefully this time we can have a clear agreement on how LXCFS behaves and why, have tests to confirm this behavior daily and consider it set in stone.

@webdock-io
Author

webdock-io commented Nov 5, 2020

OK I get that you want to make things uniform @stgraber - and I also realize that what lxcfs is doing is trying to show available swap space and swap usage and not actually limiting anything. I was under the mistaken impression that 0K swap shown in a container meant they had 0K swap available and were subject to OOM.

With that said, I think you are glossing over my concerns a little bit. Now that I know this is not a hard cap we are dealing with here, this is much less critical than I thought, but there is still a usability and documentation problem here.

Namely, the limits.memory.swap=false config parameter, as described in the docs, is incorrect, as it doesn't actually do what the description says it does.

There is also the fact that https://github.com/lxc/lxd/blob/master/doc/instances.md does not document the swapaccount=1 requirement, nor give clear explanations or guidelines on how to use raw.lxc.

I think it would be appropriate to document these things, now that you have settled on a model, so that especially new/novice users know what is going on (and save yourself from more swap related posts on the forum moving forward)

  • With that said, I still feel that we have a bit of a problem on our existing systems. If we upgrade LXD on these, containers will show 0K swap (with no way for us to have them show anything else, short of a full system reboot) and we will without a doubt receive support requests from clients asking "where did my swap go?!"

In your clarification you write:

When swapaccount isn't enabled, no SWAP space is reported at all. (...) Showing the host value would be completely wrong, showing a 0 value would be equally wrong.

I feel you are conflating swap utilization with swap availability in this section. I agree it is technically wrong to show the host consumption, and showing 0K consumption is also wrong. However, it is also wrong - to a more serious extent in my mind - to show 0K swap availability to the user - as that kicks up confusion.

I think a sane and reasonable approach would be to show the host consumption as well as availability in that case, even though it is technically wrong as seen at the container level. Why? Because that more accurately reflects availability at the very least, and prevents users from freaking out about OOM and asking themselves "wait what, why?"

That's just my two cents, and something I hope you will reconsider. I know there is no perfect solution here, but I think you are potentially setting yourself up for more noise from the userbase moving forward with the current formulation - especially if you do not provide clear explanations and guidelines in the docs, once this is all set in stone.

@stgraber
Member

stgraber commented Nov 5, 2020

Having to potentially pass swapaccount=1 entirely depends on the distribution and kernel version, so LXD's current behavior of logging a warning if the cgroup controller is missing, as it does for all the others, feels appropriate. It's up to the user to look up their distribution's instructions on enabling it. For some it will be a kernel rebuild, for others an alternative existing kernel; in some cases it's always enabled, and on some you need the boot-time option. I'll update the proposed LXCFS readme to better reflect those various options and not rely on swapaccount so much.

limits.memory.swap should indeed get a documentation update saying that it discourages swap usage for the instance, without stating that it prevents it.

For the swap behavior when swapaccount isn't enabled, all options suck for different reasons.
Showing the host value would certainly be correct as far as availability but would completely throw off anything that cares about usage and it wouldn't be obvious to the user that something is missing in their configuration.

Not reporting any swap will lead the user to look for something wrong, hopefully see the LXD warning or a similar message from their runtime of choice, and eventually get to the LXCFS readme so they can move to a system configuration where memory limits are meaningful.
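The reporting policy argued for in this thread can be summarized as a small decision function: no swap accounting means no swap is reported; accounting without a RAM+swap limit means the host's swap is shown. The limited case has no exact answer, as stgraber notes earlier. This sketch approximates it as the RAM+swap limit minus the RAM limit, capped at the host's swap. That third branch is an assumption for illustration, not necessarily what LXCFS implements, and the function name is hypothetical.

```python
from typing import Optional


def reported_swap_total(host_swap: int, swap_accounting: bool,
                        mem_limit: Optional[int],
                        memsw_limit: Optional[int]) -> int:
    """Swap total (bytes) a container would see under this sketched policy."""
    if not swap_accounting:
        # No accounting: report no swap so the missing configuration is visible.
        return 0
    if memsw_limit is None:
        # No RAM+swap limit: the container may use all of the host's swap.
        return host_swap
    # Assumption: estimate the swap "size" as RAM+swap limit minus RAM limit,
    # capped at host swap. The kernel enforces no swap-only limit, so this is
    # an approximation, not an enforced value.
    ram = mem_limit if mem_limit is not None else 0
    return max(0, min(host_swap, memsw_limit - ram))
```

The trade-off discussed above shows up in the first branch: returning `host_swap` there instead of 0 would be accurate for availability but would hide the misconfiguration.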

@webdock-io
Author

Ok, I will buy that argument @stgraber - and I am pleased you take other distros into account. I assumed that since we are all drinking the Canonical kool-aid, we would be favoring Ubuntu and its defaults here :)

I agree that when something is wrong - if you notice it - you will start to look for resolutions. However, that should not negate having a mention of the potential problem in the docs up-front so new users won't have to waste too much time looking for answers (and maybe finding the wrong ones - Googling stuff is after all, down to a bit of skill in some respects) - I hope you keep this in mind for further updates.

Anyway, I'll leave it here. I think you know why I voiced my concerns here (as I am directly affected) and that you took it all in the good spirits and recognize the solely good intentions which were the basis of my posts here :)

3 participants