cannot allocate memory #489

klausenbusk · 2015-09-23T14:46:36Z

I get this error on CoreOS alpha (808.0.0), though I have "plenty" of memory available (300 MB+).

Like this:

Error response from daemon: open /var/lib/docker/overlay/4147d85b3462f5e8db7d445913e63e680ce113c7361f0b3bcd95b3bdbf3d3e11-init/merged/etc/resolv.conf: cannot allocate memory

/proc/meminfo

MemTotal:         505456 kB
MemFree:            5952 kB
MemAvailable:     372596 kB
Buffers:           97248 kB
Cached:           209948 kB
SwapCached:            0 kB
Active:           265772 kB
Inactive:         134192 kB
Active(anon):      92988 kB
Inactive(anon):      144 kB
Active(file):     172784 kB
Inactive(file):   134048 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:         92776 kB
Mapped:            65592 kB
Shmem:               356 kB
Slab:              86292 kB
SReclaimable:      70132 kB
SUnreclaim:        16160 kB
KernelStack:        1760 kB
PageTables:         2480 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      252728 kB
Committed_AS:     556088 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        3220 kB
VmallocChunk:   34359732520 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       94200 kB
DirectMap2M:      430080 kB
DirectMap1G:           0 kB

I haven't found a way to reproduce it. I got it yesterday on one of my web nodes, and today on the second web node.
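
A note on reading the dump above: MemFree is under 6 MB while MemAvailable is over 370 MB, because most memory sits in buffers, page cache, and reclaimable slab. The following is a minimal Python sketch, not part of the original report, that parses /proc/meminfo-style text and adds up those fields. The helper names parse_meminfo and reclaimable_kb are made up for illustration, and the sum is only a rough upper bound.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}. Values are in kB,
    except for the HugePages_* counters, which are plain counts."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def reclaimable_kb(info):
    """Rough upper bound on what the kernel could reclaim.

    Assumption: Buffers, Cached and SReclaimable are all droppable. In
    practice Cached also includes shared memory (Shmem), which is not.
    """
    return info["Buffers"] + info["Cached"] + info["SReclaimable"]
```

To try it on a live system, pass the contents of /proc/meminfo to parse_meminfo. For the dump above, reclaimable_kb gives 97248 + 209948 + 70132 = 377328 kB, close to the reported MemAvailable of 372596 kB.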

klausenbusk · 2015-09-23T14:46:55Z

I found a user with similar problems https://groups.google.com/forum/#!topic/coreos-user/t1RI2K1BdV0

klausenbusk · 2015-09-23T15:11:38Z

Found a way to reproduce it (tested on a Digital Ocean 512 MB node):

1. sudo cat /dev/vda > /dev/null (fill up memory cache/buffer)
2. docker run --rm -t -i debian:jessie ls
   Error response from daemon: open /var/lib/docker/overlay/3aa2541f3c93a58a9dfbfbca6511a18d638c772608f255f38c4c3dc86512c2bb-init/merged/dev/console: cannot allocate memory
3. Stop the cat process.
4. Try 2 again; it still errors.
5. sysctl vm.drop_caches=3 (clear cache/buffer)
6. Try 2 again; it works.

Edit: It seems like Docker isn't "freeing" the buffer/cache memory.

klausenbusk · 2015-09-23T16:04:28Z

Uploaded meminfo here: http://sprunge.us/GCNa
This is from a fresh 512 MB CoreOS alpha node, after the above steps.

klausenbusk · 2015-09-23T16:43:58Z

Cannot reproduce on CoreOS Stable 766.3.0

bfallik · 2015-09-23T18:44:38Z

This looks very similar to the issue I reported on coreos-user (linked above).

leonfs · 2015-09-28T21:10:31Z

I've been experiencing the same issue today on alpha (815.0.0) when running random commands (apt-get install dnsutils) within a container.

Do you have Swap enabled on the droplet? I had similar problems on Ubuntu and by enabling swap all problems went away. But in CoreOS it doesn't seem to work. Swap has more than 900M available and still throws "cannot allocate memory" error.

klausenbusk · 2015-09-28T21:21:34Z

> Do you have Swap enabled on the droplet? I had similar problems on Ubuntu and by enabling swap all problems went away. But in CoreOS it doesn't seem to work. Swap has more than 900M available and still throws "cannot allocate memory" error.

Nope, I think swap is just a workaround. It seems like Docker doesn't free the memory used by the cache/buffer; that is my theory, at least.

mischief · 2015-09-28T21:24:04Z

Can you check slabtop when you hit the OOM condition?

leonfs · 2015-09-28T21:30:40Z

The following is the initial state of slabtop:

Active / Total Objects (% used) : 214570 / 236605 (90.7%)
Active / Total Slabs (% used) : 8280 / 8280 (100.0%)
Active / Total Caches (% used) : 73 / 96 (76.0%)
Active / Total Size (% used) : 45571.83K / 48884.55K (93.2%)
Minimum / Average / Maximum Object : 0.01K / 0.21K / 16.44K

Then I run this in a container: apt-get install nmap (just a random command)

Preparing to unpack .../python-lxml_3.4.0-1_amd64.deb ...
Unpacking python-lxml (3.4.0-1) ...
dpkg: error processing archive /var/cache/apt/archives/python-lxml_3.4.0-1_amd64.deb (--unpack):
error creating directory `./usr/lib/python2.7/dist-packages/lxml': Cannot allocate memory
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
Errors were encountered while processing:
/var/cache/apt/archives/python-lxml_3.4.0-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

$ free -h
             total       used       free     shared    buffers     cached
Mem:          493M       464M        29M       380K       9.1M        88M
-/+ buffers/cache:       366M       126M
Swap:         1.0G       160M       863M
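
For readers unfamiliar with the "-/+ buffers/cache" row printed by older versions of free: it counts buffers and page cache as free memory. A small Python sketch of that arithmetic follows; the function name is made up for illustration, and the inputs are the rounded values from the output above.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as free memory."""
    adjusted_used = used - buffers - cached
    adjusted_free = free + buffers + cached
    return adjusted_used, adjusted_free


# Values in MiB, taken from the "Mem:" row above.
adj_used, adj_free = buffers_cache_row(used=464, free=29, buffers=9.1, cached=88)
```

With these inputs the result is roughly 366.9 used and 126.1 free, in line with the 366M and 126M that free reports. So about a quarter of RAM looks reclaimable at the moment the allocation fails.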

klausenbusk · 2015-09-30T12:09:09Z

Just hit it on CoreOS stable (766.3.0) when starting Docker from a systemd service file.

9d: iptables failed: iptables --wait -t nat -A DOCKER -p udp -d 172.17.42.1 --dport 53 -j DNAT --to-destination 172.17.0.239:53 ! -i docker0:  (fork/exec /usr/sbin/iptables: cannot allocate memory)

/proc/meminfo

MemTotal:         505380 kB
MemFree:          141308 kB
MemAvailable:     304672 kB
Buffers:           69176 kB
Cached:            89980 kB
SwapCached:            0 kB
Active:           284496 kB
Inactive:          31316 kB
Active(anon):     158288 kB
Inactive(anon):      696 kB
Active(file):     126208 kB
Inactive(file):    30620 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              3028 kB
Writeback:             0 kB
AnonPages:        156684 kB
Mapped:            44560 kB
Shmem:              2328 kB
Slab:              32604 kB
SReclaimable:      16892 kB
SUnreclaim:        15712 kB
KernelStack:        2192 kB
PageTables:         3356 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      252688 kB
Committed_AS:    1029684 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        3304 kB
VmallocChunk:   34359732584 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      528376 kB
DirectMap2M:    18446744073709547520 kB
DirectMap1G:           0 kB

Slabtop

 Active / Total Objects (% used)    : 151776 / 166505 (91.2%)
 Active / Total Slabs (% used)      : 6014 / 6014 (100.0%)
 Active / Total Caches (% used)     : 72 / 97 (74.2%)
 Active / Total Size (% used)       : 32526.62K / 35323.95K (92.1%)
 Minimum / Average / Maximum Object : 0.01K / 0.21K / 16.00K

klausenbusk · 2015-09-30T12:11:56Z

May be related to moby/moby#8539

bfallik · 2015-09-30T14:29:36Z

@klausenbusk are you sure the iptables issue is the same? The previous issue you noted and the one I emailed the list about were triggered by launching docker containers, not by running docker itself. Both exhibit the same OOM symptom but they might have different root causes.

klausenbusk · 2015-09-30T15:21:07Z

> @klausenbusk are you sure the iptables issue is the same?

Not sure; they are both memory allocation problems. The iptables error also occurs when launching a Docker container.

klausenbusk · 2015-09-30T15:25:02Z

@bfallik Could you try setting /sys/kernel/mm/transparent_hugepage/enabled to always, and see if that fixes it? I can't reproduce the issue anymore on alpha, so I can't test whether transparent_hugepage fixes it.

bfallik · 2015-09-30T17:27:34Z

@klausenbusk yes I can but I'll need to get back to you later today or tomorrow.

bfallik · 2015-09-30T19:26:54Z

@klausenbusk Unless I'm misreading, transparent_hugepage/enabled is set to always automatically:

core@core-01 ~ $ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

This is the default setting. I'm still easily able to reproduce the memory allocation errors on CoreOS 815.0.0.

Am I misreading the output or did you mean to suggest I try changing the setting to "madvise" or "never"?

klausenbusk · 2015-09-30T19:44:42Z

> This is the default setting

On Digital Ocean the default is never. What does /proc/buddyinfo show? Maybe it is memory fragmentation, though I think the kernel should handle that.

bfallik · 2015-09-30T19:51:53Z

Ah, interesting. I'm running CoreOS from Vagrant on VirtualBox.

> What does /proc/buddyinfo show?

core@core-01 ~ $ cat /proc/buddyinfo
Node 0, zone      DMA     33    105     87     31     11      2      0      0      0      0      0
Node 0, zone    DMA32   3283   3386   1667    703    395    226    113     44     14      1      0
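
Each column in /proc/buddyinfo is the number of free blocks of a given order, where a block of order k is 2**k contiguous pages. As a rough aid for reading the output above, here is a minimal Python sketch (helper names are made up for illustration) that totals the free pages per zone and counts how many free blocks could satisfy a higher-order request.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]} from /proc/buddyinfo text."""
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        # Expected layout: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zones[(int(fields[1]), fields[3])] = [int(c) for c in fields[4:]]
    return zones


def free_pages(counts):
    """Total free pages: a block of order k spans 2**k contiguous pages."""
    return sum(n << order for order, n in enumerate(counts))


def blocks_at_least(counts, order):
    """Number of free blocks of the given order or larger."""
    return sum(counts[order:])
```

For the DMA32 line above this gives 52859 free pages (about 206 MiB, assuming 4 KiB pages) and 793 free blocks of order 4 or larger. This describes only the moment the snapshot was taken, which is not necessarily the moment of a failure.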

klausenbusk · 2015-09-30T20:06:20Z

@bfallik Could you try setting vm.min_free_kbytes to 50000? I'm really not an expert, just trying to understand how memory works in Linux :) and I haven't found a reliable way to reproduce it on the latest alpha.

bfallik · 2015-09-30T20:13:02Z

@klausenbusk Sure, no problem. I can reliably reproduce this failure every time on alpha so I'm happy to run trials to help resolve this.

Speaking of which I think we're making progress! I can reliably run the tests by increasing min_free_kbytes, and I can reproduce the failure by resetting it back to the original value.

bfallik · 2015-09-30T20:16:15Z

Ugh, I pressed submit too soon. Please ignore my previous comment. Increasing min_free_kbytes did allow the tests to pass, but resetting it back to the original value had no effect. I need to poke around a bit more.

klausenbusk · 2015-09-30T20:27:35Z

Some links I found while trying to understand memory:
http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases
https://www.kernel.org/doc/Documentation/sysctl/vm.txt (all the config options)
http://stackoverflow.com/questions/21374491/vm-min-free-kbytes-why-keep-minimum-reserved-memory (explanation of when vm.min_free_kbytes is useful)

> but resetting it back to the original value had no effect.

I think you need to fill the buffer/cache again. See step 1.

I do the following steps to trigger it on my node:

1. Fill up buffer/cache until free -m says 5 free, with sudo bash -c "cat /dev/vda > /dev/null" and sudo find / -xdev -exec sha1sum {} \;
2. Run docker run --rm -t -i debian:jessie ls; most of the time it says cannot allocate memory.
3. sudo sysctl vm.min_free_kbytes=10000 and docker run works.
4. Do 1 again; it still works.
5. sudo sysctl vm.min_free_kbytes=2000, do 1 again, and I get cannot allocate memory again.
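
The experiment above toggles vm.min_free_kbytes by hand. A minimal Python sketch of the same idea follows, assuming it runs as root on Linux. The helper names are made up for illustration, and the path parameter exists only so the code can be tried against an ordinary file.

```python
from contextlib import contextmanager

MIN_FREE_KBYTES = "/proc/sys/vm/min_free_kbytes"


def read_sysctl(path=MIN_FREE_KBYTES):
    """Read an integer sysctl value from its /proc/sys file."""
    with open(path) as f:
        return int(f.read().strip())


def write_sysctl(value, path=MIN_FREE_KBYTES):
    """Write an integer sysctl value (needs root for real /proc/sys paths)."""
    with open(path, "w") as f:
        f.write(f"{value}\n")


@contextmanager
def temporary_sysctl(value, path=MIN_FREE_KBYTES):
    """Set the sysctl for the duration of a block, then restore the old value."""
    old = read_sysctl(path)
    write_sysctl(value, path)
    try:
        yield old
    finally:
        write_sysctl(old, path)
```

A test command would run inside a "with temporary_sysctl(10000):" block. As noted in this thread, lowering the value again does not bring the failure back on its own; the buffer/cache has to be filled again first.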

leonfs · 2015-10-01T10:28:58Z

@bfallik What is your process to reproduce the error? I would like to help you debug it.

bfallik · 2015-10-01T16:34:55Z

@leonfs Thanks!

It's fairly simple for us to reproduce. We have a set of Docker containers we run as part of our internal integration tests. As of CoreOS 808.0.0 we encounter memory errors every time we invoke the tests on our standard VM size with 1024 MB of RAM. In 766.x and earlier versions the tests always pass.

As a workaround for our developers we've increased the default VM size to 1536 MB. Obviously this is not a real fix. The increase to 1536 MB is why I was confused yesterday. My local VM had been recreated at the larger size, which explains why I saw inconsistent results.

Now that I'm back to 1024 MB of RAM, here are my results of testing against a fresh 1024 MB VM running 815.0.0. Sorry if this is messy but I can clarify if you have any questions.

trial 1

min_free_kbytes at default (3846);

Error response from daemon: symlink /proc/mounts /var/lib/docker/overlay/54c6c3840146217ac28fa11af738dc8e8114fd258261e114b6e8c50fb9664c77-init/merged/etc/mtab: cannot allocate memory

trial 2

(same config as trial 1)

trial 3

core@core-01 ~ $ sudo sh -c "echo 50000 >/proc/sys/vm/min_free_kbytes"
core@core-01 ~ $ cat /proc/sys/vm/min_free_kbytes
50000

tests pass; tried twice in a row

trial 4

core@core-01 ~ $ sudo sh -c "echo 3846 >/proc/sys/vm/min_free_kbytes"
core@core-01 ~ $ cat /proc/sys/vm/min_free_kbytes
3846

tests pass; tried twice in a row

I'm not sure I understand the real effect, if any, min_free_kbytes is having on the results. I'm also not sure if changing this to a lower setting has any effect or if the kernel can only increase the minimum.

It may also be worth mentioning that failures occur very early in the process before the point where the final container, the one actually running the tests, gets started. We encounter memory errors running containers that wrap test dependencies like postgres and redis.

klausenbusk · 2015-10-01T19:20:30Z

> tests pass; tried twice in a row

That is expected; it takes some time before the 461454 kbytes are used again by cache/buffer. Try waiting a little while, or fill the cache/buffer. There is an explanation of cache/buffer here: http://stackoverflow.com/questions/6345020/linux-memory-buffer-vs-cache
Explanation of why increasing min_free_kbytes helps: http://stackoverflow.com/questions/21374491/vm-min-free-kbytes-why-keep-minimum-reserved-memory (first answer)

vcaputo · 2015-10-02T22:27:09Z

I've been able to reproduce this without involving docker at all, so we can eliminate that variable.

Right now it looks like a kernel regression, and overlayfs is effective at triggering it. Simply read /dev/vda into /dev/null as @klausenbusk did above, then try to write to a file on overlayfs so that it triggers a copy_up; that write will hit the allocation failure. There are plenty of reclaimable buffers, but for some reason they aren't being reclaimed before kmalloc fails.

bfallik · 2015-10-05T13:21:20Z

@vcaputo nice catch! hopefully the isolated reproduction case will make it easier to track down the regression.

[ Upstream commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe ] Rather than always allocating the high-order XATTR_SIZE_MAX buffer which is costly and prone to failure, only allocate what is needed and realloc if necessary. Fixes coreos/bugs#489 Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
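
The commit message calls the XATTR_SIZE_MAX buffer "high-order". As a rough illustration of what that means, the Python sketch below computes how many contiguous pages such a request needs. It assumes 4 KiB pages and that a large allocation is rounded up to a power-of-two number of contiguous pages, as the buddy allocator does. The function name and the 256-byte example size are made up for illustration.

```python
XATTR_SIZE_MAX = 65536  # bytes, as defined in include/uapi/linux/limits.h


def buddy_order(nbytes, page_size=4096):
    """Smallest order k such that 2**k pages of page_size bytes can hold nbytes."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    pages = -(-nbytes // page_size)  # ceiling division
    return (pages - 1).bit_length()


always_max = buddy_order(XATTR_SIZE_MAX)  # buffer sized for the worst case
only_needed = buddy_order(256)            # hypothetical small xattr value
```

Under these assumptions the worst-case buffer is an order-4 request (16 contiguous pages), while a small value fits in a single page. That is consistent with the fix described above: allocate only what is needed, and reallocate if necessary.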

mischief added the component/docker label Sep 28, 2015

crawford added the kind/regression label Sep 29, 2015

klausenbusk mentioned this issue Sep 30, 2015

Error response from daemon: Cannot start container (fork/exec /usr/sbin/iptables: cannot allocate memory) moby/moby#8539

Closed
