Kibana shuts down over time: out of memory? #17

Closed
spujadas opened this issue Jan 20, 2016 · 32 comments

@spujadas
Owner

See #16 for background.

@spujadas spujadas self-assigned this Jan 20, 2016
@spujadas
Owner Author

@jchannon OK, just started a container. I'm monitoring the (idle) container using New Relic, and tracking memory usage and the next candidate victim of the oom_killer (which is presumably how Kibana gets killed once its memory usage gets out of hand) using dstat.
So far (30 minutes in), Elasticsearch and Logstash are both stable at 270MB and 180MB of memory, and Kibana is using 150MB and on the rise, up from 120MB half an hour ago.
I'll leave everything up and running overnight and "hope" that Kibana dies with useful clues.

@jchannon
Contributor

I think it took mine somewhere between 24 and 36 hours.

@spujadas
Owner Author

Goodness me! Right, I'd better restart the container with much less memory available to begin with!
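
For the record, by "much less memory" I mean re-running the container with a hard memory cap, something along these lines (just a sketch; the image name and port mappings are the usual ones, so adjust to whatever you're running):

$ docker stop elk && docker rm elk
$ docker run -d --name elk -m 800m \
    -p 5601:5601 -p 9200:9200 -p 5044:5044 \
    sebp/elk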

@spujadas
Owner Author

Still running…

$ docker stats elk
CONTAINER           CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
elk                 3.81%               770.6 MB/838.9 MB   91.86%              7.964 MB/2.208 MB

Overall memory usage:
[screenshot: rpm.newrelic.com, 2016-01-21 07:18]

Kibana memory usage:
[screenshot: rpm.newrelic.com, 2016-01-21 07:16]

Elasticsearch's and Logstash's memory usage is fairly constant (and even somewhat decreasing).

The candidate process for oom_killer if/when memory runs out was initially java (Elasticsearch or Logstash), but is now node (Kibana).

# dstat --mem --top-oom --top-mem
------memory-usage----- --out-of-memory--- --most-expensive-
 used  buff  cach  free|    kill score    |  memory process
1271M  362M  288M 79.8M|node          152 |node         377M

So at this point, Kibana's cyclic and increasing sawtooth memory usage trend suggests that it will ultimately make the container run out of memory and be killed by oom_killer, which reproduces and explains the issue (not Docker-specific), but doesn't solve it.

Will leave the container running for the day and run more tests this evening with node's --max-old-space-size option to try to mitigate the problem.
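
For anyone following along, the test amounts to something like this from inside the container (just a sketch: Kibana's install path is from memory, the Kibana instance that the image starts automatically needs to be stopped first, and the 250MB cap is only an example value to be tuned):

$ docker exec -it elk bash
# NODE_OPTIONS="--max-old-space-size=250" /opt/kibana/bin/kibana &
# ps -C node -o pid,rss,etime,args

(ps -C node reports the Kibana/node process's RSS, which is the figure to watch over time.)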

@jchannon
Contributor

I've just started it up and it's sitting at 459MB of 1.023GB; will keep an eye on it.

What tool did you use to get those graphs? 😄

@spujadas
Owner Author

I'm using New Relic to get the graphs: it's SaaS so no server to set up on my side, just a client-side agent to apt-get install in the running container and hey presto!
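
In case you want to set it up too, it boils down to roughly this from inside the running container (sketch from memory; double-check New Relic's docs for the exact repository and key URLs, and the agent obviously needs a valid licence key):

$ docker exec -it elk bash
# echo 'deb http://apt.newrelic.com/debian/ newrelic non-free' > /etc/apt/sources.list.d/newrelic.list
# wget -O- https://download.newrelic.com/548C16BF.gpg | apt-key add -
# apt-get update && apt-get install newrelic-sysmond
# nrsysmond-config --set license_key=YOUR_LICENSE_KEY
# /etc/init.d/newrelic-sysmond start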

@spujadas
Owner Author

OK, Kibana died after roughly 16 hours (I had limited the container's memory to 800MB, and Kibana crashed after peaking at 424MB), so I can confirm the issue 😄
[screenshot: rpm.newrelic.com, 2016-01-21 20:31]

Next step: same thing, but limiting NodeJS's maximum heap size to a lower value and seeing what happens. Will keep you posted.

@jchannon
Contributor

Thanks

I saw ours rise to 565MB of 1023MB but no crash yet

@spujadas
Owner Author

After about 10 hours, memory usage kind of looks better than it did during the previous test.

CONTAINER           CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
elk                 1.35%               437.9 MB/838.9 MB   52.20%              14.78 MB/7.004 MB

Kibana's behaviour seems reasonable (currently peaking at 240MB), but there is an upward trend in the memory usage, so let's see how this goes during the next few hours.

[screenshot: rpm.newrelic.com, 2016-01-22 07:26]

Again, Kibana is the top candidate for oom_killer if anything goes south.

@jchannon
Contributor

Yup, my Kibana fell over during the night, although I don't really have anything monitoring it. Looks like you have all the right tools to keep an eye on this :)

Thanks

@spujadas
Owner Author

Right, memory usage seems to remain under control, so I updated the image.

[screenshot: rpm.newrelic.com, 2016-01-22 20:00]

If you could give it a spin and tell me how it goes that would be terrific.

@jchannon
Contributor

Brill. What do you think the issue is?

@spujadas
Owner Author

According to elastic/kibana#5170, the most likely explanation is that Kibana's underlying NodeJS is failing to collect garbage properly, which might be due to NodeJS getting confused by Docker and not being able to figure out how much memory is actually available. Solved by forcing garbage collection when the heap reaches 250MB.
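
In Docker image terms, the change amounts to making sure Kibana's Node process starts with a capped old-space heap, along these lines (a sketch, not necessarily the exact shape of the commit; Kibana's bin/kibana script expands $NODE_OPTIONS when it launches node, so an environment variable is enough):

# Dockerfile: cap Node's old-generation heap so that V8 garbage-collects
# well before the container's memory limit is reached
ENV NODE_OPTIONS --max-old-space-size=250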

@jchannon
Contributor

Ah interesting. Thanks for the help. Will pull it down and try

@jchannon
Contributor

Have pulled it and deployed. One thing I noticed: I ran docker stats elk, then kept refreshing a saved search in Kibana in the browser, and the memory usage reported by docker stats kept increasing.

@spujadas
Owner Author

Tried it, same here, but nothing too dramatic, and memory eventually dropped back to the initial level (garbage collection kicking in, perhaps?). When left alone, ps aux reports about 150MB (and slooooowly climbing, as usual) for Kibana's RSS.
How bad is it on your side?

@jchannon
Contributor

Kibana seems to be at 22% of the container's memory usage and its RSS is 226748, using ps aux | grep -E "RSS|159"

docker stats elk reports 642MB, and although I can't seem to get that to drop, once it got to around 670MB it didn't seem to go any higher and the RSS seems stable. I'll leave it overnight and see what state it's in.

@spujadas
Owner Author

Sounds about right. Here's the latest on my end (quick and dirty hack plotted using R based on raw ps aux data)

[R plot of Kibana's RSS over time, from ps aux data]

The red line shows a maximum value of 341MB (at which point I assume garbage is collected), and the container stays under 900MB (albeit with zero incoming log activity)… so we might be out of the woods!
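
For the curious, one way to collect that kind of raw ps data is simply to sample the node process's RSS at regular intervals, e.g.

$ while true; do ps -C node -o rss= >> kibana-rss.log; sleep 60; done

and then plot the resulting log.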

@jchannon
Contributor

Just checked, and the values aren't really any higher than yesterday, so fingers crossed.

I'll keep it running over the weekend and check on Monday morning to see if it's still up.

Thanks for the awesome help 👍

@spujadas
Owner Author

All righty! Cheers!

@jchannon
Contributor

Still up so it's looking good 😄

@spujadas
Owner Author

Cool! Same over here. I'll leave this issue open for a few more days and if everything continues playing nicely I'll close it.

@jchannon
Contributor

Brill, thanks for the support

@jchannon
Contributor

It's still up so I think we're good! 👍 😄

@spujadas
Owner Author

😃 Thanks so much for your feedback, same behaviour here, so… closing the issue!

@jalagrange

Hey guys, I'm experiencing the same behavior. Could you point out a way to define when garbage collection should be triggered on my server or node instance? What should I configure for this to happen?

@jchannon
Contributor

I don't think you need to do anything if you have pulled the latest image.

@spujadas
Owner Author

@jalagrange Are you experiencing this behaviour with the latest version of the image? This should have been solved by aaa09d3, which I published a few weeks ago (by the way, if you take a look at that specific commit, you'll see how I configured garbage collection; you'll also want to have a look at elastic/kibana#5170 for background information on this issue).

Also, how much memory are you dedicating to the container?

@jalagrange

Wow, thanks guys, that was quick... I am actually running Kibana 4.3.1 directly on a CentOS server that connects to a remote Elasticsearch cluster, not using Docker. But take a look at my node memory usage over 3 hours without any kind of use; it looks very similar to what you guys are describing:

[screenshot: node memory usage over 3 hours, 2016-02-11 13:27]

I am currently running this on an AWS micro instance so 1GB of memory.

@spujadas
Owner Author

Ah yes, looks familiar. elastic/kibana#5170 is what you want to have a look at for the non-Docker version of the issue (long story short: setting NODE_OPTIONS="--max-old-space-size=250" before starting Kibana should solve the problem).
Can't help beyond that as this is really a Kibana issue (I'm merely packaging it as a Docker image!), so if setting NODE_OPTIONS doesn't help, you might want to consider filing an issue with the Kibana guys.

@jalagrange

Thanks a lot @spujadas! I did just that and took a look at the issue you mentioned. I'm pretty confident it will work, but I'll post back in case it doesn't. Just to expand on your reply:

NODE_OPTIONS="--max-old-space-size=250"

This must be set at the beginning of the bin/kibana script that is being executed, 250 being the number of MB you wish to cap the process's heap at (in case someone else stumbles onto this).
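
In other words, roughly this (a sketch of an edited bin/kibana; the exact contents of the stock script vary between Kibana versions):

#!/bin/sh
# added near the top of bin/kibana: cap Node's old-space heap at 250MB
NODE_OPTIONS="--max-old-space-size=250"

# ... rest of the stock script, which eventually launches Kibana with
# something like: exec "${NODE}" $NODE_OPTIONS "${DIR}/src/cli" ${@}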

@itsAnuga

Yup, our Kibana on Ubuntu is constantly crashing because of this as well.
