Kibana shuts down over time: out of memory? #17

Closed
spujadas opened this issue Jan 20, 2016 · 32 comments

@spujadas
Owner

See #16 for background.

@spujadas spujadas self-assigned this Jan 20, 2016
@spujadas
Owner Author

@jchannon OK, just started a container. I'm monitoring the (idle) container using New Relic, and tracking memory usage and the next candidate victim of the oom_killer (which is presumably how Kibana gets killed once its memory usage gets out of hand) using dstat.
So far (30 minutes in), Elasticsearch and Logstash are both stable at 270MB and 180MB of memory, and Kibana is using 150MB and on the rise, up from 120MB half an hour ago.
I'll leave everything up and running overnight and "hope" that Kibana dies with useful clues.

@jchannon
Contributor

I think it took mine somewhere between 24 and 36 hours.

@spujadas
Owner Author

Goodness me! Right, I'd better restart the container with much less memory available to begin with!
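
For the record, by "much less memory" I mean re-running the container with a hard memory cap, something along these lines (just a sketch; the image name and port mappings are the usual ones, so adjust to whatever you're running):

$ docker stop elk && docker rm elk
$ docker run -d --name elk -m 800m \
    -p 5601:5601 -p 9200:9200 -p 5044:5044 \
    sebp/elk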

@spujadas
Owner Author

Still running…

$ docker stats elk
CONTAINER           CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
elk                 3.81%               770.6 MB/838.9 MB   91.86%              7.964 MB/2.208 MB

Overall memory usage:
[screenshot: rpm.newrelic.com, 2016-01-21 07:18]

Kibana memory usage:
[screenshot: rpm.newrelic.com, 2016-01-21 07:16]

Elasticsearch's and Logstash's memory usage is fairly constant (and even somewhat decreasing).

The candidate process for oom_killer if/when memory runs out was initially java (Elasticsearch or Logstash), but is now node (Kibana).

# dstat --mem --top-oom --top-mem
------memory-usage----- --out-of-memory--- --most-expensive-
 used  buff  cach  free|    kill score    |  memory process
1271M  362M  288M 79.8M|node          152 |node         377M

So at this point, Kibana's cyclic and increasing sawtooth memory usage trend suggests that it will ultimately make the container run out of memory and be killed by oom_killer, which reproduces and explains the issue (not Docker-specific), but doesn't solve it.

Will leave the container running for the day and run more tests this evening with node's --max-old-space-size option to try to mitigate the problem.
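
For anyone following along, the test amounts to something like this from inside the container (just a sketch: Kibana's install path is from memory, the Kibana instance that the image starts automatically needs to be stopped first, and the 250MB cap is only an example value to be tuned):

$ docker exec -it elk bash
# NODE_OPTIONS="--max-old-space-size=250" /opt/kibana/bin/kibana &
# ps -C node -o pid,rss,etime,args

(ps -C node reports the Kibana/node process's RSS, which is the figure to watch over time.)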

@jchannon
Contributor

I've just started it up and it's sitting at 459MB of 1.023GB; will keep an eye on it.

What tool did you use to get those graphs? 😄

@spujadas
Owner Author

I'm using New Relic to get the graphs: it's SaaS so no server to set up on my side, just a client-side agent to apt-get install in the running container and hey presto!
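
In case you want to set it up too, it boils down to roughly this from inside the running container (sketch from memory; double-check New Relic's docs for the exact repository and key URLs, and the agent obviously needs a valid licence key):

$ docker exec -it elk bash
# echo 'deb http://apt.newrelic.com/debian/ newrelic non-free' > /etc/apt/sources.list.d/newrelic.list
# wget -O- https://download.newrelic.com/548C16BF.gpg | apt-key add -
# apt-get update && apt-get install newrelic-sysmond
# nrsysmond-config --set license_key=YOUR_LICENSE_KEY
# /etc/init.d/newrelic-sysmond start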

@spujadas
Owner Author

OK, Kibana died after roughly 16 hours (I had limited the container's memory to 800MB, and Kibana crashed after peaking at 424MB), so I can confirm the issue 😄
[screenshot: rpm.newrelic.com, 2016-01-21 20:31]

Next step: same thing, but limiting NodeJS's maximum heap size to a lower value and seeing what happens. Will keep you posted.

@jchannon
Contributor

Thanks

I saw ours rise to 565MB of 1023MB but no crash yet

@spujadas
Owner Author

After about 10 hours, memory usage kind of looks better than it did during the previous test.

CONTAINER           CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
elk                 1.35%               437.9 MB/838.9 MB   52.20%              14.78 MB/7.004 MB

Kibana's behaviour seems reasonable (currently peaking at 240MB), but there is an upward trend in the memory usage, so let's see how this goes during the next few hours.

[screenshot: rpm.newrelic.com, 2016-01-22 07:26]

Again, Kibana is the top candidate for oom_killer if anything goes south.

@jchannon
Contributor

Yup, my Kibana fell over during the night, although I don't really have anything monitoring it. Looks like you have all the right tools to keep an eye on this :)

Thanks

@spujadas
Owner Author

Right, memory usage seems to remain under control, so I updated the image.

[screenshot: rpm.newrelic.com, 2016-01-22 20:00]

If you could give it a spin and tell me how it goes that would be terrific.

@jchannon
Contributor

Brill. What do you think the issue is?

@spujadas
Owner Author

According to elastic/kibana#5170, the most likely explanation is that Kibana's underlying NodeJS is failing to collect garbage properly, which might be due to NodeJS getting confused by Docker and not being able to figure out how much memory is actually available. Solved by forcing garbage collection when the heap reaches 250MB.
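
In Docker image terms, the change amounts to making sure Kibana's Node process starts with a capped old-space heap, along these lines (a sketch, not necessarily the exact shape of the commit; Kibana's bin/kibana script expands $NODE_OPTIONS when it launches node, so an environment variable is enough):

# Dockerfile: cap Node's old-generation heap so that V8 garbage-collects
# well before the container's memory limit is reached
ENV NODE_OPTIONS --max-old-space-size=250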

@jchannon
Contributor

Ah interesting. Thanks for the help. Will pull it down and try

@jchannon
Contributor

Have pulled it and deployed. One thing I noticed: I ran docker stats elk, then kept refreshing a saved search in Kibana in the browser, and the memory usage reported by docker stats kept increasing.

@spujadas
Owner Author

Tried it, same here, but nothing too dramatic, and memory eventually dropped back to the initial level (garbage collection kicking in, perhaps?). When left alone, ps aux reports about 150MB (and slooooowly climbing, as usual) for Kibana's RSS.
How bad is it on your side?

@jchannon
Contributor

Kibana seems to be at 22% of the container's memory usage and its RSS is 226748, using ps aux | grep -E "RSS|159"

docker stats elk reports 642MB, and although I can't seem to get that to drop, once it got to around 670MB it didn't seem to go any higher and the RSS seems stable. I'll leave it overnight and see what state it's in.

@spujadas
Owner Author

Sounds about right. Here's the latest on my end (quick and dirty hack plotted using R based on raw ps aux data)

[R plot of Kibana's RSS over time, from ps aux data]

The red line shows a maximum value of 341MB (at which point I assume garbage is collected), and the container stays under 900MB (albeit with zero incoming log activity)… so we might be out of the woods!
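
For the curious, one way to collect that kind of raw ps data is simply to sample the node process's RSS at regular intervals, e.g.

$ while true; do ps -C node -o rss= >> kibana-rss.log; sleep 60; done

and then plot the resulting log.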

@jchannon
Contributor

Just checked, and the values aren't really any higher than yesterday, so fingers crossed.

I'll keep it running over the weekend and check on Monday morning to see if it's still up.

Thanks for the awesome help 👍

@spujadas
Owner Author

All righty! Cheers!

@jchannon
Contributor

Still up so it's looking good 😄

@spujadas
Owner Author

Cool! Same over here. I'll leave this issue open for a few more days and if everything continues playing nicely I'll close it.

@jchannon
Contributor

Brill, thanks for the support

@jchannon
Contributor

It's still up so I think we're good! 👍 😄

@spujadas
Owner Author

😃 Thanks so much for your feedback, same behaviour here, so… closing the issue!

@jalagrange

Hey guys, I'm experiencing the same behavior. Could you point out a way to define when garbage collection should be triggered on my server or node instance? What should I configure for this to happen?

@jchannon
Contributor

I don't think you need to do anything if you have pulled the latest image.

@spujadas
Owner Author

@jalagrange Are you experiencing this behaviour with the latest version of the image? This should have been solved by aaa09d3, which I published a few weeks ago (by the way, if you take a look at that specific commit, you'll see how I configured garbage collection; you'll also want to have a look at elastic/kibana#5170 for background information on this issue).

Also, how much memory are you dedicating to the container?

@jalagrange

Wow, thanks guys, that was quick... I am actually running Kibana 4.3.1 directly on a CentOS server that connects to a remote Elasticsearch cluster, not using Docker. But take a look at my node memory usage over 3 hours without any kind of use; it looks very similar to what you guys are describing:

[screenshot: node memory usage over 3 hours, 2016-02-11 13:27]

I am currently running this on an AWS micro instance so 1GB of memory.

@spujadas
Owner Author

Ah yes, looks familiar. elastic/kibana#5170 is what you want to have a look at for the non-Docker version of the issue (long story short: setting NODE_OPTIONS="--max-old-space-size=250" before starting Kibana should solve the problem).
Can't help beyond that as this is really a Kibana issue (I'm merely packaging it as a Docker image!), so if setting NODE_OPTIONS doesn't help, you might want to consider filing an issue with the Kibana guys.

@jalagrange

Thanks a lot @spujadas! I did just that and took a look at the issue you mentioned. I'm pretty confident it will work, but I'll post back in case it doesn't. Just to expand on your reply:

NODE_OPTIONS="--max-old-space-size=250"

This must be set at the beginning of the bin/kibana script that is being executed, 250 being the number of MB you wish to cap the process's heap at (in case someone else stumbles onto this).
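
In other words, roughly this (a sketch of an edited bin/kibana; the exact contents of the stock script vary between Kibana versions):

#!/bin/sh
# added near the top of bin/kibana: cap Node's old-space heap at 250MB
NODE_OPTIONS="--max-old-space-size=250"

# ... rest of the stock script, which eventually launches Kibana with
# something like: exec "${NODE}" $NODE_OPTIONS "${DIR}/src/cli" ${@}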

@itsAnuga

Yup, our Kibana on Ubuntu is constantly crashing because of this as well.
