Mongodb constantly at 100%+ of CPU #22
Comments
If I run
|
Are you using the modifications included with PR #16 & #17? I saw the same insane mongo container CPU usage. I didn't quite fix it, but I reduced it to manageable levels by reducing processing intervals. Before those changes it would fail after a couple of days because it ran out of CPU resources. Now I can keep 7 days' worth of data on a t2.micro unlimited container, and CPU usage sits around 90%, so my web UI stays responsive enough for me. This is still kind of expensive ($13/month for excess CPU credits in addition to $8/month for the instance) but works well enough for now. |
I have high CPU usage too, but the UI responds fine, so I didn't spend too much time investigating. |
Hi @jehartzog and @lmachens, thanks for getting back. After reporting this I did a reboot, and for the last 24 hours it's been hanging out around 100% (of a possible 200%), so things are running fairly smoothly. Perhaps some indexes just needed some time to work out; or, similarly, redis-oplog had to kick into gear. The good news is that it seems to be running fairly stably on a $15/month Digital Ocean droplet (2 GB RAM and 2 CPUs). Will close this issue for now, and re-open if it crashes again. |
Okay so about 5 hours ago it looks like mongodb might have crashed because of excessive CPU usage? Not sure... this is all I see in the logs:
Grr. |
Double check that you are using the code from the two PRs I mentioned above; they helped manage (but not completely fix) this exact issue for me. |
@jehartzog Yep, I most certainly am running those -- I only set this server up last week. Here's what I tried:
Stumped for now. |
You can adjust metricsLifetime. If you limit it to like 3 days, the cpu usage should be better. |
I believe my settings.json already has it set to 3 days:
My Digital Ocean droplet has 60gb of space -- of which it says 6gb has been used so far. The Meteor node process is at ~1% CPU and the mongodb one consistently at 100%. I'm guessing that mongodb is the culprit -- especially because the period where Kadira went down syncs perfectly with an error message related to mongodb: |
My CPU usage is always at 100% too, but it is fine for me because of 2 cores. No issues in the UI. The aggregations are the reason for the high CPU. |
I went ahead and changed the first constant from
Which gets it to run when Meteor first reboots. So I gave it a few minutes to settle down, but... 30 minutes later and it hasn't died down yet. Edit -- completely shut down and restarted Meteor so that mongodb would reset. Launched with just |
Looks like my mongodb is now at 200% again -- both processors completely hogged up and the UI pretty much unresponsive. I'm going to expand to 3 CPUs and see what happens. Edit -- that didn't work, as their package was 3 CPUs and 1gb of ram, which isn't enough ram to boot Meteor properly. |
Been running it with 4 cores and 8gb of ram and she's finally happy. But I'm not super happy about the $40/month that's setting me back! Going to shop around for another host at some point. |
Unless you're experiencing some massively different customer usage than I'm used to (~200 peak active simultaneous users), you may want to dig more into fixing the issue causing the mongo container CPU usage. I don't think you'll be able to find a significantly cheaper hosting service that can do the CPU processing that is currently required. |
Okay so here's an observation -- in the server/rma/server.js file, each of runShort, runMedium and runLong get fired at the same time when Meteor starts. I put a little timer on each. At startup, this is how long each one took to run:
Once they were done, runShort was able to run on its own again and took a total of 348 seconds! As soon as it was done, it started instantly again -- this time taking 480 seconds. Sure enough, the instant it was done another runShort fired, this time taking 579 seconds. So... getting longer each time, heading in the direction of 10+ minutes per runShort(). Not so short, hey! Each one of them takes up at least 100% of a CPU -- and in my case, with 4 cores, the CPU was at 400% when all 3 functions were running at the same time. It also looks like whenever runMedium fires (30 minutes), runShort will also fire. And whenever runLong fires (3 hours), both runShort and runMedium will fire. So I've constantly got my CPU maxed out at 100% with runShort(), then every 30 minutes it bumps up to 200%+, and then every 3 hours it gets hammered at a full 400%. Some steps to take to figure this out and mitigate the issue:
I'll report back if I make any progress -- this is my first time digging into what actually happens with each aggregation, so if you've got some thoughts in the meanwhile that'd be great :) |
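One possible mitigation for the back-to-back and overlapping runs described above is to time each job and refuse to start a new run while the previous one is still going. The following is only a minimal sketch under stated assumptions: `makeGuardedJob` is a hypothetical helper, not something that exists in meteor-apm-server, and the `sleep` job is a stand-in for the real `runShort` body, which is not shown in this thread.

```javascript
// Hypothetical helper (not part of meteor-apm-server): wraps an async job so
// that a new run is skipped while the previous one is still in progress, and
// records how long each run took.
function makeGuardedJob(name, job, log = console.log) {
  const stats = { started: 0, skipped: 0, lastElapsedMs: null };
  let running = false;

  async function run() {
    if (running) {
      stats.skipped += 1;
      log(`${name}: previous run still in progress, skipping`);
      return false;
    }
    running = true;
    stats.started += 1;
    const startedAt = Date.now();
    try {
      await job();
      return true;
    } finally {
      // Runs whether the job succeeded or failed, so the guard is always released.
      stats.lastElapsedMs = Date.now() - startedAt;
      running = false;
      log(`${name}: finished in ${stats.lastElapsedMs} ms`);
    }
  }

  run.stats = stats;
  return run;
}

// Usage with a stand-in job; the real aggregation would go where sleep() is.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const guardedShort = makeGuardedJob('runShort', () => sleep(20));
guardedShort(); // starts a run
guardedShort(); // skipped, because the first run has not finished yet
```

A guard like this does not make the aggregation itself any cheaper; it only keeps slow runs from piling up, and the logged durations show whether runs keep getting longer.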
Even with 4 cores, my mongodb keeps crashing and the whole site becomes unusable :( This could be because I've got so many methods / publications? For example, there'll often be 25k to 35k methods per hour that it has to aggregate. Are all these queries indexed? Edit -- I see in lib/collections.js that there's Edit 2 -- apparently db.methodTraces and db.methodsTraces are different (one with "s", one without. Not sure what's going on there). Edit 3 -- indeed, there's a lack of indexing going on here. Check that COLLSCAN below.
|
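For anyone wanting to check their own queries the same way, here is a rough sketch of scanning an `explain()` result for a `COLLSCAN` stage. It assumes the classic explain shape (`queryPlanner.winningPlan` with nested `inputStage` / `inputStages`); the two plan objects are hand-written, hypothetical examples and not output from this server, and `exampleField_1` is a made-up index name.

```javascript
// Recursively collects the stage names from a MongoDB explain() winning plan.
function collectStages(plan, stages = []) {
  if (!plan || typeof plan !== 'object') return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// Returns true if any stage of the winning plan is a full collection scan.
function usesCollectionScan(explainOutput) {
  const plan =
    explainOutput &&
    explainOutput.queryPlanner &&
    explainOutput.queryPlanner.winningPlan;
  return collectStages(plan).includes('COLLSCAN');
}

// Hypothetical, hand-written explain outputs for illustration only.
const unindexedPlan = {
  queryPlanner: { winningPlan: { stage: 'COLLSCAN', direction: 'forward' } },
};
const indexedPlan = {
  queryPlanner: {
    winningPlan: {
      stage: 'FETCH',
      inputStage: { stage: 'IXSCAN', indexName: 'exampleField_1' },
    },
  },
};

console.log(usesCollectionScan(unindexedPlan)); // true
console.log(usesCollectionScan(indexedPlan)); // false
```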
WOW. Okay so I added some indexes and it's like night and day. Instead of runShort() taking 2+ minutes, it now takes ~500 milliseconds. Meanwhile my mongodb CPU is at almost 0%. I'll put in a pull request for the changes. |
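To give a feel for why the indexes make such a difference, here is a toy model in plain JavaScript. It is not MongoDB (which uses B-tree indexes, not a `Map`), and the `appId` field and document counts are made up for illustration; it only compares how many documents get examined by a full scan versus a lookup through a prebuilt index.

```javascript
// Toy model, not MongoDB itself: a full scan examines every document.
function scan(docs, field, value) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, examined };
}

// Builds a simple in-memory index: field value -> list of documents.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// An indexed lookup only touches the documents that actually match.
function lookup(index, value) {
  const matches = index.get(value) || [];
  return { matches, examined: matches.length };
}

// Hypothetical data: 30,000 trace-like documents spread over 100 app ids.
const docs = Array.from({ length: 30000 }, (_, i) => ({
  appId: `app${i % 100}`,
  value: i,
}));
const byAppId = buildIndex(docs, 'appId');
const full = scan(docs, 'appId', 'app7');
const viaIndex = lookup(byAppId, 'app7');
console.log(full.examined, viaIndex.examined); // 30000 300
```

Both approaches return the same 300 documents; the scan just has to look at all 30,000 to find them, and an aggregation that repeats such a scan on every run pays that cost every time.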
This helps solve lmachens#22
@jasongrishkoff This looks great, thanks for keeping up the hunt to figure this out! Are you 100% sure those indexes are the only things you changed to fix the CPU usage? For one thing, a number of those indexes already exist. I also added the indexes you recommended and I'm not seeing a change in CPU usage. |
@jehartzog you added the indexes in this pull request, right? #23 I'm 99.99999999% positive that was the only change, and it was like night and day. When you fire
|
I did use the indexes you added in #23; the only thing I did differently was to add them in the background. These are the only 4 collections that you edited in the PR, but
|
@jehartzog my apologies, I definitely should have said
I tested it by doing
|
I went back and reverified everything. I don't know what I was smoking before, but your solution is 100% effective. I had somehow missed adding the index for |
@jehartzog Nice! Hopefully it saves you some money on that extra CPU you were paying for :) |
Good job guys! I created a new version with your changes. https://github.com/lmachens/meteor-apm-server/blob/master/CHANGELOG.md#108---2018-02-27 |
I've got a 2-core system running on Digital Ocean, and after ~3 days of tracking data, everything totally freezes up with Mongodb stuck at 100% on both cores. I'm using redis-oplog for oplog tailing, but that doesn't seem to help :(
Anyone else experiencing similar?