Memory Leak #134
Comments
I've been experiencing the same issue |
Hi @mattmcla we definitely do not want the module causing memory leaks. Our tracing module patches node core pretty deeply, so it's totally possible that we're causing a leak. I would like to gather a bit more information here. It is expected that overhead will increase when using our module, but are you seeing a steady growth of memory over time? I have tried to reproduce the leak locally based on descriptions, but have been unable to show any significant growth of memory, even when placing a high load on our sample apps. I am fairly sure we're capable of causing a leak, I just don't have enough information to reproduce the issue. If you have some sample code, I would be happy to try running it. Alternatively, if you are able to core-dump the leaky process, I should be able to inspect the core for leaks. This would only work with Linux or SmartOS cores. I hope this isn't causing too much extra overhead on your application, but I really appreciate any help you can provide here. |
Here's a chart from you guys that outlines what was going on. As you can see, after each restart memory consumption would go through the roof. The app consumes 42M just after start and typically settles at around 82M per instance after it's warmed up. With NR installed it would just consume memory until Heroku would restart the application, the site would get sluggish, etc. All the things that speak to a memory leak. Our application is very new at this point and there are long periods of idle time, but during those times memory would just get consumed and never let go. Here is how we're kicking our express app off: https://gist.github.com/mattmcla/b82b064a639efa4b7e00 Other than that it's a pretty straightforward expressjs app. I did try using it with just one CPU (still using cluster but overriding the numCPUs to 1) to the same effect. As for getting a core dump, that would be a bit of work as we're on Heroku. I'll look into it if it's necessary. What I can tell you is that with new relic removed, we're running smoothly. |
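For readers who cannot open the gist, the startup described above (cluster, with the worker count overridable to 1) might look roughly like the sketch below. This is not the contents of the gist: `workerCount`, `start`, and `runWorker` are hypothetical names chosen for illustration, and only Node's built-in `cluster` and `os` modules are used.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: use the override when given, else one worker per CPU.
function workerCount(override) {
  return override > 0 ? override : os.cpus().length;
}

// Hypothetical helper: fork workers in the primary process and run the
// application (runWorker) inside each worker process.
function start(numWorkers, runWorker) {
  // cluster.isPrimary is the newer name; older Node releases only have isMaster.
  const isPrimary = cluster.isPrimary === undefined ? cluster.isMaster : cluster.isPrimary;
  if (!isPrimary) {
    runWorker();
    return;
  }
  for (let i = 0; i < numWorkers; i += 1) {
    cluster.fork();
  }
  // Replace a worker that dies so capacity stays constant.
  cluster.on('exit', () => cluster.fork());
}
```

Calling `start(workerCount(1), runWorker)`, where `runWorker` starts the Express server, would correspond to the single-worker test mentioned above. If the leak shows up with one worker as well as with many, the number of workers is unlikely to be the cause.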
Also, in my first comment I linked to a gist of our logs. It does show a stack trace coming from the new relic module. When those errors occur is when the memory starts climbing until there is no memory left. |
@mattmcla this is great information, thank you! The errors in those logs are enough to suggest that there is a leaky way we're handling re-connection attempts. This may or may not be related to SSL, but given that this coincides with 1.4.0, I'm going to start looking there. I still need to generate a reproducible case, but I believe I have enough information to go on. If you have any other information, logs or code samples I would gladly accept them. You can email me directly at jacob@newrelic.com. @rwky if you have any logs to share, I would also love to see them. Sorry for the spikes! I hope we can get this sorted out soon, and I appreciate all the detailed information you're providing. |
Cool, yeah, I can replicate it every time, but it sometimes takes an hour before the failure happens. It led me astray a number of times. Let me know if there is anything else I can do. |
@groundwater here's our trace:
The same error is repeated throughout the logs. It has only happened since upgrading to 1.4.0 (I've now pinned it at 1.3.2), and it takes a couple of hours to see a problem. |
I just wanted to update everyone here with what's been going on this last week. We are focusing in on the This may or may not be related to the memory leak issues reported; I don't yet have a sample app that shows the same kind of leaks. Given the lack of data in this area, I'm focusing on fixing the socket error first. Once that's cleared, I will be diving deeper into the memory leak issue. In the meantime, if anyone has an app they can share which reliably leaks memory with the newrelic module installed, I would be grateful for a solid repro case. |
Our app is far too large (and confidential) to share but once you've fixed the socket hang up I can upgrade new relic and see if the issue persists. |
Our app is our secret sauce so it's not something I'm willing to share. As mentioned by rwky, if you fix the socket hang up issue I'll be more than happy to give NR another shot. |
I just wanted to jump in and say that we are experiencing the same issues. Edit: We are running on dedicated servers. No Heroku or similar platform. |
We're also experiencing this issue. Our express app dies with the following error message:
We've just adopted NR into our node app. I'm busy trying to replicate the issue. Previously, we've been running our application for over a year, and it has been stable in terms of memory and CPU usage and general availability. Since adding in NR, we have noticed instability. We're using express 2.5.x, jade 0.25.0 and nr 1.3.x with node 0.10.x and a handful of other modules. We're running in AWS, and using NodeJS clustering.
|
I have just boosted the nodejs memory limit from 512m for more stability, with --max-old-space-size=1024. Edit: I have upgraded to NR 1.4.0, and it's more unstable than 1.3.x: it consistently crashes within 20 minutes of starting up the application. |
We just released version Unfortunately I do not know if this was causing the memory leak or not. I very much believe we have caused a leak somewhere, but I don't know where yet. We have not been able to reproduce the problem, which makes tracking it down difficult. I would be grateful to anyone here who can try version Thanks again for your wonderful support! |
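The `--max-old-space-size=1024` flag mentioned above raises V8's old-generation heap limit to roughly 1024 MB. One way to check that the flag took effect, and to see how close the process is to the limit, is sketched below. It assumes a Node.js release that ships the built-in `v8` module, which is newer than the 0.10 line discussed in this thread; `toMb` and `heapReport` are illustrative names.

```javascript
const v8 = require('v8');

// Convert a byte count to whole mebibytes for readable logging.
function toMb(bytes) {
  return Math.round(bytes / (1024 * 1024));
}

// Snapshot of the configured heap limit and current memory usage.
function heapReport() {
  const stats = v8.getHeapStatistics();
  const usage = process.memoryUsage();
  return {
    heapLimitMb: toMb(stats.heap_size_limit), // grows with --max-old-space-size
    heapUsedMb: toMb(usage.heapUsed),
    rssMb: toMb(usage.rss),
  };
}

console.log(heapReport());
```

Running this with and without `node --max-old-space-size=1024` should show a different `heapLimitMb`. Note that raising the limit only delays a crash if memory really is growing without bound.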
Hi groundwater, Thanks for the release. I've been running 1.5.0 for over 5 hours now and have not had a crash. 1.4.0 was crashing within 20 minutes, so on first look, the new version looks a lot more stable. I am going to add some load to the app and see how it handles over the next couple of days. |
Unfortunately we're still experiencing memory leaks with 1.5.0. The socket hangup has been fixed though. |
Memory still leaks with 1.5.0. This happened well after the memory started leaking
In the picture you can see the line plateau. This is due to throttling on Heroku. All of this also happened during the night and we were receiving no traffic. It's also worth noting we're running 4 instances on 2 dynos. |
My Node app would run stable in memory in the 70-80 MB range. The last couple of weeks I've noticed that the memory usage has started to grow steadily until it caps the server, requiring a restart of the app. I've been debugging memory leaks for days, banging my head against this wall. I just recently tried removing newrelic-node from my app, and memory is stable again. newrelic_agent.log looks fine except for the following error which appears periodically:
|
@jmdobry that issue, at least, is fixed in v1.5.0 of New Relic for Node, which we released on Friday (see @groundwater's comment upthread), but for at least a few people this hasn't fixed the memory leak issue, which means the two probably aren't correlated. We're actively investigating this issue, but it's only happening for some people, and we haven't been able to reproduce the problem locally. Sorry, and thanks for your report and your patience while we figure this out! |
@othiym23 Good to hear about the socket hangup. Still have the leak though. For what it's worth, the leak seemed worse the more verbose the logging level was. |
we're also still seeing the memory leak + instability (though not sure whether it's related to socket hangup). We're using a local port monitor that restarts the service when it stops responding (3s timeout), and we start experiencing non-responsiveness within an hour. However, we run 2 Node apps side by side, and only 1 of the apps ever becomes unresponsive, even though they have similar architectures. Here's some info that might help with reproducing. As with others, we are unfortunately unable to share our app, but maybe something in here can be used to help narrow down the issue. Some libraries that are in the unstable app, but not in the stable app:
Middleware configured in unstable app, not in stable app:
The graph below shows host memory usage with NR enabled until about 5:30pm, after which we disabled it and restarted the services due to unresponsiveness. |
Just want to jump in here and say that we are also using
Do the other people experiencing this also use these modules? Sebastian Hoitz, Managing Director and Developer, komola GmbH |
We were in communication with @groundwater as soon as 1.4.0 was released because we were experiencing this same memory leak. We've been running 1.3.2 due to this issue. I just want to throw our voice in that this is a major issue for us because business is starting to require RUM data. I just tried 1.5.0 and still have the leak. In about an hour we are reaching the heroku dyno limit.
We're running over https on heroku and doing a lot of https api requests using the request http client. We're also using cluster. |
@sebastianhoitz nope, we're not using handlebars or i18n and are still experiencing the memory leak with 1.5.0 |
I think there is a memory leak, but we still haven't been able to reproduce it. I think the fastest path to a solution from here is getting our hands on some hard evidence. I completely understand if you cannot share your app, but perhaps there are other solutions.
If you'd like to email me directly at jacob@newrelic.com we can talk about details. Thanks to everyone for their great help so far. It sucks when we cause problems for your apps, and we really really appreciate you helping us fix these things. |
I just finished preparing heapdumps that should help resolve this issue. I'm emailing them to you, @groundwater, and the New Relic support team. |
The latest theory that I've heard indicates that the memory leak has something to do with wrapping MongoDB queries. I created a simple proof-of-concept app that seems to verify this. If anyone has changes or tweaks that might help them shake out this bug, feel free to fork it. |
@nicholaswyoung I ran your demo app, and drove 500k requests to it. I did not get a memory leak. Neither memwatch nor my external metrics indicated any problems, and the memory did not grow beyond about 80 MB peak. The memory promptly dropped when I stopped driving traffic to the application. I've asked my colleague to look at it, just in case there is something about system setup involved. Can you give me the exact command you used to drive traffic, and how long before the leak occurred? |
@Chuwiey, hrm, forgot that I can't see your email address through github. Can you email me at chase@newrelic.com so we can have a more direct dialog? Thanks! |
Responding via email... |
Not sure if it's the same issue, but running with stats set to "trace" OR "info" I'm seeing a ~50MB increase every 30 minutes. Disabling this module (but still using new relic on the server) reports no increase in RAM over time. Emailed heather@newrelic (my contact) more info, but I wanted to post here as well. |
@Rowno Without y-axis labels, your chart doesn't tell us much. Could you provide the amount of memory usage you are seeing before and after? We're in the middle of doing some deep inspection of core dumps and other data to try to gather as much information as we can. Stay tuned for more info. |
@framerate Yes, using a verbose logging level can cause a serious increase in memory usage. We used to default to "trace" level, but have backed that off to just "info" level at this point. That said, we believed "info" was unlikely to cause a noticeable memory usage. Can you double check that "info" level is still problematic? The reason verbose log levels are problematic for memory usage is due to garbage collection. Normally, one would expect log message data to be ephemeral and be collected quickly during garbage collection scavenge cycles. However, we are finding log data persisting into the old-generational space of memory. The end result is that a lot of log messages end up sitting around in memory waiting for a slower, less frequent mark and sweep cycle to be collected. This means the memory usage in the steady state for a given app is higher than if all log messages were collected immediately by scavenge cycles. We're still investigating. Stay tuned! |
@txase I let it run overnight and sadly found an actual memory leak in my API, so my data is corrupted. Still, with a small sample size, it appears that even with 'info' logging I go up ~2MB every 5 minutes with newrelic running at app level. In the initial tests you'll see the slight "slope" from running overnight, then the drop-off when I restarted without running new relic on the application, where it baselines. |
@framerate The small slope could be an indication of a leak, or simply a rise over time that hasn't plateaued yet. Due to how the agent works, we need to isolate memory usage due to a leak versus usage due to higher request throughput. For your particular environment, it might be useful to let the app run a few days (going through day/night peak cycles), and then check for continually rising memory usage indicating a memory leak. When we've asked other customers to try this, they eventually see memory usage level-off. Thanks for following up! |
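One way to tell a leak from a rise that eventually levels off is to record memory periodically and compare the overall growth rate with the growth rate over the most recent samples. The sketch below is one possible approach, not something the agent provides: all function names are illustrative, and the data at the end is synthetic.

```javascript
// Least-squares slope of memory (MB) against elapsed time (minutes).
function slopeMbPerMinute(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((sum, s) => sum + s.minutes, 0) / n;
  const meanM = samples.reduce((sum, s) => sum + s.mb, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.minutes - meanT) * (s.mb - meanM);
    den += (s.minutes - meanT) * (s.minutes - meanT);
  }
  return den === 0 ? 0 : num / den;
}

// Slope over the most recent half of the samples only.
function recentSlope(samples) {
  return slopeMbPerMinute(samples.slice(Math.floor(samples.length / 2)));
}

// Record resident set size periodically; unref() keeps the timer from
// holding the process open on its own.
function startSampling(samples, intervalMs) {
  const startedAt = Date.now();
  const timer = setInterval(() => {
    samples.push({
      minutes: (Date.now() - startedAt) / 60000,
      mb: process.memoryUsage().rss / (1024 * 1024),
    });
  }, intervalMs);
  timer.unref();
  return timer;
}

// Synthetic data: growth for the first 10 minutes, then flat.
const plateau = [80, 82, 84, 84, 84, 84].map((mb, i) => ({ minutes: i * 5, mb }));
console.log(slopeMbPerMinute(plateau).toFixed(3), recentSlope(plateau).toFixed(3));
```

A recent slope that stays clearly positive across several day/night cycles points to a leak, while a recent slope near zero suggests the usage has plateaued. In a real app, `startSampling(samples, 60000)` would collect one sample per minute.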
@txase - This is running on a micro AWS instance. The reason this is on my radar is because my server seems to hit 100% Ram (micro has no swap) and become non-responsive. Could be related to this issue, could not be, but it seems to be :(. So running a few days and watching becomes an issue since the 100% RAM never drops back down. Granted some of the leaks were mine, but the above graph is running a clean app with/without newrelic agent. I'm going to keep investigating, but I have to turn off newrelic until I have time to circle back next sprint. |
@framerate We're preparing a document with things you can do to mitigate memory usage. The gist is that you can try one of the following:
To a certain extent, we simply record a lot of data in order to provide our customers with as much info as possible. Using our product will entail a certain overhead in memory, and you may need to increase the available memory. Thanks! |
Thanks Chase! I'm upgrading to a medium soon. I don't mind the overhead. I look forward to seeing this doc! Thanks! - justin |
Thanks @supergrilo. I had to remove newrelic for now. They suggested that I run it on a server with more ram and it'll eventually plateau. How much ram is on your machine? (mine was ~600MB micro AWS instance) |
My machine has 4 GB of RAM, but V8 uses only 1.4 GB by default. |
Hello, I have the same problem on my node.js apps. One of my nodes has newrelic, the other one does not. $ node --version $ npm list |
Still have this! |
Hi, I have the same problem on my node.js apps. |
+1 Some weeks ago I rolled back the new relic deployment and I have been watching this issue since then. I understand it is not an easy problem to solve, but hopefully you can give it higher priority. Thanks. |
The best way to get the priority bumped on a problem you are having is to The github issue tracker is being phased out, and all of our internal tools |
I was experiencing the same issue running on node 0.11. I had to remove it because the memory increase was huge - from ~170mb to ~2gb. Like @rictorres, I originally thought this could be related to pm2, but the issue still presented itself when running with @wraithan I don't think that at this point it should be up to us, as users and/or customers, to report this issue on yet another tracker, considering that a discussion is already on-going here. |
@ruimarinho I get that. But without account data, module lists, being able to correlate things across those, etc. It makes our job harder. On top of that, product management uses support tickets to push what is important. @ruimarinho Also of note, we don't really support We can't reproduce an unbounded leak. We can find cases of higher than user desired memory, but nothing that actually shows a leak. Most of the memory usage appears to be in flight objects, especially in the higher memory cases. That large number of in flight objects cause V8 to allocate a lot of memory (it is greedy) and puts pressure on the GC. |
I understand that Indeed, I can't really reproduce a memory leak but the observed memory usage is much, much higher than what you would normally get without the module installed. Like you said, this may not exactly be a bug in the module, but it is a tradeoff that some of us are not willing to make. Nevertheless, the workarounds mentioned above to limit this issue did not work for me, but I'll gladly try other suggestions if you have any available. My help may be limited to node with the harmony flag enabled, but right now the symptoms are similar to what others are experiencing in node 0.10. |
+1 here, we are seeing the same memory leak. Hope this is resolved quickly, but honestly just glad to know about it -- I've been waking up at odd hours of the night for a couple of months to restart our Node processes to prevent the memory leak from overwhelming our servers and we didn't know the culprit until today. We have a few clusters of servers running on AWS with a handful of different Node apps, all with NewRelic with sawtooth memory usage graphs. Disabling NewRelic's Node module solved it immediately. Just submitted a NewRelic support request as @wraithan suggested. Looking forward to a fix here. |
A big warning should be put somewhere! This agent should not be used in production as it will almost certainly decrease the performance substantially of your app due to this leak!!!! The stable version is 1.3.2 |
@wraithan I will follow up with newrelic support but do want to add some learnings to this more public forum since it is very active. In short, we have tried newrelic 1.9 with RUM enabled, 1.3.2 with RUM disabled (not supported) and sans newrelic. We did see some improvements when going from 1.9 to 1.3.2, but when removing newrelic entirely we saw a significant drop in memory usage over time. Here is a screenshot of our heroku dashboard with newrelic-node 1.9 installed vs no newrelic. Note that throughput is about the same. We are an https only app serving mostly web pages. Any sudden drops in app memory are from a deploy which restarts the app. I understand that monitoring isn't cheap but we saw significant improvements across the board when removing newrelic and are looking at other smaller monitoring solutions now. |
Hi folks, First off, we've received a lot of very helpful information in this thread. We appreciate the amount of time and effort people have put into helping us determine potential issues in our agent. Our greatest concern is ensuring that we do not negatively impact our customers, so we take this issue very seriously. We've spent a lot of time behind the scenes looking into the memory usage of our agent. We worked with a small number of customers who provided core dumps of their apps, and this has led to a few discoveries: http://docs.newrelic.com/docs/agents/nodejs-agent/troubleshooting/large-memory-usage However, continuing this issue on GitHub will not help us. If, after consulting the documentation above, you continue to experience memory usage issues, please follow up with us at node-github-issue-134@newrelic.com. If possible, please contact us using the email address you use to log into New Relic, and include your account # and application name(s). This is a temporary address specifically set up to help us create a direct support ticket for you. Creating a dedicated support ticket will allow us to work with you on an individual basis to gather the information we need. We highly encourage you to follow up there, and we will be locking this issue. Relatedly, we are winding down our use of GitHub issues. It can be difficult to support our customers through GitHub because we can’t share confidential information. Instead, please contact us through our dedicated portal at http://support.newrelic.com for any other issues you encounter. We are better equipped to support you there, and issues filed there are resolved more quickly. Towards this end, we will soon be turning off GitHub issues. Once we flip the switch, all access to issues, both active and closed, will be gone. This is an unfortunate limitation of how GitHub handles issues once the feature is disabled. Thank you for your understanding as we undergo this transition. |
I've been hunting memory leaks for the past 4 days but this one stumped me for a bit. While the application was idle, it was consuming a consistent amount of memory. That's when I noticed new relic appeared to be crashing and not recovering.
Logs:
https://gist.github.com/mattmcla/958c26fb8e8374981016
packages:
"dependencies": {
  "express": "~3.2.5",
  "jade": "~0.35.0",
  "versionator": "~0.4.0",
  "oauth": "~0.9.10",
  "pg": "~2.8.2",
  "hat": "0.0.3",
  "knox": "~0.8.8",
  "mime": "~1.2.11",
  "slugify": "~0.1.0",
  "dateutil": "~0.1.0",
  "newrelic": "~1.4.0",
  "orchestrate": "0.0.4",
  "bluebird": "~1.0.4",
  "dotenv": "~0.2.4",
  "less-middleware": "~0.2.0-beta"
},
"engines": {
  "node": "0.10.22",
  "npm": "1.3.x"
}
It's only been an hour or so since I removed new relic but I'm not seeing any leaky behavior.