memory leak starting with node v11.0.0-nightly20180908922a1b03b6 #28420
Comments
Thanks for the report!
I agree, looking at e917a23...922a1b0 that seems like the only reasonable source of issues – and V8 updates are big changes, so this is likely the culprit.
I agree 👍 The fact that valgrind shows nothing of interest makes it less likely that this is a memory leak in C++, so that’s already helpful.
The fact that this is unlikely to be a C++ memory leak makes it more likely that this is one in JS; valgrind doesn’t track memory allocated using
First, if possible, it would be good to verify that this is the case. Secondly, if possible, taking heap dumps at different points in time and comparing them in Chrome DevTools should help a lot with figuring out what objects are being retained and why, if this is a JS memory leak. For taking heap dumps you can use e.g. https://www.npmjs.com/package/heapdump or, if that’s more convenient, directly attach DevTools to the process. Given that this is likely a V8 bug, you may want to run the process with
Finally, it would also be good to know whether a more recent version of Node.js (e.g. the current nightlies) has the same issue. V8 issues are often caught and addressed later, and it may be the case that the solution is to figure out which V8 commit fixed this issue and then backport it to the relevant Node.js versions.
If this does turn out to be an issue with one of the C++ parts (e.g. by seeing an increase in RSS but not in the heap as reported by
I hope this helps!
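For concreteness, here is a minimal sketch of the periodic heap-dump approach suggested above, assuming the heapdump package; the file path and interval are illustrative, not from the thread:

```js
// Sketch only: write a heap snapshot every 10 minutes so that consecutive files
// can be compared in Chrome DevTools (Memory tab -> Load) to see which objects
// accumulate between snapshots.
const heapdump = require('heapdump');

setInterval(() => {
  const file = `/tmp/todo-${Date.now()}.heapsnapshot`; // illustrative path
  heapdump.writeSnapshot(file, (err, filename) => {
    if (err) console.error('heap snapshot failed:', err);
    else console.log('wrote', filename);
  });
}, 10 * 60 * 1000);
```

Alternatively, starting the process with --inspect and attaching Chrome DevTools directly avoids writing snapshot files at all.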
Here are the graphs using
I am proceeding with the additional memory metrics and the heapdump approach, as well as checking whether deliberately invoking gc helps.
graphs for
These runs were executed with garbage collection being called once every minute.
Left chart, top to bottom (leaves out heap_size_limit and total_available_size):
Right chart, top to bottom:
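For reference, a minimal sketch of how the once-a-minute forced garbage collection mentioned above could be wired up; it assumes the process is started with the --expose-gc flag so that global.gc exists (the script name is illustrative):

```js
// Force a full GC once a minute; global.gc only exists when node is started
// with the --expose-gc flag.
if (typeof global.gc === 'function') {
  setInterval(() => {
    global.gc();
  }, 60 * 1000);
} else {
  console.warn('global.gc is unavailable; start with: node --expose-gc server.js');
}
```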
@bmacnaughton It’s not fully obvious from the graphs, but I’d say that this does look like a JS memory leak, because the heap size and the RSS curves in the right graph do about the same thing?
@addaleax - when you say a JS memory leak you mean v8, right? If yes, it seems that way to me.
I'm adding heapdump code to the todo server now; I'll probably dump every 10 minutes or so. I'm going to let this run a while longer before starting the version that will call heapdump every 10 minutes; I'd like to get a little longer view. I will also turn off the forced garbage collection for the next run to have a comparison to this run.
btw, the v8 heap total_available_size is slowly going down; see the attached chart. I put these two values in a separate chart because their scales are very different. The top line (which is flat but is a bit of an optical illusion) is the heap_size_limit; the bottom line is the total_available_size.
OK, here are the charts at the end of the previous run. It's looking pretty similar to the previous runs. The gc doesn't prevent the memory loss; it would take more work to determine whether it slows down the rate of loss, and I'm going to pass on that for now.
The v8 total_available_size is now down to 1.4 GB; it started at 1.5 GB. The heap_size_limit is 1.518 GB (constant).
I'm starting a run with heap dumps every 10 minutes.
Hi, may I ask what you used to generate the memory usage graphs?
The data is coming from the JavaScript calls process.memoryUsage() and v8.getHeapStatistics(). I'm sending those to the AppOptics metrics API at api.appoptics.com/v1/measurements, and they're available at my.appoptics.com in dashboards/metrics. The todo server app (my interactive test harness, not really meant for general application) is at github.com/bmacnaughton/todo; the metrics generation code is in server.js (search for argv.metrics) and lib/metrics.js.
@addaleax - I have almost 20 hours of heapdump snapshots at this time, one every 15 minutes. I haven't looked at them before but am just starting to look now; I'll see what I learn. If they would be helpful to you I can post all of them or specific time periods.
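A rough sketch of this kind of metrics collection is below; the real code lives in server.js and lib/metrics.js of the todo repo, and sendToAppOptics here is only a placeholder for the HTTP POST to the measurements API, not a function from that repo:

```js
const v8 = require('v8');

function sampleMemory() {
  const mem = process.memoryUsage();   // rss, heapTotal, heapUsed, external
  const heap = v8.getHeapStatistics(); // used_heap_size, total_available_size,
                                       // heap_size_limit, ...
  return { time: Date.now(), ...mem, ...heap };
}

setInterval(() => {
  const sample = sampleMemory();
  // sendToAppOptics(sample);          // placeholder for posting to
                                       // api.appoptics.com/v1/measurements
  console.log(JSON.stringify(sample));
}, 10 * 1000);
```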
@bmacnaughton You’re on the right track – you typically want to look at the retainers of objects created later in time, because those are more likely part of the memory leak. (The map for
@addaleax I am not creating any and within that,
@addaleax - OK, here's a wag based on a guess. I'm guessing the @Number is sequencing of some sort, so that larger numbers imply later creation. If that's workable then this object is later in time and its retainer is related to our use of async_hooks. That seems like a likely candidate to be impacted by a v8 change. (Sorry for the miserable formatting - there's too much for a screen shot and the copy didn't format well.)
I appreciate the time you've provided so far and any pointers you can provide me at this time.
For anyone following along who has as much to learn here as I do: https://developers.google.com/web/tools/chrome-devtools/memory-problems/memory-101
@addaleax - one video I watched said that I have no control over the parenthesized items in the memory display. If that's correct then the (system) category is in the control of (I suspect) v8 only.
@bmacnaughton So … that’s partially true – these objects are still kept alive by other objects, and if those other objects are controlled by your code, then you do have some control over that. Also, the fact that these objects are being controlled by V8 would align with this pointing to a bug in V8… which is a good reason to figure out where the leak comes from ;)
This is the last commit to
At this point tbh, I'm not sure I would recognize a smoking gun if I saw one. I do see that a number of the array elements are my code's async_hooks/cls-hooked context.
@addaleax - I'm sorry that I can't make faster progress on this. I am a bit like a dog watching TV. The things I'm looking at sound like things I've heard, but the form is abstracted in a way that makes it hard for me to grasp.
@bmacnaughton Would you be able to share a reproduction, or maybe even “just” a heap dump for inspection? da8641f seems like an unlikely candidate for causing a memory leak, and also isn’t part of the commit range that we would primarily look into (according to the different nightlies). It might still be good to know which objects in your application keep the newly added
Well, I’ll be without access to a computer for the next 48 hours, so as far as I am concerned, take your time. I’m pretty interested in finding out where this is coming from, though :)
And in the worst case, there’s always the brute-force option of bisecting against the V8 commit range corresponding to the update; it’s going to require building Node.js from scratch roughly a dozen times and seeing whether the leak also occurs there, but it would likely lead to the commit that caused this change.
I'm happy to share all the heap dumps you would be willing to look at. This is from an internal benchmark and has no customer data in it. I would prefer to send a dump, or put it somewhere, for you specifically. I've sent a Google Docs share to you; please advise if you have any issues (I used an organization account - not sure how that handles shares).
I've done the node bisecting twice now (once to find where the default heap limits changed and once to find where the leak started - I thought those were the same problem initially), but they didn't involve building node from scratch. I did that a couple of times at Vancouver Node+JS Interactive but haven't since.
It's a little harder to share a repro - I have not been able to reproduce this outside of a docker-compose environment. Setting up access to the EC2 environment and going through the process for that is probably best left as a last resort. If I could repro on my dev machine then I would be able to share it.
I'll be looking at this while you're away. I appreciate your hints.
Yes, they are all maps. One thing that makes no sense to me is this: there are 784 elements in
The majority of the 784 elements appear to be
I'm not sure how to make sense of this either. The bulk of memory is not labeled at all - it can't just be C/C++ code, as the area is bigger than the initial memory footprint of the whole program.
@addaleax - If you are able to take a look at the snapshots and they are clear to you, I would very much like to understand the process you went through to find the error (I wish I could just watch the process). If they don't help then I will give bisecting the commits in
@addaleax - many of the objects being retained in
In the meantime, I've built node again and can start trying to bisect v8 releases if you can provide a little guidance as to how to go about that.
@addaleax - clarifying: your suggestion of building each specific v8 commit is to use each of the commits in e917a23...922a1b0 to build node and then test that particular binary in our test harness, right?
Hi again @bmacnaughton!
Yes. This doesn’t make sense to me either, and, sadly, points to some relevant information simply not ending up in the heap snapshot. I feel like at this point it would make sense for somebody from @nodejs/v8 to take a closer look.
It’s in the V8 source code, for the versions of V8 that come into question here.
It’s unfortunately not that easy, because we bump V8 versions as a bulk commit, not by picking individual commits. Bisecting against V8 versions is a painful process that’s best performed on a machine with a lot of CPU power, but what I typically do if I have to do this is:
It’s not a lot of manual work, but it takes time because it usually means recompiling all of Node.js from scratch a dozen times. It might be easier to wait for an opinion from somebody more familiar with V8 internals first…
There is a way to narrow the range of commits before bisecting V8: go to https://nodejs.org/download/v8-canary/ and try with canary versions. You can look at this table to know which V8 version is in which canary version.
I've been trying to reproduce the leak using the simple application @laino created (#28787) but have had no luck. I don't know how the debugger is using async_hooks, so it's possible that I'm just doing something differently.
Here's the simple program that I've run to try to reproduce the leak using #28787's basic logic, but it does not reproduce the memory leak shown in the graphs above. It's clearly too simple (or I've made a bad assumption somewhere) to reproduce either my app's or the debugger's use of async_hooks. I know that the debugger isn't using cls-hooked, so that's one big difference, but my application is using it, so my thought was that the program could check two things at once. I'm running the app in a docker container on node.
My next step is to create a more complex application that adds more logic within the main() function to see if I can generate a memory leak. If anyone has ideas please let me know. This program is really simple because it was my hope that reproducing @laino's memory leak would be the fastest way to get there.
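Since the program itself did not make it into this thread, the sketch below is a hypothetical reconstruction of the kind of test described: an async_hooks hook that propagates a per-request context (roughly what cls-hooked does) while a loop of promise-based requests runs. It is not the author's original program.

```js
const async_hooks = require('async_hooks');

// Track a context object per async resource, propagated from the triggering
// resource; a simplified stand-in for what cls-hooked does internally.
const contexts = new Map();

async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    contexts.set(asyncId, contexts.get(triggerAsyncId) || { created: Date.now() });
  },
  destroy(asyncId) {
    contexts.delete(asyncId);
  },
}).enable();

// Stand-in for a request handler doing a couple of async hops.
async function main() {
  await new Promise((resolve) => setImmediate(resolve));
  await new Promise((resolve) => setTimeout(resolve, 1));
}

setInterval(() => {
  for (let i = 0; i < 1000; i++) main();
  const rss = process.memoryUsage().rss / 1024 / 1024;
  console.log(`rss: ${rss.toFixed()} MB, tracked contexts: ${contexts.size}`);
}, 100);
```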
Maybe a long shot, but would this be related at all to #28988? It seems like @gabrielschulhof has found that there is a fix for that issue in a later v8 version.
Thanks for connecting the possibly related dots! Any idea when that version of v8 will make it into node? I'm happy to test against my big, ugly, messy, hard-to-package benchmark where I see the issue come up. I've been working on reproducing it in something simple but haven't had any luck. In the meantime I'll keep trying to repro in something packageable.
I'm not sure yet, but I'm hoping @gabrielf will comment in #28988 (comment).
I have isolated the memory leak to using our agent on an application running mongodb-core v2. If I update mongodb-core to v3 there is no memory leak. I know that's not exactly a reproducible case, but I wanted to provide an update; maybe someone familiar with the mongodb 2 => 3 changes will have a lightbulb go on.
I am able to work around this now, albeit by removing functionality for the problematic combination. Disabling our instrumentation for the mongodb-core package removes the interplay of async_hooks, mongodb-core, and versions of node since v11.0.0-nightly20180908922a1b03b6, so it's clear that it's not mongodb-core by itself.
Apparently there is some activity in mongodb-core regarding async_hooks: https://jira.mongodb.org/browse/NODE-1603
@bmacnaughton are you using the mongodb-core driver directly or through an abstraction layer such as Mongoose?
There are two levels here - we instrument mongodb-core. So mongoose looks like: mongoose (patch) => mongodb (prior to v3.3.0) => mongodb-core (instrument).
While we instrument mongodb-core, we do patch some mongoose functions in order to maintain context across queues and the like, e.g.,
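As a rough illustration of the kind of mongoose patch described above, the sketch below binds a queued callback to an active cls-hooked namespace so that the context survives mongoose's internal buffering queue; the patched method (Collection.prototype.addQueue) and the namespace name are assumptions made for illustration, not the agent's actual code.

```js
const { createNamespace } = require('cls-hooked');
const ns = createNamespace('apm-context'); // illustrative namespace name

// Hypothetical patch: wrap the callback that mongoose buffers while the
// connection is not yet ready, so it later runs in the caller's CLS context.
function patchAddQueue(Collection) {
  const original = Collection.prototype.addQueue;
  Collection.prototype.addQueue = function (name, args) {
    const last = args && args[args.length - 1];
    if (typeof last === 'function') {
      args[args.length - 1] = ns.bind(last);
    }
    return original.call(this, name, args);
  };
}
```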
I have written some simple code to reproduce the problem and submitted an issue to Google: https://bugs.chromium.org/p/chromium/issues/detail?id=1030569
@likev thanks for the reproduction code! I've created a v8 issue: https://bugs.chromium.org/p/v8/issues/detail?id=10057
It seems we observe this memory leak issue in TypeGraphQL(?) code in our backend. Slightly modified repro code (node 12 and 13 will OOM after a few minutes):

```js
const leakMemory = () => {
  class Leak {
    constructor() { this.unused(); }
    unused() {}
  }
  new Leak();
};

setInterval(function () {
  for (let i = 0; i < 1000000; i++) {
    leakMemory();
  }
  console.log(
    Object.entries(process.memoryUsage())
      .map(([k, v]) => `${k}: ${(v / 1024 / 1024).toFixed()}`)
      .join(' ')
  );
}, 1);
```
V8 8.1.39, with the commit v8/v8@d3a1a5b, has fixed it.
@alex3d this memory leak is still present in the latest LTS branch!! (12-alpine docker image) omg... we had so much trouble in the past... changing back to node 10 fixed the problems.
Also, in version 13 the problem seems to be fixed.
Looks like the fix in v8 was cherry-picked in #31005.
v12.15.0, to be released soon, will include the fix.
Is there an ETA on publishing 12.15.0? We're running into this now, and it would be awesome if this was published soon!
@bencripps Any time now; it is scheduled for 2020-01-28: https://github.com/nodejs/node/blob/v12.15.0-proposal/doc/changelogs/CHANGELOG_V12.md#2020-01-28-version-12150-erbium-lts-targos
We're going to delay 12.15.0 until a week after the upcoming security releases: nodejs/Release#494 (comment)
Has this issue been resolved in Node >=12.15.0? Can we close this issue?
Node 12.16.0 includes the fix. This issue can be closed.
The leak that I was seeing appears to be gone as well, so it appears to be the same thing.
@likev - how is it that you isolated the leak to the precise test case? I ask because I only saw it in a large system and wasn't able to isolate it.
Ohh, I observed the leak in my own small project; isolating it was definitely rather hard and took some luck.
Looks like the leak has been resolved in our application as well. Since deploying 12.16.0, memory usage has remained steady for the past 2 weeks.
Hello, I believe this issue may still exist in one form or another in Node 12.16.1. We've observed increasing memory usage after upgrading a production system to Node 12. The reason I believe it may be related to this issue is that we're seeing the exact same 110761200 bytes of memory used in
We also attempted an upgrade to Node 13, but we are still seeing similar leaky behavior:
We don't have a minimal repro, unfortunately, because we are only able to exhibit this behavior on a fairly large application. It also seems to be related to the volume of work/concurrency in the program, as we are unable to reproduce it in staging environments.
Also likely causing googleapis/cloud-debug-nodejs#811.
uname reports the host, but the application is running in a docker container. From /etc/os-release:

```
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
```
The leak also occurs running alpine 3.9.4 in a container. I have not been able to reproduce it outside of a docker container running on a native ubuntu 18.04 machine.
I am looking for some guidance on steps I can take to help narrow this down; I don't believe that the information I currently have is enough to isolate the problem.
The situation occurs running a memory/cpu benchmark of a todo application instrumented by our APM product. The APM product comprises JavaScript, a C/C++ library, and C++ code that uses the node-addon-api. The todo application is derived from todomvc-mongodb and runs express and mongo.
The graph below shows runs against two consecutive nightly releases (the scale changes so that the increased memory usage isn't off the top of the chart).
node v11.0.0-nightly20180907e917a23d2e shows no memory leak, while node v11.0.0-nightly20180908922a1b03b6 shows a stair-step memory leak. I took a quick look at the commits between the two nightlies but nothing stood out to me other than the v8 changes. I don't know enough of v8 to evaluate them. (6.5.1 and 6.6.0 are the versions of our agent being tested, and at one time the blue line was purple but the legend didn't change.)
Here is output from valgrind for an 11 hour run. This is the first time I've used valgrind so I don't have much to compare it with, but nothing stands out to me - there's nothing that looks like it could account for the memory growth shown in the second graph. Hopefully it will mean more to you.
What can I do next to help isolate this?