node process killed by os during heap snapshot due to OOM #50711
Comments
I can reproduce locally. To correct the OP a bit: the process isn't killed because it uses all of the host system's resources; it is killed because the extra leeway that we give it to generate the heap snapshot isn't enough, since V8 allocates extra heap memory to cache the calculated line ends during heap snapshot generation, something that we weren't aware of when implementing this. The line-ends-cache-during-heap-snapshot-generation behavior is a current caveat in V8 that ideally should be gotten rid of (to ensure snapshot accuracy). For us, maybe we can be slightly less conservative about the memory reserved for now and give it some extra leeway. It's hard to say how to get a good number, though: 2x the heap size might be a bit too much, but as an embedder I don't think we have any API to learn the number of functions in the heap. For starters, maybe… It's also worth noting that…
@joyeecheung a question: is there any immediate fix we can apply now? I have a Node app in production which sometimes gets heap OOM crashes, and we couldn't find reproduction steps for it.
I opened #50711, which locally allows a heap snapshot to be generated for the test case (I am not too sure whether the current formula is good, though; it seems to encourage unbounded growth).
Actually, even with the new limit the process can still decide not to generate the snapshot, because…
Oh, actually I found libuv/libuv#3897; this seems to be specific to macOS. I guess we can skip the check on macOS for now and reference that issue. When that gets fixed, we can remove the skip.
Over at Julia, we had a user create a tool and format for streaming the required data out into multiple files, which needs very little memory overhead to write, and then reassembling those files into the Chrome DevTools heap profile format in a separate process. I thought I would provide this info in case someone finds it motivating to change the Node.js implementation to use the same tricks: JuliaLang/julia#52854
@vtjnash Thanks for the tip! I am not very familiar with the implementation of Julia; do you generate the heap snapshot from your own heap? I think for the problems we see in Node.js, the issue happens more on the V8 side: the part where the JS heap gets iterated and converted into an in-memory snapshot is controlled by V8, and there is currently no way to stream it. The only part that can be streamed is writing this in-memory format out to JSON on disk.
By the way, V8 recently added…
Yes, the Julia implementation is separate, and some of the work would need to be done in the vendored copy of V8. I just wanted to bring to your attention that it is possible to implement a streaming iterator which does not need as much extra address space as the in-memory version.
Version
v20.9.0
Platform
linux 6.2.7-060207-generic
Subsystem
No response
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
It always happens
What is the expected behavior? Why is that the expected behavior?
I expected Node to generate a heap snapshot when memory usage approached the specified limit, allowing analysis of memory allocation before the process was terminated due to heap OOM.
What do you see instead?
Node uses all of the host system resources, and the Node process is killed by the operating system before a heap snapshot can be generated, preventing analysis of the memory usage pattern that leads to the crash.
Additional information
I have 16 GB of RAM, and it should be enough to generate a heap snapshot of a Node process using 1 GB of heap.