
node process killed by os during heap snapshot due to OOM #50711

Open
erfanium opened this issue Nov 13, 2023 · 9 comments

Comments

@erfanium

erfanium commented Nov 13, 2023

Version

v20.9.0

Platform

linux 6.2.7-060207-generic

Subsystem

No response

What steps will reproduce the bug?

const memoryHog = [];

// Grow the heap by ~20M array elements every 5 seconds until the limit is hit.
setInterval(() => {
  const mem = process.memoryUsage();
  console.log(mem);
  memoryHog.push(new Array(20000000).fill({ foo: Date.now() }));
}, 5000);

Run with:

node --heapsnapshot-near-heap-limit=800 --max-old-space-size=1000 script.js

How often does it reproduce? Is there a required condition?

It always happens

What is the expected behavior? Why is that the expected behavior?

I expected Node to generate a heap snapshot when the memory usage approached the specified limit, allowing analysis of memory allocation before the process was terminated due to a heap OOM.

What do you see instead?

Node uses all of the host system resources, and the Node process is killed by the operating system before a heap snapshot can be generated, preventing analysis of the memory usage pattern that leads to the crash.

Additional information

I have 16GB of RAM, which should be enough to generate a heap snapshot of a Node process using 1GB of heap.

@joyeecheung
Member

joyeecheung commented Nov 13, 2023

I can reproduce locally. To correct the OP a bit: the process isn't killed because it uses all of the host system's resources. It is killed because the extra leeway we give it to generate the heap snapshot isn't enough, since V8 allocates extra heap memory to cache the calculated line ends during heap snapshot generation. That is something we weren't aware of when implementing --heapsnapshot-near-heap-limit; the advice we got was that adding the maximum size of the young generation to accommodate promotion should be enough.

The line-ends cache created during heap snapshot generation is a current caveat in V8 that should ideally be eliminated (to ensure snapshot accuracy). For us, maybe we can be slightly less conservative about how much we raise the limit for now and give it some extra leeway. It's hard to say how to get a good number, though. 2x heap size might be a bit too much, but as the embedder I don't think we have any API to know the number of functions in the heap. For starters, maybe max(max_young_gen_size, 0.5 * old_gen_size) is a better estimate.

It's also worth noting that --heapsnapshot-near-heap-limit only operates on a best-effort basis. It's not guaranteed that heap snapshots will be generated; Node only tries its best to do so without raising the limit too much.

@erfanium
Author

erfanium commented Nov 13, 2023

@joyeecheung a question: is there an immediate fix we can apply now?

I have a Node app in production which sometimes gets heap OOM crashes, and we couldn't find steps to reproduce it.
I wanted to use --heapsnapshot-near-heap-limit, but this option is not working for us (as I described in the issue).

@joyeecheung
Member

I opened #50711 which locally allows some heap snapshots to be generated for the test case (I am not too sure whether the current formula is good, though; it seems to encourage unbounded growth).

@joyeecheung
Member

Actually, even with the new limit, the process can still decide not to generate the snapshot because uv_get_available_memory() or uv_get_free_memory() returns a fairly low number under pressure, so it considers heap snapshot generation too risky and skips it. For example, with #50718 and the snippet in the OP, I get 50~80MB from uv_get_available_memory() on macOS (which should just be the same as uv_get_free_memory() there), even though I have ~4GB of memory left in the system. Maybe @nodejs/libuv knows whether this is a known issue or whether there is a less conservative way to decide on the bailout.

@joyeecheung
Member

joyeecheung commented Nov 14, 2023

Oh actually I found libuv/libuv#3897, this seems to be specific to macOS. I guess we can skip the check on macOS for now and reference that issue. When that gets fixed, we can remove the skip.

@vtjnash
Contributor

vtjnash commented Mar 2, 2024

Over at Julia, a user created a tool and format for streaming the required data out into multiple files, which needs very little memory overhead to write, and then reassembling them in a separate process into the heap profile format for Chrome DevTools. I thought I would provide this info in case someone finds it motivating to change the Node.js implementation to use the same tricks: JuliaLang/julia#52854

@joyeecheung
Member

@vtjnash Thanks for the tip! I am not very familiar with the implementation of Julia; do you generate the heap snapshot from your own heap? For the problems we see in Node.js, the issue is more on the V8 side: the part where the JS heap gets iterated and converted to an in-memory snapshot is controlled by V8, and there's currently no way to stream it. The only part that can be streamed is writing this in-memory format out to JSON on disk.

@joyeecheung
Member

By the way, V8 recently added a --heap-snapshot-on-oom flag, which works better than the one implemented in Node.js since it doesn't need the back and forth of estimating the heap limit, raising it temporarily, and bringing it back down to actually crash. V8 can simply start the write internally once it thinks the limit is reached and a full GC has been done.

@vtjnash
Contributor

vtjnash commented Jun 5, 2024

Yes, the Julia implementation is separate, and some of the work would need to be done in the vendored copy of V8. I just wanted to bring to your attention that it is possible to implement a streaming iterator that does not need as much extra address space as the in-memory version.
