Any hints at tracking memory usage/leaks in node 12 when using node-addon-api? #498

Closed
bruce-solar opened this issue Jun 15, 2019 · 15 comments

@bruce-solar

Our application uses node-addon-api to interface with, and abstract, functions in an external .so

I am looking for hints on how to track down memory leaks, or on what additional information I could provide, in order to isolate why, under node 12 only, our application uses significantly more memory and also leaks memory.

Here's a graph to get a rough idea of what's going on. The total X-axis represents 6 hours.

[graph: memory usage over the 6-hour window, comparing node 8, node 10, and node 12]

The left-hand section is running node 8, the middle section node 10, and the right-hand section node 12. The orange line at the bottom is the application running without our agent (so no .so file, wrappers, or agent code). The application is a test-bed todo application using express (derived from the todomvc-mongodb package).

I know this isn't enough information to solve the problem - I'm just looking for some pointers and steps to take.

@kevinGodell

I don't know if you have taken this step yet, but I would first expose node's garbage collection, then call it at some interval to verify if the memory usage continues to grow.

  1. start the app: `node --expose-gc your-app.js`
  2. trigger garbage collection with `global.gc()` (see the sketch below for calling it on an interval)
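
A minimal sketch of that approach (my own illustration, not from the thread): it assumes the process was started with --expose-gc, and the 60-second interval and MB formatting are arbitrary choices.

```js
// Forces a collection periodically and logs memory usage, so growth that
// survives gc (i.e. a real leak) is easy to see.
const INTERVAL_MS = 60 * 1000;

setInterval(() => {
  if (typeof global.gc === 'function') {
    global.gc(); // only present when node was started with --expose-gc
  }
  const { rss, heapUsed, external } = process.memoryUsage();
  console.log(
    `rss=${(rss / 1048576).toFixed(1)}MB ` +
      `heapUsed=${(heapUsed / 1048576).toFixed(1)}MB ` +
      `external=${(external / 1048576).toFixed(1)}MB`
  );
}, INTERVAL_MS).unref(); // unref so the timer doesn't keep the process alive
```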

@bruce-solar
Author

bruce-solar commented Jun 16, 2019

@kevinGodell - I would never have thought of that - have you seen cases where garbage collection fails to run on its own? In any case, I implemented your suggestion, calling global.gc on a 60-second interval. The memory growth is practically the same (this is a 3-hour window vs. the previous graph's 6 hours, of which only a portion used node 12).

[graph: memory usage over a 3-hour window on node 12 with global.gc() called every 60 seconds]

@kevinGodell

I have a situation where I am creating a napi buffer from a uint8 array and the memory usage grows very high in nodejs before garbage collection kicks in and cleans up. If I run gc() often, then the memory usage stays low. I have not solved it yet.

@mhdawson
Member

If you can narrow it down to a small enough test case (for example, running the unit tests for your module) and then run it under valgrind, that can help identify native memory leaks.

It might also be useful to try the nightly builds for 12.x (https://nodejs.org/download/nightly/) to see whether the behavior changed along the way, as that would help narrow down which changes in Node.js could be related.

@mhdawson
Member

@kevinGodell for your issue you might try using

```c
// Memory management
NAPI_EXTERN napi_status napi_adjust_external_memory(napi_env env,
                                                    int64_t change_in_bytes,
                                                    int64_t* adjusted_value);
```

to let the gc know about the native memory you are using. This is intended to give the gc more information about native memory usage so it knows when it might need to collect. You would need to make sure you adjust back down when the buffers are freed.
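
For node-addon-api specifically, a rough sketch of how this could look using the library's wrapper for that call (Napi::MemoryManagement::AdjustExternalMemory); the MakeBuffer function and the fixed 1 MB allocation are purely illustrative, not anyone's actual code:

```cpp
#include <napi.h>

// Illustrative only: wrap an external allocation in a Buffer and report it to
// the gc, adjusting back down in the finalizer when the buffer is collected.
Napi::Value MakeBuffer(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();
  const size_t length = 1024 * 1024;      // placeholder allocation size
  uint8_t* data = new uint8_t[length]();

  // Tell the gc this JavaScript object holds `length` bytes of native memory
  // (node-addon-api's wrapper around napi_adjust_external_memory).
  Napi::MemoryManagement::AdjustExternalMemory(env,
                                               static_cast<int64_t>(length));

  return Napi::Buffer<uint8_t>::New(
      env, data, length, [length](Napi::Env env, uint8_t* finalizeData) {
        // Undo the adjustment, then free the native memory.
        Napi::MemoryManagement::AdjustExternalMemory(
            env, -static_cast<int64_t>(length));
        delete[] finalizeData;
      });
}
```

With the external usage reported, the gc has a better signal that a collection is worth doing even when the JS heap itself looks small.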

@bmacnaughton
Contributor

bmacnaughton commented Jun 18, 2019

tl;dr - the big jump in memory usage is between v12 nightlies 20190201 and 20190202. my best guess is that it's due to src/api/environment.cc changing the amount of memory v8 should allocate. it looks like there is a slow memory leak independent of that big jump that's in my court.

details:
- v12.0.0-nightly201902017c9fba30ef - works as expected, same as all prior versions of node
- v12.0.0-nightly201902020f8e8f7c6b - shows a dramatic increase in memory used

currently i have not been able to reproduce this running on my local machine; i see the problem consistently when testing in docker containers - both debian stretch and alpine 3.9.4.

there are a lot of changes there; i doubt that anything stands out for you. my application doesn't really use anything advanced, and it's possible that i'm using things in a way they weren't intended to be used. the application is appoptics-bindings and primarily uses a couple of classes and a number of namespaced functions. no threads, no callbacks, etc.

if anything stands out to you @mhdawson let me know. in the meantime, i'll continue trying to get a reproducible case locally (odd it only happens in a docker container) and taking a look through the changes from 2019-02-01 and 2019-02-02.

i've appended "- x" to the changes that look most relevant to the dramatic increase in memory. src/api/environment.cc looks like it could account for the initial big bump in memory, but i'm not sure how to explain the difference between docker and my local tests - both have the same amount of total memory (though docker has more available than i do locally). and because we do module patching, lib/internal/bootstrap/loaders.js is a candidate, but it's going to take more study.

$ git diff --name-only 7c9fba30ef 0f8e8f7c6b
Makefile
common.gypi
deps/v8/BUILD.gn
deps/v8/PRESUBMIT.py
deps/v8/src/third_party/siphash/LICENSE
deps/v8/src/third_party/siphash/halfsiphash.cc
deps/v8/src/third_party/siphash/halfsiphash.h
deps/v8/src/utils.h
deps/v8/test/cctest/test-code-stub-assembler.cc
deps/v8/test/cctest/test-hashcode.cc
doc/api/deprecations.md
doc/api/http.md
doc/api/report.md
doc/api/tls.md
lib/.eslintrc.yaml
lib/_http_server.js
lib/_tls_wrap.js
lib/https.js
lib/internal/bootstrap/cache.js
lib/internal/bootstrap/loaders.js - x
lib/internal/bootstrap/node.js
lib/internal/bootstrap/primordials.js
lib/internal/error-serdes.js
lib/internal/modules/esm/module_job.js
lib/internal/modules/esm/module_map.js
lib/internal/modules/esm/translators.js
lib/internal/policy/manifest.js
lib/internal/process/report.js
lib/internal/safe_globals.js
lib/internal/trace_events_async_hooks.js
lib/internal/util/inspector.js
lib/internal/vm/source_text_module.js
lib/internal/worker.js
lib/url.js
node.gyp
src/api/environment.cc - x (memory)
src/env-inl.h
src/env.cc
src/env.h
src/js_stream.h
src/node.cc
src/node.h
src/node_config.cc
src/node_crypto.cc
src/node_messaging.cc
src/node_native_module.cc
src/node_options.cc
src/node_options.h
src/node_process_object.cc
src/node_report.cc
src/node_worker.cc
src/tls_wrap.cc
test/addons/async-hello-world/binding.gyp
test/addons/async-hooks-id/binding.gyp
test/addons/async-hooks-promise/binding.gyp
test/addons/async-resource/binding.gyp
test/addons/at-exit/binding.gyp
test/addons/buffer-free-callback/binding.gyp
test/addons/callback-scope/binding.gyp
test/addons/common.gypi
test/addons/dlopen-ping-pong/binding.gyp
test/addons/errno-exception/binding.gyp
test/addons/heap-profiler/binding.gyp
test/addons/hello-world-esm/binding.gyp
test/addons/hello-world-function-export/binding.gyp
test/addons/hello-world/binding.gyp
test/addons/load-long-path/binding.gyp
test/addons/make-callback-domain-warning/binding.gyp
test/addons/make-callback-recurse/binding.gyp
test/addons/make-callback/binding.gyp
test/addons/new-target/binding.gyp
test/addons/node-module-version/binding.gyp
test/addons/non-node-context/binding.gyp
test/addons/not-a-binding/binding.gyp
test/addons/null-buffer-neuter/binding.gyp
test/addons/openssl-binding/binding.gyp
test/addons/openssl-client-cert-engine/binding.gyp
test/addons/parse-encoding/binding.gyp
test/addons/repl-domain-abort/binding.gyp
test/addons/stringbytes-external-exceed-max/binding.gyp
test/addons/symlinked-module/binding.gyp
test/addons/uv-handle-leak/binding.gyp
test/addons/worker-addon/binding.gyp
test/addons/zlib-binding/binding.gyp
test/common/index.js
test/common/report.js
test/known_issues/test-inspector-cluster-port-clash.js
test/parallel/test-cli-bad-options.js
test/parallel/test-cli-node-print-help.js
test/parallel/test-http-header-overflow.js
test/parallel/test-https-client-resume.js
test/parallel/test-module-cjs-helpers.js
test/parallel/test-process-env-allowed-flags.js
test/parallel/test-process-features.js
test/parallel/test-repl-tab-complete.js
test/parallel/test-tls-async-cb-after-socket-end.js
test/parallel/test-tls-client-resume.js
test/parallel/test-tls-ticket-cluster.js
test/parallel/test-tls-ticket.js
test/parallel/test-v8-coverage.js
test/parallel/test-worker-message-port-transfer-duplicate.js
test/root.status
test/sequential/test-async-wrap-getasyncid.js
test/sequential/test-debugger-debug-brk.js
test/sequential/test-inspector-has-inspector-false.js
tools/doc/addon-verify.js
tools/test.py

@bmacnaughton
Contributor

bmacnaughton commented Jun 18, 2019

Graphical representation of the data:

The red/green lines at the bottom are the ones of interest. The blue/purple lines are running node 12.4 on alpine because I wasn't able to use node-nightly to load/run specific nightlies on alpine.

green line - v12.0.0-nightly20181024bb79e768e5
red line -

    v12.0.0-nightly20190129aa8fcee7ce - leftmost
    v12.0.0-nightly201902017c9fba30ef - middle
    v12.0.0-nightly201902020f8e8f7c6b - rightmost

as you can see, the 0202 nightly spikes.

[graph: memory usage per nightly build; the 20190202 line spikes well above the others]

it looks like there is a slow leak independent of the spike but that doesn't appear to have anything to do with node.

@mhdawson
Member

This is the list of commits between those 2:

-sh-4.2$ git log --pretty=oneline 7c9fba30ef...0f8e8f7c6b
0f8e8f7c6b9e7a8bdae53c831f37b2034d1c9fa7 tls: introduce client 'session' event
e1aa9438ead2093a536e5981da7097c9196e7113 tools: add test-all-suites to Makefile
5e0a3261f0e61b50f03e65e0134d4a01475c0cf1 tools: make test.py Python 3 compatible
dee9a61bb97bce0c75afcaffb1ae7fec81020ac4 src: remove unused AsyncResource constructor in node.h
8c8144e51ce3ae2a521668b45d5afb21be8564e3 src: remove unused method in js_stream.h
fa5e09753055edfe0e9a0f700dfcb4f356ac3c9d process: move DEP0062 (node --debug) to end-of-life
154efc9bdef3ba8df5d3dfe3b32102baf3ac4311 process: exit on --debug and --debug-brk after option parsing
c369b3e9297d11f1b761409ce7b429c5de9dcb92 test: exclude additional test for coverage
406329de577d9604e8b1dce9c5a5161e8cb33e25 process: move process mutation into bootstrap/node.js
c2359bdad62b83d40976d91e91097684c23a7ae3 process: expose process.features.inspector
39d922123c02aecc2e289a08e3bdb9515a7b193a lib: save primordials during bootstrap and use it in builtins
1d996f58af3067617a67c0af8f86f014ed4d139c src: properly configure default heap limits
d0d84b009ce4f2fe274568ed39754c02167e27d3 report: separate release metadata
393c19660510f3cd1ac3f9445747ec4c32ec224f worker: refactor thread id management
bcf2886a84407028572fd1084242a1c789c056f8 http: return HTTP 431 on HPE_HEADER_OVERFLOW error
a861adde3bc22dec07e67f199be5f2c2aa226b44 test: allow coverage threshold to be enforced
0ff0af534ef150820ac218b6ef3614dc199de823 worker: throw for duplicates in transfer list
7c8ac5a01b4ba5d4c7060875ea024e6efbc12893 deps: cherry-pick c736883 from upstream V8
f4510c4148b50b47ac22fdb5331ce726b63b8525 test, tools: suppress addon function cast warnings
80873ec3c2e18c151ddf1c0d79461c48d367206f crypto: fix public key encoding name in comment

@mhdawson
Member

This one:

 1d996f58af3067617a67c0af8f86f014ed4d139c src: properly configure default heap limits

would affect the default heap limits used by the gc, and the effect could depend on where you run (you mention the issue is only seen under docker). You might try setting the max size with --max-old-space-size and see if that has any effect.
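
One quick way to check that theory (my own illustration; the file name is hypothetical): print the heap limit V8 actually ends up with on each node version, inside and outside docker, and compare it with a run that sets the limit explicitly.

```js
// heap-limit.js - run as:
//   node heap-limit.js
//   node --max-old-space-size=512 heap-limit.js
const v8 = require('v8');

const { heap_size_limit } = v8.getHeapStatistics();
console.log(`heap_size_limit: ${(heap_size_limit / 1048576).toFixed(0)} MB`);
```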

Nothing else really stands out and nothing looks related to N-API.

@bmacnaughton
Contributor

bmacnaughton commented Jun 18, 2019

thanks for wading through this @mhdawson - you got to this before i added the tl;dr. looks like my problem from here on out. the large bump associated with the default heap limits seems to have been a red herring.

@mhdawson
Member

@bmacnaughton I assume this issue can be closed. If so, can you just confirm whether the problem was related to the change in the default heap limits or something else?

@bmacnaughton
Contributor

bmacnaughton commented Jun 18, 2019

The issue I was looking for help on was the large jump in memory usage, which was caused by the change in default heap limits. It can be closed - thanks. If I find something else I'll open a new issue that isn't conflated with the new heap allocations.

@mhdawson
Member

@bmacnaughton thanks for the confirmation. Closing

@bmacnaughton
Contributor

@mhdawson - i know you have a million things going on but, for closure (and the wild hope that you'll say "ah, i know what it is"), here's my follow-on posting about the memory leak.

nodejs/node#28420

@mhdawson
Member

mhdawson commented Jul 3, 2019

@bmacnaughton I had taken a quick read of that one. Sorry I don't have an "aha".
