memory leak starting with node v11.0.0-nightly20180908922a1b03b6 #28420
Comments
Thanks for the report!
I agree, looking at e917a23...922a1b0 that seems like the only reasonable source of issues – and V8 updates are big changes, so this is likely the culprit.
I agree 👍 The fact that valgrind shows nothing of interest makes it less likely that this is a memory leak in C++, so that’s already helpful.
The fact that this is unlikely to be a C++ memory leak makes it more likely that this is one in JS; valgrind doesn’t track memory allocated using
First, if possible, it would be good to verify that this is the case. Secondly, if possible, taking heap dumps at different points in time and comparing them in Chrome DevTools should help a lot with figuring out what objects are being retained and why, if this is a JS memory leak. For taking heap dumps you can use e.g. https://www.npmjs.com/package/heapdump or, if that’s more convenient, directly attach DevTools to the process. Given that this is likely a V8 bug, you may want to run the process with
Finally, it would also be good to know whether a more recent version of Node.js (e.g. the current nightlies) has the same issue. V8 issues are often caught and addressed later, and it may be the case that the solution is to figure out which V8 commit fixed this issue and then backport it to the relevant Node.js versions.
If this does turn out to be an issue with one of the C++ parts (e.g. by seeing an increase in RSS but not in the heap as reported by
I hope this helps!
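For concreteness, here is a minimal sketch of the periodic heap-dump approach suggested above, assuming the heapdump package; the file path and interval are illustrative, not from the thread:

```js
// Sketch only: write a heap snapshot every 10 minutes so that consecutive files
// can be compared in Chrome DevTools (Memory tab -> Load) to see which objects
// accumulate between snapshots.
const heapdump = require('heapdump');

setInterval(() => {
  const file = `/tmp/todo-${Date.now()}.heapsnapshot`; // illustrative path
  heapdump.writeSnapshot(file, (err, filename) => {
    if (err) console.error('heap snapshot failed:', err);
    else console.log('wrote', filename);
  });
}, 10 * 60 * 1000);
```

Alternatively, starting the process with --inspect and attaching Chrome DevTools directly avoids writing snapshot files at all.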
Here are the graphs using
I am proceeding with the additional memory metrics and the heapdump approach, as well as checking whether deliberately invoking gc helps.
graphs for
These runs were executed with garbage collection being called once every minute.
Left chart, top to bottom (leaves out heap_size_limit and total_available_size):
Right chart, top to bottom:
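For reference, a minimal sketch of how the once-a-minute forced garbage collection mentioned above could be wired up; it assumes the process is started with the --expose-gc flag so that global.gc exists (the script name is illustrative):

```js
// Force a full GC once a minute; global.gc only exists when node is started
// with the --expose-gc flag.
if (typeof global.gc === 'function') {
  setInterval(() => {
    global.gc();
  }, 60 * 1000);
} else {
  console.warn('global.gc is unavailable; start with: node --expose-gc server.js');
}
```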
@bmacnaughton It’s not fully obvious from the graphs, but I’d say that this does look like a JS memory leak, because the heap size and the RSS curves in the right graph do about the same thing?
@addaleax - when you say a JS memory leak you mean v8, right? If yes, it seems that way to me.
I'm adding heapdump code to the todo server now; I'll probably dump every 10 minutes or so. I'm going to let this run a while longer before starting the version that will call heapdump every 10 minutes; I'd like to get a little longer view. I will also turn off the forced garbage collection for the next run to have a comparison to this run.
btw, the v8 heap total_available_size is slowly going down; see the attached chart. I put these two values in a separate chart because their scales are very different. The top line (which is flat but is a bit of an optical illusion) is the heap_size_limit; the bottom line is the total_available_size.
OK, here are the charts at the end of the previous run. It's looking pretty similar to the previous runs. The gc doesn't prevent the memory loss; it would take more work to determine whether it slows down the rate of loss, and I'm going to pass on that for now.
The v8 total_available_size is now down to 1.4 GB; it started at 1.5 GB. The heap_size_limit is 1.518 GB (constant).
I'm starting a run with heap dumps every 10 minutes.
Hi, may I ask what you used to generate the memory usage graphs?
The data is coming from the JavaScript calls process.memoryUsage() and v8.getHeapStatistics(). I'm sending those to the AppOptics metrics API at api.appoptics.com/v1/measurements, and they're available at my.appoptics.com in dashboards/metrics. The todo server app (my interactive test harness, not really meant for general application) is at github.com/bmacnaughton/todo; the metrics generation code is in server.js (search for argv.metrics) and lib/metrics.js.
@addaleax - I have almost 20 hours of heapdump snapshots at this time, one every 15 minutes. I haven't looked at them before but am just starting to look now; I'll see what I learn. If they would be helpful to you I can post all of them or specific time periods.
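A rough sketch of this kind of metrics collection is below; the real code lives in server.js and lib/metrics.js of the todo repo, and sendToAppOptics here is only a placeholder for the HTTP POST to the measurements API, not a function from that repo:

```js
const v8 = require('v8');

function sampleMemory() {
  const mem = process.memoryUsage();   // rss, heapTotal, heapUsed, external
  const heap = v8.getHeapStatistics(); // used_heap_size, total_available_size,
                                       // heap_size_limit, ...
  return { time: Date.now(), ...mem, ...heap };
}

setInterval(() => {
  const sample = sampleMemory();
  // sendToAppOptics(sample);          // placeholder for posting to
                                       // api.appoptics.com/v1/measurements
  console.log(JSON.stringify(sample));
}, 10 * 1000);
```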
@bmacnaughton You’re on the right track – you typically want to look at the retainers of objects created later in time, because those are more likely part of the memory leak. (The map for
@addaleax I am not creating any and within that,
@addaleax - OK, here's a wag based on a guess. I'm guessing the @Number is sequencing of some sort, so that larger numbers imply later creation. If that's workable then this object is later in time and its retainer is related to our use of async_hooks. That seems like a likely candidate to be impacted by a v8 change. (Sorry for the miserable formatting - there's too much for a screen shot and the copy didn't format well.)
I appreciate the time you've provided so far and any pointers you can provide me at this time.
For anyone following along who has as much to learn here as I do: https://developers.google.com/web/tools/chrome-devtools/memory-problems/memory-101
@addaleax - one video I watched said that I have no control over the parenthesized items in the memory display. If that's correct then the (system) category is in the control of (I suspect) v8 only.
@bmacnaughton So … that’s partially true – these objects are still kept alive by other objects, and if those other objects are controlled by your code, then you do have some control over that. Also, the fact that these objects are being controlled by V8 would align with this pointing to a bug in V8… which is a good reason to figure out where the leak comes from ;)
This is the last commit to
At this point tbh, I'm not sure I would recognize a smoking gun if I saw one. I do see that a number of the array elements are my code's async_hooks/cls-hooked context.
@addaleax - I'm sorry that I can't make faster progress on this. I am a bit like a dog watching TV. The things I'm looking at sound like things I've heard, but the form is abstracted in a way that makes it hard for me to grasp.
@bmacnaughton Would you be able to share a reproduction, or maybe even “just” a heap dump for inspection? da8641f seems like an unlikely candidate for causing a memory leak, and also isn’t part of the commit range that we would primarily look into (according to the different nightlies). It might still be good to know which objects in your application keep the newly added
Well, I’ll be without access to a computer for the next 48 hours, so as far as I am concerned, take your time. I’m pretty interested in finding out where this is coming from, though :)
And in the worst case, there’s always the brute-force option of bisecting against the V8 commit range corresponding to the update; it’s going to require building Node.js from scratch roughly a dozen times and seeing whether the leak also occurs there, but it would likely lead to the commit that caused this change.
I'm happy to share all the heap dumps you would be willing to look at. This is from an internal benchmark and has no customer data in it. I would prefer to send a dump, or put it somewhere, for you specifically. I've sent a Google Docs share to you; please advise if you have any issues (I used an organization account - not sure how that handles shares).
I've done the node bisecting twice now (once to find where the default heap limits changed and once to find where the leak started - I thought those were the same problem initially), but they didn't involve building node from scratch. I did that a couple of times at Vancouver Node+JS Interactive but haven't since.
It's a little harder to share a repro - I have not been able to reproduce this outside of a docker-compose environment. Setting up access to the EC2 environment and going through the process for that is probably best left as a last resort. If I could repro on my dev machine then I would be able to share it.
I'll be looking at this while you're away. I appreciate your hints.
Yes, they are all maps. One thing that makes no sense to me is this: there are 784 elements in
The majority of the 784 elements appear to be
I'm not sure how to make sense of this either. The bulk of memory is not labeled at all - it can't just be C/C++ code, as the area is bigger than the initial memory footprint of the whole program.
@addaleax - If you are able to take a look at the snapshots and they are clear to you, I would very much like to understand the process you went through to find the error (I wish I could just watch the process). If they don't help then I will give bisecting the commits in
@addaleax - many of the objects being retained in
In the meantime, I've built node again and can start trying to bisect v8 releases if you can provide a little guidance as to how to go about that.
@addaleax - clarifying: your suggestion of building each specific v8 commit is to use each of the commits in e917a23...922a1b0 to build node and then test that particular binary in our test harness, right?
Hi again @bmacnaughton!
Yes. This doesn’t make sense to me either, and, sadly, points to some relevant information simply not ending up in the heap snapshot. I feel like at this point it would make sense for somebody from @nodejs/v8 to take a closer look.
It’s in the V8 source code, for the versions of V8 that come into question here.
It’s unfortunately not that easy, because we bump V8 versions as a bulk commit, not by picking individual commits. Bisecting against V8 versions is a painful process that’s best performed on a machine with a lot of CPU power, but what I typically do if I have to do this is:
It’s not a lot of manual work, but it takes time because it usually means recompiling all of Node.js from scratch a dozen times. It might be easier to wait for an opinion from somebody more familiar with V8 internals first…
There is a way to narrow the range of commits before bisecting V8: go to https://nodejs.org/download/v8-canary/ and try with canary versions. You can look at this table to know which V8 version is in which canary version.
I've been trying to reproduce the leak using the simple application @laino created (#28787) but have had no luck. I don't know how the debugger is using async_hooks, so it's possible that I'm just doing something differently.
Here's the simple program that I've run to try to reproduce the leak using #28787's basic logic, but it does not reproduce the memory leak shown in the graphs above. It's clearly too simple (or I've made a bad assumption somewhere) to reproduce either my app's or the debugger's use of async_hooks. I know that the debugger isn't using cls-hooked, so that's one big difference, but my application is using it, so my thought was that the program could check two things at once. I'm running the app in a docker container on node.
My next step is to create a more complex application that adds more logic within the main() function to see if I can generate a memory leak. If anyone has ideas please let me know. This program is really simple because it was my hope that reproducing @laino's memory leak would be the fastest way to get there.
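Since the program itself did not make it into this thread, the sketch below is a hypothetical reconstruction of the kind of test described: an async_hooks hook that propagates a per-request context (roughly what cls-hooked does) while a loop of promise-based requests runs. It is not the author's original program.

```js
const async_hooks = require('async_hooks');

// Track a context object per async resource, propagated from the triggering
// resource; a simplified stand-in for what cls-hooked does internally.
const contexts = new Map();

async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    contexts.set(asyncId, contexts.get(triggerAsyncId) || { created: Date.now() });
  },
  destroy(asyncId) {
    contexts.delete(asyncId);
  },
}).enable();

// Stand-in for a request handler doing a couple of async hops.
async function main() {
  await new Promise((resolve) => setImmediate(resolve));
  await new Promise((resolve) => setTimeout(resolve, 1));
}

setInterval(() => {
  for (let i = 0; i < 1000; i++) main();
  const rss = process.memoryUsage().rss / 1024 / 1024;
  console.log(`rss: ${rss.toFixed()} MB, tracked contexts: ${contexts.size}`);
}, 100);
```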
Maybe a long shot, but would this be related at all to #28988? It seems like @gabrielschulhof has found that there is a fix for that issue in a later v8 version.
Thanks for connecting the possibly related dots! Any idea when that version of v8 will make it into node? I'm happy to test against my big, ugly, messy, hard-to-package benchmark where I see the issue come up. I've been working on reproducing it in something simple but haven't had any luck. In the meantime I'll keep trying to repro in something packageable.
I'm not sure yet, but I'm hoping @gabrielf will comment in #28988 (comment).
I have isolated the memory leak to using our agent on an application running mongodb-core v2. If I update mongodb-core to v3 there is no memory leak. I know that's not exactly a reproducible case, but I wanted to provide an update; maybe someone familiar with the mongodb 2 => 3 changes will have a lightbulb go on.
I am able to work around this now, albeit by removing functionality for the problematic combination. Disabling our instrumentation for the mongodb-core package removes the interplay of async_hooks, mongodb-core, and versions of node since v11.0.0-nightly20180908922a1b03b6, so it's clear that it's not mongodb-core by itself.
Apparently there is some activity in mongodb-core regarding async_hooks: https://jira.mongodb.org/browse/NODE-1603
@bmacnaughton are you using the mongodb-core driver directly or through an abstraction layer such as Mongoose?
There are two levels here - we instrument mongodb-core. So mongoose looks like: mongoose (patch) => mongodb (prior to v3.3.0) => mongodb-core (instrument).
While we instrument mongodb-core, we do patch some mongoose functions in order to maintain context across queues and the like, e.g.,
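As a rough illustration of the kind of mongoose patch described above, the sketch below binds a queued callback to an active cls-hooked namespace so that the context survives mongoose's internal buffering queue; the patched method (Collection.prototype.addQueue) and the namespace name are assumptions made for illustration, not the agent's actual code.

```js
const { createNamespace } = require('cls-hooked');
const ns = createNamespace('apm-context'); // illustrative namespace name

// Hypothetical patch: wrap the callback that mongoose buffers while the
// connection is not yet ready, so it later runs in the caller's CLS context.
function patchAddQueue(Collection) {
  const original = Collection.prototype.addQueue;
  Collection.prototype.addQueue = function (name, args) {
    const last = args && args[args.length - 1];
    if (typeof last === 'function') {
      args[args.length - 1] = ns.bind(last);
    }
    return original.call(this, name, args);
  };
}
```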
I have written some simple code to reproduce the problem and submitted an issue to Google: https://bugs.chromium.org/p/chromium/issues/detail?id=1030569
@likev thanks for the reproduction code! I've created a v8 issue: https://bugs.chromium.org/p/v8/issues/detail?id=10057
It seems we observe this memory leak issue in TypeGraphQL(?) code in our backend. Slightly modified repro code (node 12 and 13 will OOM after a few minutes):

```js
const leakMemory = () => {
  class Leak {
    constructor() { this.unused(); }
    unused() {}
  }
  new Leak();
};

setInterval(function () {
  for (let i = 0; i < 1000000; i++) {
    leakMemory();
  }
  console.log(
    Object.entries(process.memoryUsage())
      .map(([k, v]) => `${k}: ${(v / 1024 / 1024).toFixed()}`)
      .join(' ')
  );
}, 1);
```
V8 8.1.39, with the commit v8/v8@d3a1a5b, has fixed it.
@alex3d this memory leak is still present in the latest LTS branch!! (12-alpine docker image) omg... we had so much trouble in the past... changing back to node 10 fixed the problems.
Also, in version 13 the problem seems to be fixed.
Looks like the fix in v8 was cherry-picked in #31005.
v12.15.0, to be released soon, will include the fix.
Is there an ETA on publishing 12.15.0? We're running into this now, and it would be awesome if this was published soon!
@bencripps Any time now; it is scheduled for 2020-01-28: https://github.com/nodejs/node/blob/v12.15.0-proposal/doc/changelogs/CHANGELOG_V12.md#2020-01-28-version-12150-erbium-lts-targos
We're going to delay 12.15.0 until a week after the upcoming security releases: nodejs/Release#494 (comment)
Has this issue been resolved in Node >=12.15.0? Can we close this issue?
Node 12.16.0 includes the fix. This issue can be closed.
The leak that I was seeing appears to be gone as well, so it appears to be the same thing.
@likev - how is it that you isolated the leak to the precise test case? I ask because I only saw it in a large system and wasn't able to isolate it.
Ohh, I observed the leak in my own small project; isolating it was definitely rather hard and took some luck.
Looks like the leak has been resolved in our application as well. Since deploying 12.16.0, memory usage has remained steady for the past 2 weeks.
Hello, I believe this issue may still exist in one form or another in Node 12.16.1. We've observed increasing memory usage after upgrading a production system to Node 12. The reason I believe it may be related to this issue is that we're seeing the exact same 110761200 bytes of memory used in
We also attempted an upgrade to Node 13, but we are still seeing similar leaky behavior:
We don't have a minimal repro, unfortunately, because we are only able to exhibit this behavior on a fairly large application. It also seems to be related to the volume of work/concurrency in the program, as we are unable to reproduce it in staging environments.
Also likely causing googleapis/cloud-debug-nodejs#811.
uname reports the host, but the application is running in a docker container. From /etc/os-release:

```
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
```
The leak also occurs running alpine 3.9.4 in a container. I have not been able to reproduce it outside of a docker container running on a native ubuntu 18.04 machine.
I am looking for some guidance on steps I can take to help narrow this down; I don't believe that the information I currently have is enough to isolate the problem.
The situation occurs running a memory/cpu benchmark of a todo application instrumented by our APM product. The APM product comprises JavaScript, a C/C++ library, and C++ code that uses the node-addon-api. The todo application is derived from todomvc-mongodb and runs express and mongo.
The graph below shows runs against two consecutive nightly releases (the scale changes so that the increased memory usage isn't off the top of the chart).
node v11.0.0-nightly20180907e917a23d2e shows no memory leak, while node v11.0.0-nightly20180908922a1b03b6 shows a stair-step memory leak. I took a quick look at the commits between the two nightlies but nothing stood out to me other than the v8 changes. I don't know enough of v8 to evaluate them. (6.5.1 and 6.6.0 are the versions of our agent being tested, and at one time the blue line was purple but the legend didn't change.)
Here is output from valgrind for an 11 hour run. This is the first time I've used valgrind so I don't have much to compare it with, but nothing stands out to me - there's nothing that looks like it could account for the memory growth shown in the second graph. Hopefully it will mean more to you.
What can I do next to help isolate this?