-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tracking issue] src: clean up resources on Environment teardown #19422
Comments
Issue 1. SIGSEGV in
Please let me know what to look for. |
here is the debug dump - basically we got a bad piece of memory when iterating over (lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 76788 exited with status = 9 (0x00000009)
Process 76851 launched: '../node_g' (x86_64)
recycling node!
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001019ef2e9 libnode.62.dylib`node::Environment::RunCleanup(this=0x00003782c44822e1) at env.cc:315
312 }
313
314 void Environment::RunCleanup() {
-> 315 CleanupHandles();
316
317 while (!cleanup_hooks_.empty()) {
318 // Copy into a vector, since we can't sort an unordered_set in-place.
Target 0: (node_g) stopped.
(lldb) step
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
frame #0: 0x00000001019ee3f9 libnode.62.dylib`node::Environment::CleanupHandles(this=0x00003782c44822e1) at env.cc:233
230 }
231
232 void Environment::CleanupHandles() {
-> 233 for (ReqWrap<uv_req_t>* request : req_wrap_queue_)
234 request->Cancel();
235
236 for (HandleWrap* handle : handle_wrap_queue_)
Target 0: (node_g) stopped.
(lldb) p req_wrap_queue_
(node::Environment::ReqWrapQueue) $28 = {
head_ = {
prev_ = 0xe100003782c44822
next_ = 0x5100003782c44822
}
}
(lldb) n
Process 76851 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x00000001019ee46c libnode.62.dylib`node::Environment::CleanupHandles(this=0x00003782c44822e1) at env.cc:234
231
232 void Environment::CleanupHandles() {
233 for (ReqWrap<uv_req_t>* request : req_wrap_queue_)
-> 234 request->Cancel();
235
236 for (HandleWrap* handle : handle_wrap_queue_)
237 handle->Close();
Target 0: (node_g) stopped.
(lldb) p request
(node::ReqWrap<uv_req_s> *) $29 = 0x5100003782c447f2
(lldb) me read 0x5100003782c447f2
error: memory read failed for 0x5100003782c44600
(lldb) |
Issue 2. v8::Platform pointer not recycled This is currently shielded by issue 1. Looks like the #./foo test.js
starting 1th iteration.
recycling node!
recycling complete!
completed 1th iteration.
starting 2th iteration.
#
# Fatal error in ../deps/v8/src/v8.cc, line 96
# Check failed: !platform_.
#
Illegal instruction: 4
#lldb ./foo
(lldb) target create "./foo"
Current executable set to './foo' (x86_64).
(lldb) r test.js
Process 2067 launched: './foo' (x86_64)
starting 1th iteration.
recycling node!
recycling complete!
completed 1th iteration.
starting 2th iteration.
#
# Fatal error in ../deps/v8/src/v8.cc, line 96
# Check failed: !platform_.
#
Process 2067 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
frame #0: 0x0000000101d969e1 libnode.62.dylib`v8::base::OS::Abort() at platform-posix.cc:372
369
370 void OS::Abort() {
371 if (g_hard_abort) {
-> 372 V8_IMMEDIATE_CRASH();
373 }
374 // Redirect to std abort to signal abnormal program termination.
375 abort();
Target 0: (foo) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
* frame #0: 0x0000000101d969e1 libnode.62.dylib`v8::base::OS::Abort() at platform-posix.cc:372
frame #1: 0x0000000101d84a57 libnode.62.dylib`V8_Fatal(file="../deps/v8/src/v8.cc", line=96, format="Check failed: %s.") at logging.cc:136
frame #2: 0x0000000101505e4d libnode.62.dylib`v8::internal::V8::InitializePlatform(platform=0x0000000105300070) at v8.cc:96
frame #3: 0x0000000100362655 libnode.62.dylib`v8::V8::InitializePlatform(platform=0x0000000105300070) at api.cc:6131
frame #4: 0x0000000101a34673 libnode.62.dylib`node::$_0::Initialize(this=0x0000000102dec4f0, thread_pool_size=4) at node.cc:290
frame #5: 0x0000000101a34039 libnode.62.dylib`node::Start(argc=2, argv=0x0000000105300000) at node.cc:4631
frame #6: 0x0000000100000eca foo`main(argc=2, argv=0x00007fff5fbffaf0) at foo.cc:14 [opt]
frame #7: 0x0000000100000e7c foo`start + 52
(lldb) f 2
frame #2: 0x0000000101505e4d libnode.62.dylib`v8::internal::V8::InitializePlatform(platform=0x0000000105300070) at v8.cc:96
93
94
95 void V8::InitializePlatform(v8::Platform* platform) {
-> 96 CHECK(!platform_);
97 CHECK(platform);
98 platform_ = platform;
99 v8::base::SetPrintStackTrace(platform_->GetStackTracePrinter());
(lldb) p platform_
(v8::Platform *) $0 = 0x0000000105200170 |
@gireeshpunathil It’s hard to tell you where to look without knowing how the code looks like… Regarding the second item, it sounds like Node’s trying to initialize the V8 platform again? That shouldn’t happen anyway, it’s basically a per-process thing, not per-Environment. |
@addaleax - code is trivial, a native addon that just calls void Terminate(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
fprintf(stderr, "recycling node!\n");
node::Environment *env = static_cast<node::Environment*>(
isolate->GetCurrentContext()->GetAlignedPointerFromEmbedderData(33));
env->RunCleanup();
fprintf(stderr, "recycling complete!\n");
} and the addon is invoked as: setTimeout(() => {
require('./term').t()
}, 1000) For the second: I just want to point out that embedders may invoke node through a plurality of means. In the absence of a clear guideline, I could call node::Start, I could mimic a portion of what node::Start() does and then invoke node::Init() so on and so forth.
|
Shouldn’t that be 32?
I would like to get there, yes, but I think it’s out of scope for #19377 – there’s a lot more to be done to have a proper API. :/ Currently, embedders would basically have to re-implement the
Let’s not introduce any more pseudo-global state to Node. If we want to have this, We should provide a way to get the current |
thanks @addaleax - with For the second: without the ability to (and test) |
@gireeshpunathil Platform management is the embedder’s responsibility, neither Node nor an addon can do anything about that.
Could you describe what you mean by “recycle” and “tear down”? |
teardown: bring down the machine state (including the resources) prior to node::Start |
I mean, for that we’d probably want to just create another environment instance, right? |
I think it sounds like what you want isn’t really a way to call |
@addaleax - I believe this is where our conceptual difference:
If I understand correctly, effectively Environment teardown === node tear down, in the most common use case for node (single-env) right? So for the embedders that does not have sufficient insight and control over node internals, the easy thing to perform is to recycle everything and start with fresh configuration. Example: An embedder offloaded some I/O bound workload to node under normal conditions. After monitoring the resource usage and the throughput, it would want to re-spin node with modified input (configuration and arguments.) So rather thean working with Environment, the more easy thing would be to work at the platform level itself. What do you think? |
We should provide embedders with ways to achieve both, I think. But generally, Node maintains only a really small portion of its state in a global manner instead of per-Environment, so it might be good to know what kind of options you'd have in mind for this?
It's come up in other PRs that we should probably be separating between options that are per-Environment and options that are per-Isolate (or even global). All of these exist, and we don't really have a consistent story around those. (I've been sketching some ideas around that, but that's still very much an early work in progress.) So, if you're concerned about options, it probably depends on which you think an embedder might tune for re-use...? |
@addaleax - thanks for the clarification / guidance.
...
When I did a debug walk-through, I see that for an embedder to manage this, it will end up doing:
or copy the code from node. I believe this will be too much to expect from an embedder because:
Ideally, an embedder will be interested in:
I understand the scope of #19377 does not cover these, but my original issue #19365 does.
thanks for that, so at the moment, to satisfy #19365 minimally, is it possible to:
|
this should have been closed long back; as its stated purpose is achieved. |
This is to track #19377 exclusively, gets resolved when that PR lands. I am expecting a few runtime errors to come up, and opening this to leave the PR discussions free from debugging discussions.
I have developed a small test case that invokes
node::Environment::RunCleanup
at some arbitrary program control point. To start with, I have this code:#cat test.js
where
term
is small module that wraps a methodt
throughterm::Terminate
with the sole purpose of callingnode::Environment::RunCleanup()
. I will come up with some scenarios to invoke this through a variety of means that mimics launch and re-lauch of node in different possible ways - goal is to have good degree of resilience and stability for emedders./cc @addaleax
The text was updated successfully, but these errors were encountered: