Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-weak causes JS code preemption due to MakeCallback #35

Closed
metamatt opened this issue Oct 11, 2014 · 6 comments · Fixed by #36
Closed

node-weak causes JS code preemption due to MakeCallback #35

metamatt opened this issue Oct 11, 2014 · 6 comments · Fixed by #36

Comments

@metamatt
Copy link

(Pre-disclaimer, apologies if anything I say here is incorrect or overly alarmist. I think this is a real problem but I don't pretend to understand everything that's going on here, especially the Node and V8 codebase I'm not very familiar with.)

I've seen some problems in my NodeJS app using node-weak which seemed like they could only be explained by my code getting preempted by other JS code that changed state out from under it. I believe I've eventually traced this to the use of MakeCallback in Node's native code (src/node.cc); when it's called, it calls not only the callback you told it to call but also all registered nextTick callbacks. And node-weak invokes MakeCallback from GC, which basically means, from anywhere.

Here's a demo of the problem.

I started a thread about this on the NodeJS mailing list, too.

To the best of my evolving understanding, either

  • MakeCallback should be taught to only invoke process._tickCallback when it was invoked directly from the libuv event loop, not if it was invoked by something like node-weak in a nested execution context
  • MakeCallback is a dangerous API to expose to extension code directly, or at least needs to be better documented about when to use it and when to avoid it
  • node-weak, when invoking the cleanup callback (and its "global" callback which I don't really understand), should invoke those callbacks directly instead of via MakeCallback

I don't mean to malign node-weak; it's very cool and very useful functionality, but this side effect of preempting code and executing pretty much arbitrary any other code during GC is quite alarming.

@metamatt
Copy link
Author

Hey @TooTallNate, what's your protocol for accepting contributions to this repo, how do I build and test? I have a fix for this I'd like to propose.

My first assumption was "check out the source repo, run npm install --dev, run the tests" -- that npm install --dev invocation ran for a long time downloading way more stuff than I'd expect including several recursively nested copies of weak (how's weak@0.3.3 install /Users/magi/src/open/node-weak/node_modules/mocha/node_modules/should/node_modules/gulp-util/node_modules/through2/node_modules/brtapsauce/node_modules/sauce-tap-runner/node_modules/wd-tap-runner/node_modules/wd-tap/node_modules/st/node_modules/async-cache/node_modules/lru-cache/node_modules/weak for a fun path?!), spit out a bunch of errors about Failed resolving git HEAD (git://github.com/embarkmobile/karma-sauce-launcher.git) and some other unmet or broken dependencies, then hung npm.

So then I just did npm install; npm install mocha; node_modules/.bin/mocha --expose-gc test and the tests pass.

I see the appveyor.yml which I assume might automate some of this but I don't know anything about appveyor.

metamatt added a commit to metamatt/node-weak that referenced this issue Oct 14, 2014
Nan's NanCallback::Call goes through Node's MakeCallback, which is
designed to be used only by event callbacks (i.e. functions called
from the libuv event loop), not invoked synchronously from a call
stack where JS code was already running. This is because when
MakeCallback returns it assumes it's the end of the current "tick", so
it calls not only the callback you told it to, but also all tick
callbacks registered for the current tick.

Since node-weak's TargetCallback is invoked from GC, which could be
invoked from anywhere at any time, the code it calls into (the
callback attached to each weakref object) needs to be designed and
written carefully, essentially to the standards of signal or interrupt
handlers, since they preempt any JS code that was already running when
GC kicked in.

This is fine as long as the weakref callbacks are written to this
standard, but it also means TargetCallback needs to avoid accidentally
calling other callbacks that weren't written to this standard.

The fix is to call the callback directly, instead of via MakeCallback.
metamatt added a commit to metamatt/node-weak that referenced this issue Oct 14, 2014
Nan's NanCallback::Call goes through Node's MakeCallback, which is
designed to be used only by event callbacks (i.e. functions called
from the libuv event loop), not invoked synchronously from a call
stack where JS code was already running. This is because when
MakeCallback returns it assumes it's the end of the current "tick", so
it calls not only the callback you told it to, but also all tick
callbacks registered for the current tick.

Since node-weak's TargetCallback is invoked from GC, which could be
invoked from anywhere at any time, the code it calls into (the
callback attached to each weakref object) needs to be designed and
written carefully, essentially to the standards of signal or interrupt
handlers, since they preempt any JS code that was already running when
GC kicked in.

This is fine as long as the weakref callbacks are written to this
standard, but it also means TargetCallback needs to avoid accidentally
calling other callbacks that weren't written to this standard.

The fix is to call the callback directly, instead of via MakeCallback.

Fixes TooTallNate#35.
@TooTallNate
Copy link
Owner

Nice detective work here :)

metamatt added a commit to metamatt/node-weak that referenced this issue Oct 14, 2014
Nan's NanCallback::Call goes through Node's MakeCallback, which is
designed to be used only by event callbacks (i.e. functions called
from the libuv event loop), not invoked synchronously from a call
stack where JS code was already running. This is because when
MakeCallback returns it assumes it's the end of the current "tick", so
it calls not only the callback you told it to, but also all tick
callbacks registered for the current tick.

Since node-weak's TargetCallback is invoked from GC, which could be
invoked from anywhere at any time, the code it calls into (the
callback attached to each weakref object) needs to be designed and
written carefully, essentially to the standards of signal or interrupt
handlers, since they preempt any JS code that was already running when
GC kicked in.

This is fine as long as the weakref callbacks are written to this
standard, but it also means TargetCallback needs to avoid accidentally
calling other callbacks that weren't written to this standard.

The fix is to call the callback directly, instead of via MakeCallback.

Fixes TooTallNate#35.
@achingbrain
Copy link

I think I'm seeing something similar on io.js 1.2.0 using node-weak via dnode. My process runs for a bit then exits with a SIGABRT signal and dumps core. I upgraded node-weak to use nan 1.6.2 but it didn't help. Backtrace from the core dump (with nan 1.6.2):

(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff892f2286 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff892f2286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff9072242f libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff8dcddbf3 libsystem_c.dylib`__abort + 145
    frame #3: 0x00007fff8dcddb62 libsystem_c.dylib`abort + 144
    frame #4: 0x00007fff8dca5c39 libsystem_c.dylib`__assert_rtn + 321
    frame #5: 0x00000001005c1e30 node`node::MakeCallback(node::Environment*, v8::Handle<v8::Value>, v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) + 1090
    frame #6: 0x00000001005c2157 node`node::MakeCallback(v8::Isolate*, v8::Handle<v8::Object>, v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) + 103
    frame #7: 0x00000001016e863c weakref.node`NanCallback::Call(this=<unavailable>, argc=2, argv=0x00007fff5fbf37e0) const + 124 at nan.h:1411
    frame #8: 0x00000001016e830d weakref.node`void (anonymous namespace)::TargetCallback<v8::Object, (data=<unavailable>)::proxy_container>(_NanWeakCallbackData<v8::Object, (anonymous namespace)::proxy_container> const&) + 180 at weakref.cc:153
    frame #9: 0x00000001016e83c3 weakref.node`void _NanWeakCallbackDispatcher<v8::Object, (anonymous namespace)::proxy_container>(data=<unavailable>)::proxy_container> > const&) + 21 at nan.h:468
    frame #10: 0x0000000100299297 node`v8::internal::GlobalHandles::Node::PostGarbageCollectionProcessing(v8::internal::Isolate*) + 247
    frame #11: 0x0000000100297db7 node`v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector) + 103
    frame #12: 0x00000001002b7708 node`v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 1384
    frame #13: 0x00000001002b7079 node`v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) + 505
    frame #14: 0x00000001002b6db9 node`v8::internal::Heap::CollectAllGarbage(int, char const*, v8::GCCallbackFlags) + 137
    frame #15: 0x0000000100263d03 node`v8::internal::StackGuard::HandleInterrupts() + 115
    frame #16: 0x00000001004af1f3 node`v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) + 51
    frame #17: 0x00000686038060bb
    frame #18: 0x0000068604868e1f
... snip
    frame #201: 0x00000686045b12c7
    frame #204: 0x000006860381dff1
    frame #205: 0x0000000100261ee8 node`v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 568
    frame #206: 0x00000001001268f1 node`v8::Function::Call(v8::Handle<v8::Value>, int, v8::Handle<v8::Value>*) + 193
    frame #207: 0x00000001005b91a1 node`node::AsyncWrap::MakeCallback(v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) + 541
    frame #208: 0x00000001005dd288 node`node::After(uv_fs_s*) + 578
    frame #209: 0x000000010062a3e2 node`uv__work_done + 175
    frame #210: 0x000000010062bcd2 node`uv__async_event + 62
    frame #211: 0x000000010062be45 node`uv__async_io + 136
    frame #212: 0x0000000100639a6d node`uv__io_poll + 1581
    frame #213: 0x000000010062c2bb node`uv_run + 276
    frame #214: 0x00000001005c8a0c node`node::Start(int, char**) + 420
    frame #215: 0x0000000100000bb4 node`start + 52

@achingbrain
Copy link

FWIW I don't see the issue after applying the patch in #36.

On that PR @rvagg says:

You're just going to get into trouble with 0.11,0.12 support here aren't you?

For me it compiles for node 0.10, 0.12 and io.js 1.1 and 1.2 with nan 1.6.2. Not much of a comprehensive test but a good starter.

@metamatt
Copy link
Author

Yeah, the issue is real and will affect all nodejs versions and iojs versions, and it's not dependent on the version of nan. (While I did request a change to nan, which if merged can make my patch to node-weak look nicer, the change to nan alone will not be enough; this needs a patch to node-weak.)

@metamatt
Copy link
Author

Correction: the issue is no longer real. Excellent. Thanks @TooTallNate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants