Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: convert to n-api #1

Merged
merged 3 commits into from
Aug 19, 2019
Merged

refactor: convert to n-api #1

merged 3 commits into from
Aug 19, 2019

Conversation

deepak1556
Copy link
Contributor

Required for microsoft/vscode#75802

Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepak1556 Thank you for looking into this! I have realised that this project lacks a test for what it does so I've added a test on master and have merged it to this branch. Unfortunately, the module does not work as it should after your proposed changes.

Here is how it works on master:

Alexs-MBP:node-native-watchdog alex$ npm run test

> native-watchdog@1.0.0 test /Users/alex/src/node-native-watchdog
> node test/test.js

reference pid - 10044, bad child pid - 10045
[10044] - reference process running...
[10045] - bad process entering endless loop...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[10044] - reference process exiting now!
[WAIT] - The reference process has exited 0s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 1s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 2s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 3s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 4s ago, the bad child should exit soon...
[TEST PASSED] - The bad child process has exited!

And here is how it no longer works:

Alexs-MBP:node-native-watchdog alex$ npm run test

> native-watchdog@1.0.0 test /Users/alex/src/node-native-watchdog
> node test/test.js

reference pid - 10408, bad child pid - 10410
[10408] - reference process running...
[10410] - bad process entering endless loop...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[10408] - reference process exiting now!
[WAIT] - The reference process has exited 0s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 1s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 2s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 3s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 4s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 5s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 6s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 7s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 8s ago, the bad child should exit soon...
[WAIT] - The reference process has exited 9s ago, the bad child should exit soon...
[TEST FAILED] - The reference process has exited 10s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 11s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 12s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 13s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 14s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 15s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 16s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 17s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410
[TEST FAILED] - The reference process has exited 18s ago, the bad child should have exited already!!! You should kill it manually to not hog your machine -- PID of bad child is 10410

@deepak1556
Copy link
Contributor Author

@alexandrudima should be working now. Seems like an issue with event loop contention in forked process because of the while(true) {} loop, the async Complete got never called on the main thread and hence the cleanup step didn't run to exit the process .

Another test case to prove this theory, change bad_child.js to:


const nw = require('../index.js');

const cp = require('child_process');
const path = require('path');

const parentpid = parseInt(process.argv[2]);

// Start a reference process
const ref_child = cp.fork(path.join(__dirname, 'reference_child.js'));

nw.start(ref_child.pid);

Run node test/bad_child.js and the process should exit with my earlier changes.

But since we don't have any purpose for the async completion callback , I have now moved the cleanup step to the worker thread and things work with your test case as well.

@alexdima
Copy link
Member

@deepak1556

Thank you!

Indeed, the purpose of this node module is to introduce a mechanism to self-exit processes where the event loop is no longer responsive (see the README). In VSCode, we have had multiple cases where extensions end up by accident blocking the event loop for a very very long time. When we close the window or restart it (via reload), we get a new process id and the old extension host will then terminate in 6s even if the event loop is permanently busy.

With you new changes, this now exits correctly, but I just ran this and I saw Fatal error in HandleScope::HandleScope which was not happening before:

Alexs-MBP:node-native-watchdog alex$ npm run test

> native-watchdog@1.0.0 test /Users/alex/src/node-native-watchdog
> node test/test.js

reference pid - 31254, bad child pid - 31255
[31254] - reference process running...
[31255] - bad process entering endless loop...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[WAIT] - Both processes are running...
[31254] - reference process exiting now!

#
# Fatal error in HandleScope::HandleScope
# Entering the V8 API without proper locking in place
#

[TEST PASSED] - The bad child process has exited!

I wonder if the call to NAPI_CALL_RETURN_VOID(env, napi_delete_async_work(env, info->request)); causes this fatal error. At that point, the event loop is busy and will never yield, so no v8 APIs can be used and I guess no lock can be obtained... So couldn't we just remove that line since the event loop is blocked and the process will exit anyways in 5 seconds?

@deepak1556
Copy link
Contributor Author

deepak1556 commented Aug 16, 2019

I wonder if the call to NAPI_CALL_RETURN_VOID(env, napi_delete_async_work(env, info->request)); causes this fatal error

Yup thats the cause, I shouldn't be making use of any napi calls that will trigger v8 execution in the worker thread callback as they are not lock protected.

I could also change NAPI_CALL_RETURN_VOID(env, napi_delete_async_work(env, info->request)); => napi_delete_async_work(env, info->request) to fix this.

So couldn't we just remove that line since the event loop is blocked and the process will exit anyways in 5 seconds?

yeah since the process is exiting any way, we can get away with leaking worker objects. Removing that line as suggested.

@alexdima
Copy link
Member

Brilliant! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants