Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsfn: implement TypedThreadSafeFunction #742

Merged
merged 46 commits into from
Dec 7, 2020

Conversation

KevinEady
Copy link
Contributor

@KevinEady KevinEady commented Jun 8, 2020

Implements a templated TypedThreadSafeFunction<ContextType, DataType, Callback> class. The class also adds static, compile-time support for the optional Function argument on N-API 5+ using std::nullptr_t (as well as completely missing Function argument) on N-API 5+.

TODO:

Fixes:

@KevinEady
Copy link
Contributor Author

Hey @gabrielschulhof , @mhdawson ... Soooo I am a little stuck now. I have no idea why this is happening, but it seems that the TSFNEx::CallJs wrapper (which, in turn, calls the user-defined static CallJs function) is getting called by Node with an empty napi_env.

I did as you suggested, and migrated the old "multi-threaded" TSFN test to this new TSFNEx methodology. The changes to do so seem minimal, so I'm not sure why this is erroring.

I've added this signal(SIGTRAP) call, and you can see it is indeed erroring there the CI build.

This is what I've got in my debugger:

image

and here's the stack...

__pthread_kill (@__pthread_kill:6)
pthread_kill (@pthread_kill:109)
raise (@raise:12)
Napi::ThreadSafeFunctionEx<ThreadSafeFunctionInfo, int, &(TSFNCallJS(Napi::Env, Napi::Function, ThreadSafeFunctionInfo*, int*))>::CallJsInternal(napi_env__*, napi_value__*, void*, void*) (/Users/kevineady/Documents/Projects/node-addon-api/napi-inl.h:4477)
void node::Environment::CloseHandle<uv_handle_s, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::'lambda'(uv_handle_s*)::operator()(uv_handle_s*) const::'lambda'(uv_handle_s*)>(uv_handle_s*, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::'lambda'(uv_handle_s*)::operator()(uv_handle_s*) const::'lambda'(uv_handle_s*))::'lambda'(uv_handle_s*)::__invoke(uv_handle_s*) (@void node::Environment::CloseHandle<uv_handle_s, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::'lambda'(uv_handle_s*)::operator()(uv_handle_s*) const::'lambda'(uv_handle_s*)>(uv_handle_s*, v8impl::(anonymous namespace)::ThreadSafeFunction::CloseHandlesAndMaybeDelete(bool)::'lambda'(uv_handle_s*)::operator()(uv_handle_s*) const::'lambda'(uv_handle_s*))::'lambda'(uv_handle_s*)::__invoke(uv_handle_s*):71)
uv_run (@uv_run:142)
node::NodeMainInstance::Run() (@node::NodeMainInstance::Run():100)
node::Start(int, char**) (@node::Start(int, char**):61)
start (@start:4)

Hopefully we can discuss in today's call.

@KevinEady KevinEady requested a review from legendecas June 8, 2020 14:45
@KevinEady KevinEady changed the title Contexted tsfn api gcc 4 tsfn: implement ThreadSafeFunctionEx Jun 8, 2020
@KevinEady
Copy link
Contributor Author

KevinEady commented Jun 12, 2020

Hi @gabrielschulhof ,

I took into a look into what you suggested, that maybe all threads have called Release() prior to queue being completed.

I pinpointed the failure to this specific portion:

  // Start the thread in blocking mode, and assert that it could not finish.
  // Quit early by aborting.
  .then(() => testWithJSMarshaller({
    threadStarter: 'startThread',
    quitAfter: 1,
    maxQueueSize: binding.threadsafe_function_ex_threadsafe.MAX_QUEUE_SIZE,
    abort: true
  }))
  .then((result) => assert.strictEqual(result.indexOf(0), -1))

... and when my CallJS is triggered, the environment and callback are null, but my context and data are still valid:

image

So I think you are completely right: this test has the TSFN abort, so I suppose there are still items on the queue when an abort happens.

So, am I correct that:

  • Inside your CallJS, checking that env == nullptr signifies that the thread count on the TSFN is 0, ie. all have called Release / Abort / Call (and received napi_closing)
  • However, if you want to handle the case that 'tsfn is aborted' differently inside your CallJS, this must be managed by user, as napi does not provide a threadsafe way to get the tsfn state.
    • ... but I suppose it wouldn't make sense to handle the logic that "thread is aborted" in the CallJS callback, as you would probably handle it on the caller side, when receiving napi_closing fromtsfn.Call

EDIT: I ask for clarification, because I want to put this in the documentation. I don't think it's inherently clear that your TSFN's "environment" is cleaned up on the last Release. I think it's clear that the finalizer will run on the last release, but I don't think it's clear that your tsfn can be finalized with items still on the queue. Possible workflow: last Release() is called -> queue is executed -> tsfn finalizer ran... but I suppose it is done to prevent the CallJS callback from potentially adding more items to the tsfn.

Do you know of any place where we have documentation about the "CallJS can be ran with empty napi env/function after the TSFN has been finalized, for you to process the remaining items on the queue" or whatever?

@mhdawson
Copy link
Member

mhdawson commented Nov 20, 2020

From discussion in the meeting we came up with another suggestion which was TypedThreadSafeFunction

@KevinEady
Copy link
Contributor Author

When #832 lands, I will rebase, rename, and push.

@mhdawson
Copy link
Member

@KevinEady, landed #832

@mhdawson
Copy link
Member

@KevinEady we suggested TypedThreadSafeFunction as a possibility but wanted your input on whether you thought that make sense or not.

@KevinEady KevinEady force-pushed the contexted-tsfn-api-gcc-4 branch from be32dc8 to 559ad8c Compare November 24, 2020 17:13
@KevinEady KevinEady changed the title tsfn: implement ThreadSafeFunctionEx tsfn: implement TypedThreadSafeFunction Nov 24, 2020
@KevinEady
Copy link
Contributor Author

Hi @mhdawson , I have rebased + renamed to TypedThreadSafeFunction. I noticed the CI on push only checks for style changes. Are you doing changes on this @legendecas?

@gabrielschulhof / @mhdawson, can you run the CI?

@KevinEady KevinEady requested a review from legendecas November 24, 2020 17:40
@legendecas
Copy link
Member

@KevinEady I'm not sure what's going on with TravisCI. We do have .travis.yml unchanged since 8 days ago on the master branch.

I checked the travis permission and found out that Node.js github org doesn't grant any public write access to travis. Not sure if it is a recent change.

@mhdawson
Copy link
Member

@mhdawson
Copy link
Member

This was the failure on 12.x on Mac 15

Running test 'threadsafe_function/threadsafe_function_sum'
Build timed out (after 60 minutes). Marking the build as failed.

@KevinEady
Copy link
Contributor Author

@mhdawson That is... odd. That test isn't part of this PR, and there were no changes made to the existing TSFN implementation. FWIW, the previous 12.x run passed, which was manual test for HandleScope fix with old ThreadSafeFunctionEx name.

@KevinEady
Copy link
Contributor Author

Looks like https://ci.nodejs.org/view/x%20-%20Abi%20stable%20module%20API/job/node-test-node-addon-api-new/3174/ passed... do you think there is some issue with the original tsfn test?

@mhdawson
Copy link
Member

Seems like the re-run I launched was on 16.x. Another one on 12.x this time: https://ci.nodejs.org/view/x%20-%20Abi%20stable%20module%20API/job/node-test-node-addon-api-new/3175/

@KevinEady my first guess is an existing problem with the test unless somehow the PR refactored code which the base ThreadSafeFunction uses?

@mhdawson
Copy link
Member

Looks like this simply adds to napi-inl.h so most likely an existing flaky issue with the test. As long as the new 12.x run I launched passes I think we can land this PR.

@KevinEady if you are ok to land then I think we should be good to go.

@mhdawson
Copy link
Member

mhdawson commented Dec 1, 2020

@legendecas are you ok with this landing now? Just want to make sure as it shows requested changes, but I think that was just to close on the name change.

@KevinEady
Copy link
Contributor Author

@KevinEady if you are ok to land then I think we should be good to go.

@mhdawson I have no problems with that.

@gabrielschulhof
Copy link
Contributor

@legendecas I think you can remove the -1, because the class was renamed.

Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. This is a great work! LGTM

@gabrielschulhof gabrielschulhof merged commit 48e6b58 into nodejs:master Dec 7, 2020
@gabrielschulhof
Copy link
Contributor

CI on resulting master was green:

48e6b58#commitcomment-44890203

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants