Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting for user functions, fixes #140 #448

Closed
wants to merge 2 commits into from

Conversation

wbyoung
Copy link

@wbyoung wbyoung commented May 13, 2015

This allows the registration of functions via JavaScript that can be called in queries. I have a little left to do to handle errors, but I wanted to check that this is on the right track before continuing. I've not worked with C++ in a long time and this is the first that I've worked with v8, so I did my best.

It takes the approach of calling on the Node.js thread, so this would certainly have a huge impact on performance (though in some cases that may not matter). The suggestion in #140 to create a new v8 isolate and transfer the function there seems like a far better way to make this work, but is way beyond what I'd be able to do.

This seems like a start at least.

@wbyoung
Copy link
Author

wbyoung commented May 13, 2015

So the last thing that I wanted to figure out in this is how to know when a function call has produced an exception.

src/database.cc Outdated
uv_async_init(uv_default_loop(), &baton->async, (uv_async_cb)Database::AsyncFunctionExecute);
uv_async_send(&baton->async);
uv_mutex_lock(&mutex);
uv_cond_wait(&baton->condition, &mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uv_cond_wait may produce spurious wakeups. We need to check whether the function call is actually complete.

@wbyoung wbyoung force-pushed the user-functions branch 2 times, most recently from 28de83f to 78fd0bb Compare June 30, 2015 00:43
@springmeyer
Copy link
Contributor

heads up @wbyoung: I disabled travis builds for pull requests on this repo because the OS X builds were hanging and blocking the queue for other builds. Sorry about the hassle. /cc @rclark

@wbyoung
Copy link
Author

wbyoung commented Jun 30, 2015

@kkaefer I had trouble enabling C++11 features to get std::future/std::promise working, so I just stuck with uv_cond and am now ensuring that the work has been completed to properly handle spurious wakeups.

Each function call is now being addd to a queue & the queue is being processed on the main thread. I'm not sure whether this is necessary or not. The SQLite3 docs don't really specify how callbacks are invoked with regards to threading. I'm not sure whether SQLite3 will make multiple simultaneous calls to a callback registered with sqlite3_create_function, but for some reason I seem to think that it wouldn't. The current implementation should work regardless & be future proof.

The failures with iojs on Linux seem to be addressed, so I assume that was related to the improper use of uv_async_init and/or not handling the spurious wakeups.

I'd still be interested in any advice on how to know when a function call has produced an exception.

@wbyoung
Copy link
Author

wbyoung commented Jun 30, 2015

@springmeyer thanks for the heads up. It's hard to test all those configurations, but I probably shouldn't have pushed each of those commits either. I didn't really realize until later that each build was taking about an hour to complete/hanging.

@springmeyer
Copy link
Contributor

@wbyoung nothing to worry about on your end - it was just tough timing with travis being slow.

@wbyoung
Copy link
Author

wbyoung commented Jun 30, 2015

@springmeyer let me know if you're able to re-enable at any point or any other ideas to test all configurations. I can't enable Travis on my own fork because of the use of sudo.

I believe this is the most appropriate way to handle exceptions. Using
v8::Function::Call rather than node::MakeCallback (via NanMakeCallback)
means that these functions will not work with domains, but that seems
appropriate.

I don't know if nodejs/node-v0.x-archive#9245 or nodejs/nan#284 should be a concern
here or not.
@wbyoung wbyoung changed the title Started working on supporting user functions to fix #140. Supporting for user functions, fixes #140 Jun 30, 2015
@wbyoung
Copy link
Author

wbyoung commented Jun 30, 2015

A faster version of this with some issues can be found at wbyoung/node-sqlite3 user-functions-isolate that creates a new v8 isolate.

@kkaefer
Copy link
Contributor

kkaefer commented Jul 1, 2015

@wbyoung interesting; what implications does the use of isolates have? Can the user provided functions still access other parts of the script?

@wbyoung
Copy link
Author

wbyoung commented Jul 1, 2015

@kkaefer as of right now, that branch won't even let you require anything or even use console from within a user provided function because node::CreateEnvironment has not been called. So you only have pure javascript. If I have some time, I can dig a little deeper and try to make that work, but it may be a bit challenging to get right as I'm still feeling quite unfamiliar with the inner workings of Node & v8.

Anyway, even if it it were loading the node environment, v8 stipulates that you can't access objects from a different isolate. The function has been stringified and copied to the new isolate, so the following would produce a ReferenceError: outofscope is not defined:

var outofscope = 0;
db.registerFunction('MY_FUNC', function(value) {
  return value + outofscope;
});

I'm torn between which approach seems better.

dispatch to default loop

✓ Function executes in expected environment. Closures & scoping work as expected.
✗ Slower. When a user registers a custom function, I don't think they'd expect that it could run hundreds of times & interrupt the default loop that should be focusing on more important things (perhaps processing incoming HTTP requests).

v8 isolate

✓ Faster.
✗ The function executes in an unexpected environment. This could have consequences that wouldn't necessarily be obvious or clear when looking at the code. For instance, requiring modules would re-load them since you'd have a new require cache. Basically, it'd be like that one function was executing in its own little world.

wbyoung added a commit to wbyoung/node-sqlite3 that referenced this pull request Jul 1, 2015
This has many problems which are discussed in TryGhost#448.
@wbyoung
Copy link
Author

wbyoung commented Jul 1, 2015

@kkaefer an update on isolates…

wbyoung/node-sqlite3@f8cd018 creates an isolate for a user defined function. It does not load the node environment.

wbyoung/node-sqlite3@79756a4 actually loads the node environment (and loads of functions from a separate file). At this point, I'm pretty convinced that a separate v8 isolate is a bad idea.

  • Without loading the node environment, you're pretty limited in what you can do (pure JS only, no node stuff).
  • When you load the node environment, you'll need to bring all the assumptions that node brings with it. Correct me if I'm wrong, but I believe Node.js makes the assumption that all JS code run will run on a single thread. To avoid contention with the uv_default_loop, we could create a separate uv_loop_t and start it running in a separate thread. The node environment setup would allow that. But then we'd have to dispatch all work to that loop so we didn't break assumptions for how Node.js was architected (pure v8 would allow JS execution from multiple threads w/ locks, though). At that point, the performance would be similar to simply dispatching to the uv_default_loop, but with the advantage that it wouldn't be competing with activity in the the main node isolate. I don't think that's a convincing enough reason to pursue this.

The point to drive home is that if what you want is performance, you should create a user defined function via a native extension. Most who are looking for performance are probably going to look to another database anyway. If you're not looking for performance, then it's okay to take the performance hit that comes with dispatching to the default loop & executing the function in the original node isolate. Plus, there'd be no weird scoping issues.

At this point, I'm hoping that this pull request may be sufficient to be incorporated into the project.

Thanks for the tips so far!

@kkaefer
Copy link
Contributor

kkaefer commented Jul 2, 2015

Thanks for all the exploratory work; very insightful and informative.

As for using isolates, what we could do is the following: Instead of using the libuv work pool, all SQlite functions could be run in a custom thread that we spawn. This thread could also run a uv_loop_t, and load a node environment. Instead of registering custom functions in code, the user specifies a root entry file/module that is getting loaded in that thread, and all exported JS functions are available as SQLite functions. This means that users can't modify the custom SQLite functions on the fly from the original node file, but that may not be a common use case anyway.

@kkaefer
Copy link
Contributor

kkaefer commented Jul 2, 2015

var db = new sqlite3.Database(':memory:', done);
db.loadEnvironment('./functions.js');


db.all('SELECT FOO() AS txt', function(err, rows) {
    console.warn(rows);
});

functions.js:

exports.FOO = function() {
    return 'bar';
};

wbyoung added a commit to wbyoung/node-sqlite3 that referenced this pull request Jul 2, 2015
This has many problems which are discussed in TryGhost#448.
@wbyoung
Copy link
Author

wbyoung commented Jul 2, 2015

wbyoung/node-sqlite3@c24fc5e loads all functions from a single file.

@wbyoung
Copy link
Author

wbyoung commented Jul 2, 2015

@kkaefer by running all SQLite functions, you mean that all queries would be run on a single thread rather than being run as they are right now (in a threadpool via uv_queue_work)?

Wouldn't that approach mean that all queries would potentially block each other?

Currently, as I'm understanding the code, all sqlite_* operations are within a Work_<Type> function that's initiated via a call to uv_queue_work and then the result is processed (and callbacks made) via the uv_default_loop in a Work_After<Type>. Since uv_queue_work uses a threadpool, none of the operations will block one another (up to the thread limit imposed by libuv). This seems consistent with your latest comment as well, @kkaefer, so I hope I've understood correctly.


Ramblings/Thoughts

The way that I'm reading the SQLite3 docs, it seems that it has a few different thread modes. If my understanding above is correct, then I'm also led to believe that this library simply assumes that the SQLite library it's linking to has at least multi-thread if not serialized (the default). What I'm getting at is that adding the ability to switch modes into one operating only in a single thread could allow node-sqlite3 to also work with SQLite libraries that are operating in single-thread mode.

So the following would be how the library worked:

  • Operate in multi-threaded mode (using uv_queue_work)
  • Switch to single-threaded mode when sqlite3_threadsafe returns false.
  • Switch to single-threaded mode when using db.loadEnvironment()

I'm not convinced that the tradeoffs are worth it. With the pull request as it is (dispatching user functions to the default loop), only queries that are using user defined functions are affected. Running all queries on a single thread would affect all queries even if you want to add a user function that only runs occasionally.

Also, the SQLite3 docs doesn't actually specify on which thread user defined function callbacks are made. I think it's safe to assume that it's the same thread as the one on which the statement was executed, but I just wanted to throw that out there. Please correct me if I've overlooked something in the docs.

I actually hope that I'm wrong on all of this and that there's a way to make it work in its own isolate. Regardless, the back and forth with you, @kkaefer has given me a far better understanding of Node & v8, so thanks!

@kkaefer
Copy link
Contributor

kkaefer commented Jul 3, 2015

A single SQLite database can never be used from more than one thread at a time (unless in serialized mode, but that one is, well, serialized), so giving each Database handle its own thread wouldn't slow down performance. In fact, the current code may produce SQLITE_BUSY errors quite often if used incorrectly (we're technically violating SQLite's requirement about not using the same connection in more than one thread). When you call .serialize(), we're manually serializing all requests to a database connection, and this should be the default mode anyway. In reality, I doubt that this module will become slower when assigning each database handle its own thread.

@wbyoung
Copy link
Author

wbyoung commented Jul 4, 2015

@kkaefer ok, I was somehow confused into thinking that multiple queries could run at once for any given connection, but the SQLite3 docs are clear that's not possible.

Just to be sure we're on the same page (since this could be a lot of work to get working), the process of opening a connection (Database::New or Database::Work_BeginOpen) would start the worker thread & loop. Then all methods would be refactored to run their initial work Work_<Type> on that thread / in that loop.

I'll give it a shot if I have some time.

@suchitpuri
Copy link

Hi Guys, When can we expect this to be merged in master ?

@wbyoung i tried your branch user-functions, but it kills the application as soon as the registerFunction is called.

@wbyoung
Copy link
Author

wbyoung commented Nov 21, 2015

@suchitpuri try using revision 3db73b9.

I haven't had time to work on this and am not sure if/when I will again.

@rlindner81
Copy link

Hi @kkaefer, I'm curious what is the main blocking point for this work? I would really like to see the user-function feature added to node's sqlite. I could take over the code if @wbyoung doesn't have time to finish it.

@rlindner81 rlindner81 mentioned this pull request Aug 3, 2019
@SamuelBrucksch
Copy link

Hi,

any update on this?

Regards

@rlindner81
Copy link

@SamuelBrucksch I re-did most of the work in this branch in #1200. Still waiting for a review though.

@thearchivalone
Copy link

thearchivalone commented Mar 6, 2020

@kewde any way this PR can be closed? #1200 that replaced it was closed out by its maintainer due to code fixes being unmanageable from being outdated. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants