Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite: add support for SQLite Session Extension #54181

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

louwers
Copy link

@louwers louwers commented Aug 2, 2024

We were talking about adding support for the Session Extension of SQLite in #53752. This is not a run-time extension but rather a feature built in to the SQLite amalgamation. It can be enabled with a compile flag and is enabled by default on some platforms.

I have never contributed to Node.js before so I will probably need some help to bring this to a conclusion.


TODO:

  • Make sure sessions are deleted before the database they are attached to is closed
  • Consensus on API
  • Documentation
  • Allow custom conflict handler
  • Throw with specific (documented) exception in case of conflict when applying changeset since SQLite doesn't consider this to be an error I return false when applying the changeset is aborted due to a conflict
  • Allow generating a patchset as well
  • Allow specifying table name when creating session (to only track that table)
  • Implement Session.close()

Example usage:

const database1 = new DatabaseSync(':memory:');
const database2 = new DatabaseSync(':memory:');

database1.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');
database2.exec('CREATE TABLE data(key INTEGER PRIMARY KEY, value TEXT)');

const session = database1.createSession();

const insert = database1.prepare('INSERT INTO data (key, value) VALUES (?, ?)');
insert.run(1, 'hello');
insert.run(2, 'world');

const changeset = session.changeset();
database2.applyChangeset(changeset);
// Now database2 contains the same data as database1 

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg

@louwers louwers marked this pull request as draft August 2, 2024 17:57
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Aug 2, 2024
@karimfromjordan
Copy link

Wouldn't this be solved if users could use their own SQLite build? That's already being considered according to the original issue.

@louwers
Copy link
Author

louwers commented Aug 2, 2024

@karimfromjordan Nope, because you need access to the C API to access this feature.

@anonrig
Copy link
Member

anonrig commented Aug 2, 2024

Can you add the appropriate documentation as well?

Copy link

codecov bot commented Aug 2, 2024

Codecov Report

Attention: Patch coverage is 91.05263% with 17 lines in your changes missing coverage. Please review.

Project coverage is 87.33%. Comparing base (7fea010) to head (fa4ac48).
Report is 2 commits behind head on main.

Files Patch % Lines
src/node_sqlite.cc 92.02% 2 Missing and 13 partials ⚠️
src/node_sqlite.h 0.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #54181    +/-   ##
========================================
  Coverage   87.32%   87.33%            
========================================
  Files         649      649            
  Lines      182618   182808   +190     
  Branches    35033    35083    +50     
========================================
+ Hits       159475   159653   +178     
- Misses      16407    16409     +2     
- Partials     6736     6746    +10     
Files Coverage Δ
src/node_sqlite.h 0.00% <0.00%> (ø)
src/node_sqlite.cc 86.27% <92.02%> (+2.27%) ⬆️

... and 22 files with indirect coverage changes

@louwers
Copy link
Author

louwers commented Aug 2, 2024

I need to make sure sessions are deleted before the database they are attached to is closed.

I think I need the DatabaseSync hold onto the Sessions it creates. Both are BaseObjects so I need to look into how to do that.

When a Session goes out of scope the corresponding DatabaseSync needs to no longer hold onto it. Conversely I also need to make sure that all operations on Session objects are no-ops after the corresponding database is closed.

@louwers louwers force-pushed the session-extension branch 2 times, most recently from 9fe44b8 to 36e1920 Compare August 3, 2024 14:01
@louwers louwers changed the title src, test: support for SQLite Session Extension src, test, doc: support for SQLite Session Extension Aug 3, 2024
@louwers
Copy link
Author

louwers commented Aug 3, 2024

@anonrig I have added documentation, most of the functionality and fixed the lint issues. Could you trigger another CI run?

I want to add support for generating a patchset and then this PR is ready for review.

I have some ideas for future improvements, e.g. more fine-grained conflict handling or exposing the utilities SQLite provides for inspecting and merging changesets, but they can be added in a backward-compatible manner.

@anonrig anonrig requested a review from cjihrig August 3, 2024 20:04
@louwers louwers marked this pull request as ready for review August 3, 2024 21:51
@louwers louwers force-pushed the session-extension branch 2 times, most recently from 8fc1cf0 to aaf0928 Compare August 3, 2024 22:17
@louwers louwers force-pushed the session-extension branch 2 times, most recently from 08b3562 to b544d2f Compare August 4, 2024 14:47
@RedYetiDev RedYetiDev added the sqlite Issues and PRs related to the SQLite subsystem. label Aug 4, 2024
@TheOneTheOnlyJJ
Copy link

This is excellent work that many projects will surely be based on. I am curious about the eventual wrapping of the entire Sessions C API (mixed & mashed into higher-level Node functions). Does this PR leave space for that? There are a few more Sessions API functions that would be of great use in Node.

return tmpl;
}

void Session::MemoryInfo(MemoryTracker* tracker) const {}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to implement these.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell Do you know if anything needs to happen here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'd say it is ok to leave this as is.

@louwers
Copy link
Author

louwers commented Aug 4, 2024

@TheOneTheOnlyJJ Thanks! I agree that exposing more functionality of the Session Extension in Node.js would be great. Here are some ideas:

  • Wrappers for sqlite3changeset_invert() (creates a changeset that undoes the changes in another changeset) and sqlite3changeset_concat() (combines changesets).
  • Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.
  • Allowing the contents of a changeset to be read. You probably want to reuturn a Generator for this so you don't need to read the entire changeset it in memory at once.
  • Change the onConflict handler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change.
  • A wrapper for sqlite3session_diff().
  • Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.

Here is the full API for future reference: https://www.sqlite.org/session/funclist.html

Right now I first want to gather some feedback, but I'd be interested continue working on this, either in a follow-up PR or in this one. Since the SQLite integration is still under active development I think working incrementally is the preferred approach.

@TheOneTheOnlyJJ
Copy link

Good point, here's my view on them:

Yes, I was about to suggest these. There's also sqlite3session_isempty() and sqlite3session_memory_used() which are simpler (compared to the rest of the API) and could be wrapped more easily.

  • Allow closing a session. Right now a session is closed when it is garbage collected or when the corresponding database is closed. It is helpful to be able to manually close session in some cases.

This should be a high priority right now, as long-running Node processes (like server back-ends) that would use Sessions will have their memory slowly but surely filled up with no way of freeing it without closing the database connection. Applications could run for months or even longer without closing a database connection, so not being able to close Sessions and free their memory would deter developers from using them. The database should, of course, still keep a list of all active sessions and delete them when it is closed, just in case not all Sessions were manually closed. From my point of view, this function should be wrapped as soon as possible and be included in this PR. Not having it surely decreases the chances of getting this merged, and it would be a shame to lose out on this.

  • Allowing the contents of a changeset to be read. You probably want to reuturn a Generator for this so you don't need to read the entire changeset it in memory at once.
  • Change the onConflict handler to take a function as well. You would be able to inspect the conflicting change and decide on the action (omit, replace, abort) for each conflicting change.
  • A wrapper for sqlite3session_diff().

Very important functionalities, but again, they're not vital for achieving functional basic interaction with the extension. These 3 points could be bundled in a follow-up PR that handles the finer-grained data management side of the API. The changegroup could be exposed as an object with its API functions bound to it as methods.

  • Utilize streaming versions of the Session API (you would need another API for Node.js as well). These are helpful for reading or applying changesets that do not fit in memory.

I've noticed this in the documentation and it's definitely a part of the API that would be crucial in large apps with large databases. But again, for the current goal of incrementally proposing a stable API that exposes the API of the extension, this is not required right now. It would also require wrapping sqlite3session_config(), which allows configuring SQLITE_SESSION_CONFIG_STRMSIZE. This could be handled after the 3 points above get sorted out.

Also, before I wrap up...

I noticed that you proposed the createSession() method as a wrapper around both sqlite3session_create() and sqlite3session_attach(), combined. Looking at the API, I'm trying to figure out if there's any potential functionality or use case where exposing these functions separately in Node would make sense. The only potential use case I see is creating Sessions in one part of the application and then attaching them somewhere else. Maybe this would be useful if using a global Session that gets created at app start and attached later in the app life cycle? But, given that a Session cannot be unattached, this is likely not going to be useful in any way, so the separation of the functions is not needed. What's you input on this, do you see any potential use cases for the splitting of these 2 functions?

@louwers
Copy link
Author

louwers commented Aug 5, 2024

This should be a high priority right now

Agreed. I will still add this one as part of this PR.

What's you input on this, do you see any potential use cases for the splitting of these 2 functions?

I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:

better-sqlite3's public API must be as simple as possible. Rather than calling 3 functions in a specific order, it's simpler for users to call a single function. Rather than providing many similar functions for doing similar things (e.g., "convenience functions"), there should just be one function that is already convenient by design. Sane defaults should be applied when possible. A function's minimal call signature should be as small as possible, with progressively complex customization available when needed. Function names should only be as long as necessary to convey their purpose. For any new feature, it should be easy to showcase code examples that is are so simple that they are self-explanatory.

In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.

@TheOneTheOnlyJJ
Copy link

Agreed. I will still add this one as part of this PR.

Yes, this would be ideal. For further updates other than this, I believe it's best to wait for the final form of the node:sqlite API itself.

I don't see any benefits. Furthermore, I like the philosophy of better-sqlite3:

I was checking if any additional functionality could be squeezed out by separating the functions. Given that there is none, your decision is a sensible one and I support it.

In this vein I also oped to default to aborting on conflict, instead of requiring an explicit conflict handler to be passed like in the API of SQLite.

This is also the best sane default option, as every developer should customize this behavior according to their use case.

I'm not experienced in the internals of Node, so my contribution here can only amount to discussing the API design and smaller implementation details.

I guess the next step now (after implementing the Session close wrapper) is to wait for a maintainer to review this?

louwers and others added 25 commits November 2, 2024 19:24
Co-authored-by: James M Snell <jasnell@gmail.com>
Co-authored-by: Aviv Keller <redyetidev@gmail.com>
Co-authored-by: Aviv Keller <redyetidev@gmail.com>
Co-authored-by: Aviv Keller <redyetidev@gmail.com>
Co-authored-by: Aviv Keller <redyetidev@gmail.com>
Co-authored-by: Colin Ihrig <cjihrig@gmail.com>
@louwers
Copy link
Author

louwers commented Nov 2, 2024

Thanks for your reviews! Took me a while to get back to this.

I have addressed most comments and left a few questions. Please have another look if you have the chance.

If someone could kick off CI that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. sqlite Issues and PRs related to the SQLite subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants