Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] Should we use SQLAlchemy within SessionManager? #1063

Open
kevin-bates opened this issue Nov 10, 2022 · 4 comments
Open

[DISCUSS] Should we use SQLAlchemy within SessionManager? #1063

kevin-bates opened this issue Nov 10, 2022 · 4 comments

Comments

@kevin-bates
Copy link
Member

kevin-bates commented Nov 10, 2022

Earlier this week we received an issue asking if other database types could be used to store Session information. This was due to the user wanting to have multiple servers hosting their applications while sharing the same database.

This topic was discussed in our Jupyter Server meeting earlier today. Since this is precisely where we want to take "the server", I was asked to open an issue for discussion. As a team, there appeared to be general enthusiasm (consensus may be too strong) for introducing a means for users to substitute different database applications and SQLAlchemy appeared to be the de-facto approach taken within the Python community.

Here are some points worth discussing (please feel free to add others):

  1. If SQLAlchemy were to be the approach, should we use SessionManager as the litmus test where SQLAlchemy could be introduced with a default database type of SQLite. We could open an issue with a 'help wanted' tag and let the community contribute to that effort. With this approach, there should be roughly zero user-facing changes (perhaps a configurable trait or two) as the underlying functionality should work identically to today as everything is currently internal.
  2. Create a different repo that would house an alternative SessionManager implementation that also takes this approach. In this case, users would need to opt-in to using this SessionManager. This approach also required repository maintenance and release management.
  3. Should we offer the ability for users to use a NoSQL database instead, which SQLAlchemy does not support? There are definitely cases where a NoSQL database makes better sense, especially when talking about scaling the server horizontally. This would manifest in the form of a general ORM layer.
  4. How much of this should be deferred to Jupyverse if that is indeed the next-generation, multi-tenant, server?
@davidbrochart
Copy link
Contributor

davidbrochart commented Nov 11, 2022

4. How much of this should be deferred to Jupyverse if that is indeed the next-generation, multi-tenant, server?

Jupyverse could definitely support this kind of use case. There is already a database for user settings that uses SQLAlchemy (through FastAPI-Users), and sessions are very much tied to users. I think using SQLModel would make a lot of sense there, as it is based on SQLAlchemy and it plays well with FastAPI.

@dlqqq
Copy link
Contributor

dlqqq commented Mar 14, 2023

I agree that SQLite is a bit lacking, especially when it comes to schema migrations. Furthermore, the presence of read/write locks essentially require an additional layer like aiosqlite to use a message queue to send transactions one-at-a-time to the DB, and that brings its own problems w.r.t. having an additional external dependency and clunky syntax. I've run into both of these issues with Jupyter File ID and Jupyter AI, and it's especially annoying. I think we should investigate NoSQL alternatives such that schema changes won't necessarily require a major version bump.

However, I'm hesistant on agreeing that the right solution is to add an ORM abstraction layer between the server and the database. What specific benefits are there to using SQLAlchemy? We've experimented with this in Jupyter Scheduler, and I did not find it particularly helpful. I spent more time trying to decipher the syntax and cross-reference it with documentation than actually interfacing with the database. Yes, I know that aiosqlite is also "another layer", but for the most part, the syntax is identical to the DB-API spec and doesn't require nearly as much documentation cross-referencing.

I think that excessive usage of external dependencies makes code harder to maintain, as each external dependency is a prerequisite for the developer to learn on their own before contributing. We should only add an external dependency if there's an acute need that only it fills, and I don't think that is the case for SQLAlchemy or other declarative SQL ORMs.

@dlqqq
Copy link
Contributor

dlqqq commented Mar 14, 2023

BTW, I'm open to experimenting with a NoSQL database implementation on Jupyter AI. I'm happy to take recommendations here and serve as a guinea pig for this issue. 😂

@kevin-bates
Copy link
Member Author

Came upon Prisma today. This looks interesting, particularly in that it supports (some) NoSQL DBs and is strongly typed and has a python client that uses pydantic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants