-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make prql a PostgreSQL stored Procedure Language? #725
Comments
That looks very exciting! So IIUC this would intercept the query, written in PRQL, compile it to SQL, and then send than on to postgres? IIUC, extensions can do much more than we'd need — e.g. be involved during the query — but even if we don't use that, this would let people send PRQL directly to postgres. Is that correct? Thanks for raising this @michelp . |
Yes to both, and you can cache the execution plan itself and only pay for the compilation once. In addition prql could call other routines, be called by other routines, accept arguments, return simple and composite types, set returning functions: it would make prql a first class stored procedure language just like sql, plpgsql, plv8, plpythonu, etc. I'm not a Rust programmer so unfortunately that's as specific as I can get, an extension can register a new language callback and be instantiated into a database with https://www.postgresql.org/docs/current/sql-createlanguage.html Here's plpython3's callback as a starting point: https://doxygen.postgresql.org/plpy__main_8c.html#afe79221c59a91b3b176c75a28a28709b
|
Looks like there's also a "sample" pl language for tests https://doxygen.postgresql.org/plsample_8c.html |
Great, thanks! If anyone would like to explore this more, please do! I'm guessing it's a big project, but it could be very cool indeed. |
This would be very cool indeed! I just want to clarify one point as I think there might be a slight misunderstanding (possibly on my part). This only applies to Stored Procedures and Functions. This doesn't actually allow Postgres to understand general queries written in PRQL, unless those are wrapped in a Stored Procedure. I've been using Postgres for a number of years and in my data warehouse workloads with dbt I haven't had to use Stored Procedures for anything so far. Still very cool though. Just wanted to clarify that. |
Weird, I received an email that there was a comment on this thread but it's not showing it. The url from the thread was: #725 (comment) Anyway, I'm going to reply to the comment which was:
I had never of DO blocks before but they look interesting:
My only concern is that, if they return void then can a DO block return any data? All the examples I have found so far just show granting some roles on a schema. Similarly the documentation for Python Anonymous Code Blocks states that:
This also makes me believe that a DO block can't return any data. So while you could execute a PRQL query inside the DO block, since DML is not part of the scope of PRQL and you can't return the result from the DO block, there isn't really any way to do anything useful with the query result. |
Yeah same here, weird.
Nope DO blocks cannot return data, if you want them to DO something, they have to have a side effect (DDL, CALL procedure, etc).
Right, maybe an interesting integration point for the future, but it seems like the most useful thing for prql would be as a body of a function you write (and returns data). |
Thanks for the clarification @michelp . Agreed. |
Hi, I'm taking a stab at this as mentioned in #3455 and hope you can help me with a quick question. I would like to replace identifiers with |
That's great you're working on this, looking forward to seeing how it goes. Definitely ask questions here.
What sort of identifiers? Like column names? I doubt this is the answer, but we can get a output from midway through the compilation stage by looking at the RQ. That's available from the library, and also from
There's an even later AST called SRQ, although when I look at the output it looks less descriptive, I can look more if that'd be helpful. We could also possibly expose the
What's the function we want there? We can make things public if needed (I thought much of it was already public, but it's possible we've missed things — our main uses for exporting are to the bindings, which don't go that deep into the lib) |
Thanks! I now see the prql_to_* family of functions, I think these are perfect.
So if I need to walk the AST and replace identifier names, would the "RQ" level be best you think? And what would be the easiest way to walk the AST you think? Let me provide a bit more context for you this time 😃 :
Any parameter value (the function's parameters) needs to be considered when evaluating the resulting SQL. The easiest way to do this with PGRX seems to be with their prepare_statements functionality. And this requires renaming the argument names referenced in the function body to $1 and $2 etc.
I think so yes, but also a lot of other kinds of identifers. I don't have a complete idea of when we are allowed to do it or not. I hope we can just do a "dumb" replace and let prql or postgres report errors if the user wrote invalid code. |
OK great!
To confirm my understanding — do we want to replace things like If the SQL AST is sufficient, then that would be the easiest — it's the AST of sqlparser-rs which the compiler produces before stringifying it. It's processed at this line. I don't think we have a public function to grab this, but that would be trivial to add, I'd be happy to do that. (You could also take the SQL string and pass it back into That said, not sure if that is sufficient — it depends if we need context from the PRQL query which gets lots as we compile to SQL. |
First milestone would be to replace things like
Should compile to:
and we would replace I think we are both unsure which AST is best to use. Your suggestion for passing the SQL string back into sqlparser-rs is a very good idea. Let me explore this a bit. Thanks for offering to add a public function to expose it. I will let you know if we should go this route. Finally, If there are postgres or pgrx experts out there who have suggestions for other approaches than prepared statements, please chime in! |
I see! And it's not possible to have And to confirm — we're not doing "create SQL from arbitrary PRQL" atm, just replacing specific params — is that right? (Is it possible to do the arbitrary compilation? Otherwise we're limited to the PRQL that's been pre-written, is that correct?) Assuming we do have to do the replacement, I think the easiest way is to grab the PL with Re this query — we need to add it to the from movies
-select {title, release}
+select {title, release, year}
filter release == year |
Hello! I'm excited this idea is getting some traction, go @kaspermarstal ! I don't have too much to add but wanted to point out some thoughts based on what I've seen so far. A prepared statement is a good approach, but worth nothing that it can contain only one statement, and only DML operations (may not be an issue for prql). Also the prepared statement's plan caching ability is only good for the life of the session. Another issue is that prepared statements (to my knowledge) cannot be inlined into a larger query like, say, a function written in straight SQL. These issues might not be a stopper for a proof of concept. It may be worth looking into reusing as much of the sql language function implementation as possible. If prql compiles to sql, it may be possible to pipeline that into the existing sql function framework to do all the heavy lifting. This may net you some good benefits like the inlining support. I did a quick look over the sql language function implementation in postgres, it can be found here: https://github.com/postgres/postgres/blob/master/src/backend/executor/functions.c It's a bit above my head and I'm guess a lot of this code is some of the earliest written in Postgres, and the git history goes back to 1996. The core devs on the postgres mailing list may be able to point you to some good ideas on how to reuse some of this code. |
It might be possible to enforce only unnamed arguments and have the user write $n. At least to begin with. Will look into that. Otherwise I will look into the prql_to_pl approach you spelled out. Thanks!
Can elaborate what you mean with "arbitrarty PRQL" and "arbitrary compilation"? Also, what do you mean with PRQL that's been pre-written? A function is always pre-written prior to execution, right? @michelp you raise some great points. I noticed that pgrx recently merged "fn_call" which may be exactly what we need to call a sql function. And this we can create via SPI. |
This is fine for proof-of-concept but it feels wrong to me to select a column from the table that does not exist in the table if only to bring the name into scope. Are there other workarounds? Like, can we "declare" a variable like "year" on beforehand? |
Totally — the easiest approach is to move the from movies
+filter release == year
select {title, release}
-filter release == year |
I had meant — would it theoretically be possible to send PRQL to Postgres, have the extension translate to SQL, and then have Postgres execute the SQL? By "arbitrary" I had meant that we could send any PRQL, rather than using only being able to use PRQL within stored procedures. (tbc, I have basically no experience in postgres extensions / stored procs, so am asking these questions from a place of not knowing. And great to make some proof-of-concept and then we iterate on it in the future) |
Ah I see!
Not natively, at least to my knowledge. You would have rewrite the query before it hits the database like https://github.com/JPMoresmau/prqlproxy (linked above) does. Based on your inputs @max-sixty and @michelp I will do the following:
This lets us discuss design decisions based on something concrete, and perhaps we can make a roadmap from there. When pgrx make their next release, which hopefully includes their fn_call functionality, I would also like to explore your excellent suggestion of using SQL functions @michelp. |
Hi again, the prototype is now available at https://github.com/kaspermarstal/plprql. Let's treat this as an RFC. To move our discussion forward, I suggest you read the design doc and look at the tests to see how the extension is exercised. You can also look at the code of course, if you have the time. Then let me know what you think. There is one major blocker which is related to the pgrx API. This is described in detail in the design doc here. I should also mention that I have little experience with rust and I am doing this as a learning project so tips on small things, coding style, error handling, and whatnot is also very welcome. |
Hi, the last major blocker mentioned above is now resolved. While still alpha, the extension is now in a state where it can actually be used! |
Awesome @kaspermarstal ! Sorry if I'm being slow to understand the discussion above — but to what extent could an extension translate any PRQL to SQL without having to create a new function for each query? IIUC, at the moment using the extension relies on writing a (I don't know much about postgres functions, and while not reliable, here's a ChatGPT conversation claiming that it's possible) |
It's a great question, and I'll try to clarify how the extension works and what the limitations of PostgreSQL are. Yes, the extension requires you to define a specific function for each PRQL query you want to execute. Postgres only supports SQL natively. Support for other languages must use the "Procedural Language" mechanism in the same way that Python, Rust, Javascript etc etc are supported. These implementations also requires the user to create functions. (If someone out there on the big internet knows something I don't, please correct me) So while this extension does not make PRQL a first-class citizen like SQL, it supports PRQL in the same way as other officially supported languages and community-supported languages. I don't think there is a way around functions. Perhaps we could hook into lower layers at postgres and rewrite the incoming SQL there. And pgrx would have to expose what we need at these lower layers. But this is beyond my current knowledge of Postgres. Would have to investigate. |
I am really not confident here, so forgive me if this is quite far off the mark. I'm pushing because this seems quite close to being really cool... How about something like this? While Postgres functions seem to require being typed with the return type (i.e. the columns), it sounds like it's possible to return a cursor. Here's an example an LLM made (I doubt it compiles at first, but hopefully offers an idea) use pgx::*;
use std::ffi::CString;
pg_module_magic!();
// This function calls the external `compile_prql` function
extern "C" {
fn compile_prql(input: *const c_char) -> *const c_char;
}
#[pg_extern]
fn execute_prql(query: &str) -> String {
let query_cstr = CString::new(query).expect("Failed to create CString");
let compiled_sql = unsafe { compile_prql(query_cstr.as_ptr()) };
// Ensure the compiled SQL is valid
if compiled_sql.is_null() {
panic!("compile_prql returned a null pointer");
}
let sql = unsafe { CStr::from_ptr(compiled_sql).to_string_lossy().into_owned() };
Spi::connect(|client| {
let portal = client.open_anonymous_portal(&sql);
portal.name().to_string()
})
} ...and then would be called by SQLAlchemy like: from sqlalchemy import create_engine
# Replace with your database connection details
engine = create_engine('postgresql://username:password@localhost/dbname')
with engine.connect() as conn:
# Call the function and get the cursor name
result = conn.execute("SELECT execute_prql('your_custom_query')")
cursor_name = result.fetchone()[0]
# Fetch from the cursor
result = conn.execute(f"FETCH ALL IN \"{cursor_name}\"")
for row in result:
print(row) # Each row from your query This obv isn't all the way there — it's not possible to just do |
I see why a function taking a PRQL string as an argument here is convenient. But as you point out, the return type is the challenge. There's an interesting note on
Let me get back to you on this. |
You are actually full of good ideas Max, the cursor approach works. This function:
Can be used like this:
Just need to make the cursor name configurable (edit: it is now configurable). See the test for details. The record-based approach would be slightly more ergonomic but have not gotten it to work yet. Oh well. |
That's awesome! Nice work! To what extent is that shippable as a postgres extension? So someone could install the extension and then call it like your example? That would be really cool, we'd be excited to put that on the home page etc.
Yeah, it would save the extra line. But IIUC postgres doesn't seem to allow returning results themselves unless at least one of the function or the caller has an explicit type. |
Thanks! Yes, at this point it is fully shippable as a postgres extension and people can install it and call it like the example above. There's a guide on how to install from source in the readme. I'm exicited that your are excited, but do you agree that ideally I would work on documentation, CI, and binary distribution channels before official announcement? Do you have any comments on: |
Great! Yes agree re ensuring it's easy to install. TBH I don't have much context for what's standard among postgres extensions — those links look very reasonable though. From PRQL's end, there are a couple of places it would be good to add docs:
Those should be quite easy to add to. If it's confusing where they come from, searching for the current text in the repo should yield where they're at. (I would also vote to update the readme a bit so it references the new queries. and super-narrow point but the current design doc link isn't current :) ) Let us know anything we can do! |
Great, for now, it would be really good to smoke test the extension, like installing from source and playing around on test data, and see if we can catch the most obvious bugs. Any volunteers? I will get started on the documentation. README has been fixed :D |
Great! I'm happy to try things. I'd also encourage having automated tests for this if possible! |
Just opened a PR for docs. In the extension's own repo, I added CI for test suite and installing from source to system installation of PostgreSQL. |
https://github.com/tcdi/pgx allows making PostgreSQL extensions with Rust, it seems like a logical connection would be to use pgx to make prql a stored procedure language in the database.
Thoughts?
The text was updated successfully, but these errors were encountered: