-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multi-statement transactions #433
Conversation
703bfda
to
37fd2da
Compare
#include "postgres.h" | ||
#include "access/tableam.h" | ||
} | ||
#include "pgduckdb/pg/declarations.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was needed because we cannot define the XactEvent
and SubXactEvent
twice, and tableam.h
was including xact.h
from Postgres. Which would then cause compilation errors if pgduckdb_table_am.hpp
was included in the same file as pg/transactions.hpp
e2955e1
to
3cf0422
Compare
@@ -1,6 +1,8 @@ | |||
CREATE TABLE foo AS SELECT 'bar'::text AS t; | |||
BEGIN; SET duckdb.force_execution = true; SELECT t::integer AS t1 FROM foo; ROLLBACK; | |||
ERROR: invalid input syntax for type integer: "bar" | |||
ERROR: (PGDuckDB/Duckdb_ExecCustomScan) Conversion Error: Could not convert string 'bar' to INT32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this was being executed in Postgres, now it's executed in duckdb again. We still don't crash though.
AS | ||
$$ | ||
BEGIN | ||
CALL duckdb.recycle_ddb(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to use the CALL
syntax, so we can easily detect when recycle_ddb
is not run at the top level (like shown here).
@@ -125,8 +125,8 @@ def test_prepared_pipeline(conn: Connection): | |||
# the duckdb table that should fail because the insert into the heap | |||
# table opens an implicit transaction. | |||
with pytest.raises( | |||
psycopg.errors.ActiveSqlTransaction, | |||
match="DuckDB queries cannot be executed within a pipeline", | |||
psycopg.errors.InternalError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we would do some errorcode conversions between DuckDB and Postgres errors. That seems like a task for a separate PR though.
375a90e
to
8319881
Compare
0b49790
to
fcf7a01
Compare
Only supports multi-statement transactions for non-DDL queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tests show that it is doing what we expect. Added a few comments :-)
src/pgduckdb_hooks.cpp
Outdated
@@ -207,8 +203,21 @@ DuckdbPlannerHook_Cpp(Query *parse, const char *query_string, int cursor_options | |||
} | |||
/* If we can't create a plan, we'll fall back to Postgres */ | |||
} | |||
if (parse->commandType != CMD_SELECT && pgduckdb::DuckdbDidWrites() && pgduckdb::IsInTransactionBlock(true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading statements like this, I feel it might be useful to either nest PG APIs in a sub-namespace pg
or to include PG in their name like PGIsInTransactionBlock
.
Since we're bridging two databases, it is helpful to be extra-pedentic when dealing with their APIs.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was thinking the same. Done now, let me know what you think. We should probably update the other code in the pg
directory in a follow up PR for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, much clearer. And agreed to update the rest of the repo to follow the same philosophy.
*/ | ||
bool | ||
IsInTransactionBlock() { | ||
return IsInTransactionBlock(top_level_statement); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is safe to be called without a lock on top_level_statement
because this will only be called in PG hooks, not in DDB threads?
(same for duckdb_command_id
and potential elog(ERROR
below?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced a pgduckdb::pg
and pgduckdb::ddb
namespace now. I moved this to the pgduckdb::pg
namespace. I think we should write down assumptions about using functions from the pgduckdb::pg
namespace. The two main assumptions are imo:
- They should be called from a Postgres hook, or while holding the global PG lock from a duckdb thread.
- They do not throw elog errors, only exceptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think that makes total sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀
I merged this a bit quicker than I originally intended, mainly to fix the broken build on the main branch due to this PG change: https://www.postgresql.org/message-id/flat/CAGECzQQS1Oun40BssQDmuHc86teXqr%2BLOndy5NSVdFce9evZvQ%40mail.gmail.com#cf1398dd28d67be9fce7e75096343f07 |
Remove the lockdowns introduced in duckdb/pg_duckdb#433 since the same restriction doesn't apply to us. In our case, Postgres remains the source of truth for columnstore tables, and DuckDB serves purely as the execution engine for reading and writing to those tables.
This starts supporting multi-statement transactions. It imposes one important restriction that is necessary to ensure the expected ACID guarantees: You cannot write to both a Postgres table and a DuckDB table in the same transaction.
This also means that we do not allow doing DuckDB DDL inside a transaction, because that needs to write both to DuckDB and to the Postgres catalogs.
Either of these restrictions might be removed in a follow up PR. Especifically the DDL restriction is one that we would like to remove (and probably can). But for now this PR only implements the obviously correct case, and any further removal of restrictions will need some design/tradeoffs discussion.
Fixes #419