Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-statement transactions #433

Merged
merged 3 commits into from
Nov 27, 2024
Merged

Support multi-statement transactions #433

merged 3 commits into from
Nov 27, 2024

Conversation

JelteF
Copy link
Collaborator

@JelteF JelteF commented Nov 14, 2024

This starts supporting multi-statement transactions. It imposes one important restriction that is necessary to ensure the expected ACID guarantees: You cannot write to both a Postgres table and a DuckDB table in the same transaction.

This also means that we do not allow doing DuckDB DDL inside a transaction, because that needs to write both to DuckDB and to the Postgres catalogs.

Either of these restrictions might be removed in a follow up PR. Especifically the DDL restriction is one that we would like to remove (and probably can). But for now this PR only implements the obviously correct case, and any further removal of restrictions will need some design/tradeoffs discussion.

Fixes #419

@JelteF JelteF force-pushed the transactions branch 7 times, most recently from 703bfda to 37fd2da Compare November 19, 2024 13:24
#include "postgres.h"
#include "access/tableam.h"
}
#include "pgduckdb/pg/declarations.hpp"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because we cannot define the XactEvent and SubXactEvent twice, and tableam.h was including xact.h from Postgres. Which would then cause compilation errors if pgduckdb_table_am.hpp was included in the same file as pg/transactions.hpp

@JelteF JelteF force-pushed the transactions branch 3 times, most recently from e2955e1 to 3cf0422 Compare November 20, 2024 16:02
@@ -1,6 +1,8 @@
CREATE TABLE foo AS SELECT 'bar'::text AS t;
BEGIN; SET duckdb.force_execution = true; SELECT t::integer AS t1 FROM foo; ROLLBACK;
ERROR: invalid input syntax for type integer: "bar"
ERROR: (PGDuckDB/Duckdb_ExecCustomScan) Conversion Error: Could not convert string 'bar' to INT32
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this was being executed in Postgres, now it's executed in duckdb again. We still don't crash though.

AS
$$
BEGIN
CALL duckdb.recycle_ddb();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to use the CALL syntax, so we can easily detect when recycle_ddb is not run at the top level (like shown here).

@@ -125,8 +125,8 @@ def test_prepared_pipeline(conn: Connection):
# the duckdb table that should fail because the insert into the heap
# table opens an implicit transaction.
with pytest.raises(
psycopg.errors.ActiveSqlTransaction,
match="DuckDB queries cannot be executed within a pipeline",
psycopg.errors.InternalError,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we would do some errorcode conversions between DuckDB and Postgres errors. That seems like a task for a separate PR though.

@JelteF JelteF force-pushed the transactions branch 3 times, most recently from 375a90e to 8319881 Compare November 20, 2024 16:53
src/pgduckdb_duckdb.cpp Outdated Show resolved Hide resolved
@JelteF JelteF force-pushed the transactions branch 8 times, most recently from 0b49790 to fcf7a01 Compare November 26, 2024 15:49
Only supports multi-statement transactions for non-DDL queries.
@JelteF JelteF marked this pull request as ready for review November 26, 2024 16:09
@JelteF JelteF requested review from Y-- and mkaruza November 26, 2024 16:09
Copy link
Collaborator

@Y-- Y-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tests show that it is doing what we expect. Added a few comments :-)

src/pgduckdb_ddl.cpp Outdated Show resolved Hide resolved
src/pgduckdb_duckdb.cpp Outdated Show resolved Hide resolved
src/pgduckdb_duckdb.cpp Outdated Show resolved Hide resolved
src/pgduckdb_duckdb.cpp Outdated Show resolved Hide resolved
@@ -207,8 +203,21 @@ DuckdbPlannerHook_Cpp(Query *parse, const char *query_string, int cursor_options
}
/* If we can't create a plan, we'll fall back to Postgres */
}
if (parse->commandType != CMD_SELECT && pgduckdb::DuckdbDidWrites() && pgduckdb::IsInTransactionBlock(true)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading statements like this, I feel it might be useful to either nest PG APIs in a sub-namespace pg or to include PG in their name like PGIsInTransactionBlock.
Since we're bridging two databases, it is helpful to be extra-pedentic when dealing with their APIs.

What do you think?

Copy link
Collaborator Author

@JelteF JelteF Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I was thinking the same. Done now, let me know what you think. We should probably update the other code in the pg directory in a follow up PR for consistency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, much clearer. And agreed to update the rest of the repo to follow the same philosophy.

src/pgduckdb_planner.cpp Outdated Show resolved Hide resolved
*/
bool
IsInTransactionBlock() {
return IsInTransactionBlock(top_level_statement);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is safe to be called without a lock on top_level_statement because this will only be called in PG hooks, not in DDB threads?
(same for duckdb_command_id and potential elog(ERROR below?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a pgduckdb::pg and pgduckdb::ddb namespace now. I moved this to the pgduckdb::pg namespace. I think we should write down assumptions about using functions from the pgduckdb::pg namespace. The two main assumptions are imo:

  1. They should be called from a Postgres hook, or while holding the global PG lock from a duckdb thread.
  2. They do not throw elog errors, only exceptions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that makes total sense.

Copy link
Collaborator

@Y-- Y-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@JelteF JelteF enabled auto-merge (squash) November 27, 2024 15:38
@JelteF JelteF merged commit 3f0eab7 into main Nov 27, 2024
5 checks passed
@JelteF JelteF deleted the transactions branch November 27, 2024 15:39
@JelteF
Copy link
Collaborator Author

JelteF commented Nov 27, 2024

I merged this a bit quicker than I originally intended, mainly to fix the broken build on the main branch due to this PG change: https://www.postgresql.org/message-id/flat/CAGECzQQS1Oun40BssQDmuHc86teXqr%2BLOndy5NSVdFce9evZvQ%40mail.gmail.com#cf1398dd28d67be9fce7e75096343f07

dpxcc added a commit to Mooncake-Labs/pg_mooncake that referenced this pull request Dec 17, 2024
Remove the lockdowns introduced in duckdb/pg_duckdb#433 since the same restriction doesn't apply to us. In our case, Postgres remains the source of truth for columnstore tables, and DuckDB serves purely as the execution engine for reading and writing to those tables.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CustomScan does not handle backward scan correctly
2 participants