Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INSERT ... SELECT or multi-row VALUES errors when using default values #448

Merged
merged 12 commits into from
Dec 4, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/vendor/pg_ruleutils_14.c
Original file line number Diff line number Diff line change
Expand Up @@ -6626,6 +6626,10 @@ get_insert_query_def(Query *query, deparse_context *context,
if (tle->resjunk)
continue; /* ignore junk entries */

/* When multi-VALUES, ignore an entry with a default value unless it is specified */
if (values_rte != NULL && !(IsA(tle->expr, Var)))
continue;

appendStringInfoString(buf, sep);
sep = ", ";

Expand Down
4 changes: 4 additions & 0 deletions src/vendor/pg_ruleutils_15.c
Original file line number Diff line number Diff line change
Expand Up @@ -6707,6 +6707,10 @@ get_insert_query_def(Query *query, deparse_context *context,
if (tle->resjunk)
continue; /* ignore junk entries */

/* When multi-VALUES, ignore an entry with a default value unless it is specified */
if (values_rte != NULL && !(IsA(tle->expr, Var)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is not specific to VALUES, an INSERT ... SELECT has the same problem.

I think this is a better way of ignoring the filled in Const values.

Suggested change
if (values_rte != NULL && !(IsA(tle->expr, Var)))
if ((IsA(tle->expr, Const)))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't notice the INSERT... SELECT also has problems.😒
Simply ignoring Const here is not enough;
FuncExpr also needs to be ignored when the column type is varchar and has a default value.
At present, this PR only deals with multi-VALUES. I would like to ask if I need to add a new PR to deal with INSERT... SELECT, or put them together?

/* Values_rte is currently required. Is this OK*/
if (values_rte && (IsA(tle->expr, Const) || IsA(tle->expr, FuncExpr)))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const and FuncExpr are not enough. Column default can be arbitrary expression, e.g. a int DEFAULT 1 + 2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I looked into the behavior a bit more and I think the original Var code was probably best. But it should include select_rte.

if ((select_rte && values_rte) && !(IsA(tle->expr, Var)))

And we should have a few more tests for different types of defaults and using INSERT ... SELECT. Like:

INSERT INTO ta (b) SELECT generate_series(1, 3);

Copy link
Contributor

@dpxcc dpxcc Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant

if ((select_rte || values_rte) && !(IsA(tle->expr, Var)))

Copy link
Collaborator

@JelteF JelteF Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (c) SELECT 'ta';  -- case 1
INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2
INSERT INTO tb (a) SELECT (2)::numeric; -- case 3

I took a look at the querytree using the pprint C function included in Postgres.

For case 1, the query tree also contains a Const node. Luckily we can differentiate it from the Const nodes created by DEFAULT clauses by taking a look at its location field. If that's -1, then it's from the DEFAULT clause, otherwise it's coming from the query.

For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an expression_tree_walker to see if there are any Var nodes in the expression. If there are, then it's not a DEFAULT expression.

I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in pgduckdb_ruleutils.cpp that we can call from those vendored files. Just like we call pgduckdb_function_name.

Copy link
Contributor

@dpxcc dpxcc Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I believe the invariant is that calling ToSQL() on any querytree should yield correct SQL. However, rewriteTargetListIU seems to break this invariant in Postgres since ToSQL() produces incorrect SQL on its output querytree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (c) SELECT 'ta';  -- case 1
INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2
INSERT INTO tb (a) SELECT (2)::numeric; -- case 3

I took a look at the querytree using the pprint C function included in Postgres.

For case 1, the query tree also contains a Const node. Luckily we can differentiate it from the Const nodes created by DEFAULT clauses by taking a look at its location field. If that's -1, then it's from the DEFAULT clause, otherwise it's coming from the query.

For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an expression_tree_walker to see if there are any Var nodes in the expression. If there are, then it's not a DEFAULT expression.

I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in pgduckdb_ruleutils.cpp that we can call from those vendored files. Just like we call pgduckdb_function_name.

Wow, I'm stuck with these two CONST's! The method you give is really a good idea. I will follow your method to try to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I believe the invariant is that calling ToSQL() on any querytree should yield correct SQL. However, rewriteTargetListIU seems to break this invariant in Postgres since ToSQL() produces incorrect SQL on its output querytree

I have tried to run it on PG and there seems to be no problem. Can you give me the abnormal test SQL and let me run it.

postgres=# INSERT INTO tb (a) VALUES (123), (456);
INFO:   INSERT INTO tb (a) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (b) VALUES (123), (456);
INFO:   INSERT INTO tb (b) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (c) VALUES ('ta'), ('tb');
INFO:   INSERT INTO tb (c) VALUES ('ta'::character varying), ('tb'::character varying)
INSERT 0 2
postgres=# INSERT INTO tb (a) SELECT 789;
INFO:   INSERT INTO tb (a)  SELECT 789
INSERT 0 1
postgres=# INSERT INTO tb (b) SELECT 789;
INFO:   INSERT INTO tb (b)  SELECT 789
INSERT 0 1
postgres=# SELECT * FROM tb;
INFO:   SELECT a,
    b,
    c
   FROM tb
  a  |  b  |     c     
-----+-----+-----------
 123 |     | pg_duckdb
 456 |     | pg_duckdb
   3 | 123 | pg_duckdb
   3 | 456 | pg_duckdb
   3 |     | ta
   3 |     | tb
 789 |     | pg_duckdb
   3 | 789 | pg_duckdb
(8 rows)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I think once we have something that works it's worth trying to upstream it. But I think there's a good chance that it would not be accepted. In postgres pg_get_querydef is only used on Query structures that have not been passed through the rewriter, so this problem isn't exposed there. But they might very well accept it, as there is an argument that it's useful for extensions.

continue;

appendStringInfoString(buf, sep);
sep = ", ";

Expand Down
4 changes: 4 additions & 0 deletions src/vendor/pg_ruleutils_16.c
Original file line number Diff line number Diff line change
Expand Up @@ -6667,6 +6667,10 @@ get_insert_query_def(Query *query, deparse_context *context,
if (tle->resjunk)
continue; /* ignore junk entries */

/* When multi-VALUES, ignore an entry with a default value unless it is specified */
if (values_rte != NULL && !(IsA(tle->expr, Var)))
continue;

appendStringInfoString(buf, sep);
sep = ", ";

Expand Down
4 changes: 4 additions & 0 deletions src/vendor/pg_ruleutils_17.c
Original file line number Diff line number Diff line change
Expand Up @@ -6681,6 +6681,10 @@ get_insert_query_def(Query *query, deparse_context *context,
if (tle->resjunk)
continue; /* ignore junk entries */

/* When multi-VALUES, ignore an entry with a default value unless it is specified */
if (values_rte != NULL && !(IsA(tle->expr, Var)))
continue;

appendStringInfoString(buf, sep);
sep = ", ";

Expand Down
29 changes: 28 additions & 1 deletion test/regression/expected/temporary_tables.out
Original file line number Diff line number Diff line change
Expand Up @@ -246,4 +246,31 @@ pg_temp main CREATE TABLE webpages(column00 INTEGER, column01 VARCHAR, column02

(1 row)

DROP TABLE webpages, t, t_heap, t_heap2;
CREATE TEMP TABLE ta (a int DEFAULT 3, b int) USING duckdb;
INSERT INTO ta (b) VALUES (123), (456);
INSERT INTO ta (a, b) VALUES (123, 456), (456, 123);
SELECT * FROM ta;
a | b
-----+-----
3 | 123
3 | 456
123 | 456
456 | 123
(4 rows)

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (a) VALUES (123), (456);
INSERT INTO tb (b) VALUES (123), (456);
INSERT INTO tb (c) VALUES ('ta'), ('tb');
SELECT * FROM tb;
a | b | c
-----+-----+-----------
123 | | pg_duckdb
456 | | pg_duckdb
3 | 123 | pg_duckdb
3 | 456 | pg_duckdb
3 | | ta
3 | | tb
(6 rows)

DROP TABLE webpages, t, t_heap, t_heap2, ta, tb;
13 changes: 12 additions & 1 deletion test/regression/sql/temporary_tables.sql
Original file line number Diff line number Diff line change
Expand Up @@ -163,4 +163,15 @@ SELECT * FROM t_heap2;

SELECT duckdb.raw_query($$ SELECT database_name, schema_name, sql FROM duckdb_tables() $$);

DROP TABLE webpages, t, t_heap, t_heap2;
CREATE TEMP TABLE ta (a int DEFAULT 3, b int) USING duckdb;
INSERT INTO ta (b) VALUES (123), (456);
INSERT INTO ta (a, b) VALUES (123, 456), (456, 123);
SELECT * FROM ta;

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
Leo-XM-Zeng marked this conversation as resolved.
Show resolved Hide resolved
INSERT INTO tb (a) VALUES (123), (456);
INSERT INTO tb (b) VALUES (123), (456);
INSERT INTO tb (c) VALUES ('ta'), ('tb');
SELECT * FROM tb;

DROP TABLE webpages, t, t_heap, t_heap2, ta, tb;