Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INSERT ... SELECT or multi-row VALUES errors when using default values #448

Merged
merged 12 commits into from
Dec 4, 2024

Conversation

Leo-XM-Zeng
Copy link
Contributor

@Leo-XM-Zeng Leo-XM-Zeng commented Nov 23, 2024

If it's an INSERT ... SELECT or multi-VALUES INSERT, then rewriteTargetListIU changes the query to put the DEFAULT expression into the targetlist of INSERT command. Since pg_get_querydef normally doesn't have to deal with rewritten queries, it generates a wrong query because of this. So we need to check if the expression is coming from the DEFAULT clause or the actual query. There's a bunch of different cases that this code needs to handle which are all covered in the tests.

Fixes #445

the entry with the default value is ignored
unless it is specified
@@ -6707,6 +6707,10 @@ get_insert_query_def(Query *query, deparse_context *context,
if (tle->resjunk)
continue; /* ignore junk entries */

/* When multi-VALUES, ignore an entry with a default value unless it is specified */
if (values_rte != NULL && !(IsA(tle->expr, Var)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is not specific to VALUES, an INSERT ... SELECT has the same problem.

I think this is a better way of ignoring the filled in Const values.

Suggested change
if (values_rte != NULL && !(IsA(tle->expr, Var)))
if ((IsA(tle->expr, Const)))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't notice the INSERT... SELECT also has problems.😒
Simply ignoring Const here is not enough;
FuncExpr also needs to be ignored when the column type is varchar and has a default value.
At present, this PR only deals with multi-VALUES. I would like to ask if I need to add a new PR to deal with INSERT... SELECT, or put them together?

/* Values_rte is currently required. Is this OK*/
if (values_rte && (IsA(tle->expr, Const) || IsA(tle->expr, FuncExpr)))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const and FuncExpr are not enough. Column default can be arbitrary expression, e.g. a int DEFAULT 1 + 2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I looked into the behavior a bit more and I think the original Var code was probably best. But it should include select_rte.

if ((select_rte && values_rte) && !(IsA(tle->expr, Var)))

And we should have a few more tests for different types of defaults and using INSERT ... SELECT. Like:

INSERT INTO ta (b) SELECT generate_series(1, 3);

Copy link
Contributor

@dpxcc dpxcc Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant

if ((select_rte || values_rte) && !(IsA(tle->expr, Var)))

Copy link
Collaborator

@JelteF JelteF Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (c) SELECT 'ta';  -- case 1
INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2
INSERT INTO tb (a) SELECT (2)::numeric; -- case 3

I took a look at the querytree using the pprint C function included in Postgres.

For case 1, the query tree also contains a Const node. Luckily we can differentiate it from the Const nodes created by DEFAULT clauses by taking a look at its location field. If that's -1, then it's from the DEFAULT clause, otherwise it's coming from the query.

For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an expression_tree_walker to see if there are any Var nodes in the expression. If there are, then it's not a DEFAULT expression.

I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in pgduckdb_ruleutils.cpp that we can call from those vendored files. Just like we call pgduckdb_function_name.

Copy link
Contributor

@dpxcc dpxcc Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I believe the invariant is that calling ToSQL() on any querytree should yield correct SQL. However, rewriteTargetListIU seems to break this invariant in Postgres since ToSQL() produces incorrect SQL on its output querytree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:

CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (c) SELECT 'ta';  -- case 1
INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2
INSERT INTO tb (a) SELECT (2)::numeric; -- case 3

I took a look at the querytree using the pprint C function included in Postgres.

For case 1, the query tree also contains a Const node. Luckily we can differentiate it from the Const nodes created by DEFAULT clauses by taking a look at its location field. If that's -1, then it's from the DEFAULT clause, otherwise it's coming from the query.

For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an expression_tree_walker to see if there are any Var nodes in the expression. If there are, then it's not a DEFAULT expression.

I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in pgduckdb_ruleutils.cpp that we can call from those vendored files. Just like we call pgduckdb_function_name.

Wow, I'm stuck with these two CONST's! The method you give is really a good idea. I will follow your method to try to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I believe the invariant is that calling ToSQL() on any querytree should yield correct SQL. However, rewriteTargetListIU seems to break this invariant in Postgres since ToSQL() produces incorrect SQL on its output querytree

I have tried to run it on PG and there seems to be no problem. Can you give me the abnormal test SQL and let me run it.

postgres=# INSERT INTO tb (a) VALUES (123), (456);
INFO:   INSERT INTO tb (a) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (b) VALUES (123), (456);
INFO:   INSERT INTO tb (b) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (c) VALUES ('ta'), ('tb');
INFO:   INSERT INTO tb (c) VALUES ('ta'::character varying), ('tb'::character varying)
INSERT 0 2
postgres=# INSERT INTO tb (a) SELECT 789;
INFO:   INSERT INTO tb (a)  SELECT 789
INSERT 0 1
postgres=# INSERT INTO tb (b) SELECT 789;
INFO:   INSERT INTO tb (b)  SELECT 789
INSERT 0 1
postgres=# SELECT * FROM tb;
INFO:   SELECT a,
    b,
    c
   FROM tb
  a  |  b  |     c     
-----+-----+-----------
 123 |     | pg_duckdb
 456 |     | pg_duckdb
   3 | 123 | pg_duckdb
   3 | 456 | pg_duckdb
   3 |     | ta
   3 |     | tb
 789 |     | pg_duckdb
   3 | 789 | pg_duckdb
(8 rows)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?

I think once we have something that works it's worth trying to upstream it. But I think there's a good chance that it would not be accepted. In postgres pg_get_querydef is only used on Query structures that have not been passed through the rewriter, so this problem isn't exposed there. But they might very well accept it, as there is an argument that it's useful for extensions.

@Leo-XM-Zeng Leo-XM-Zeng requested a review from JelteF November 23, 2024 14:20
@Leo-XM-Zeng Leo-XM-Zeng marked this pull request as draft November 27, 2024 16:26
@Leo-XM-Zeng Leo-XM-Zeng changed the title INSERT multi-VALUES errors when using default values INSERT ... SELECT or multi-row VALUES errors when using default values Dec 3, 2024
and Var nodes to handle the more complex DEFAULT clause.
@Leo-XM-Zeng Leo-XM-Zeng requested a review from JelteF December 3, 2024 12:47
@JelteF JelteF marked this pull request as ready for review December 3, 2024 15:25
Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're almost there, but it's not handling everything yet.

src/pgduckdb_ruleutils.cpp Outdated Show resolved Hide resolved
src/pgduckdb_ruleutils.cpp Outdated Show resolved Hide resolved
@JelteF JelteF merged commit 1422606 into duckdb:main Dec 4, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INSERT errors when using default values
3 participants