-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INSERT ... SELECT or multi-row VALUES errors when using default values #448
Conversation
the entry with the default value is ignored unless it is specified
src/vendor/pg_ruleutils_15.c
Outdated
@@ -6707,6 +6707,10 @@ get_insert_query_def(Query *query, deparse_context *context, | |||
if (tle->resjunk) | |||
continue; /* ignore junk entries */ | |||
|
|||
/* When multi-VALUES, ignore an entry with a default value unless it is specified */ | |||
if (values_rte != NULL && !(IsA(tle->expr, Var))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is not specific to VALUES
, an INSERT ... SELECT
has the same problem.
I think this is a better way of ignoring the filled in Const values.
if (values_rte != NULL && !(IsA(tle->expr, Var))) | |
if ((IsA(tle->expr, Const))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I didn't notice the INSERT... SELECT
also has problems.😒
Simply ignoring Const
here is not enough;
FuncExpr
also needs to be ignored when the column type is varchar and has a default value.
At present, this PR only deals with multi-VALUES. I would like to ask if I need to add a new PR to deal with INSERT... SELECT
, or put them together?
/* Values_rte is currently required. Is this OK*/
if (values_rte && (IsA(tle->expr, Const) || IsA(tle->expr, FuncExpr)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Const
and FuncExpr
are not enough. Column default can be arbitrary expression, e.g. a int DEFAULT 1 + 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I looked into the behavior a bit more and I think the original Var code was probably best. But it should include select_rte
.
if ((select_rte && values_rte) && !(IsA(tle->expr, Var)))
And we should have a few more tests for different types of defaults and using INSERT ... SELECT. Like:
INSERT INTO ta (b) SELECT generate_series(1, 3);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant
if ((select_rte || values_rte) && !(IsA(tle->expr, Var)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:
CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb;
INSERT INTO tb (c) SELECT 'ta'; -- case 1
INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2
INSERT INTO tb (a) SELECT (2)::numeric; -- case 3
I took a look at the querytree using the pprint
C function included in Postgres.
For case 1, the query tree also contains a Const
node. Luckily we can differentiate it from the Const
nodes created by DEFAULT clauses by taking a look at its location
field. If that's -1
, then it's from the DEFAULT clause, otherwise it's coming from the query.
For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an expression_tree_walker
to see if there are any Var
nodes in the expression. If there are, then it's not a DEFAULT expression.
I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in pgduckdb_ruleutils.cpp
that we can call from those vendored files. Just like we call pgduckdb_function_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?
I believe the invariant is that calling ToSQL()
on any querytree should yield correct SQL. However, rewriteTargetListIU
seems to break this invariant in Postgres since ToSQL()
produces incorrect SQL on its output querytree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I looked into this a bit more. I think there are at least three cases that we still need to handle:
CREATE TEMP TABLE tb (a int DEFAULT 3, b int, c varchar DEFAULT 'pg_duckdb') USING duckdb; INSERT INTO tb (c) SELECT 'ta'; -- case 1 INSERT INTO tb (c) SELECT 'ta' || 'tb'; -- case 2 INSERT INTO tb (a) SELECT (2)::numeric; -- case 3I took a look at the querytree using the
pprint
C function included in Postgres.For case 1, the query tree also contains a
Const
node. Luckily we can differentiate it from theConst
nodes created by DEFAULT clauses by taking a look at itslocation
field. If that's-1
, then it's from the DEFAULT clause, otherwise it's coming from the query.For case 2, the Var node gets wrapped in a RelabelType node. That node type can also occur in DEFAULT clauses, so we need to do a bit more. Similarly for case 3, where the Var is wrapped in a FuncExpr. To solve these, I think we want to walk the expression tree using an
expression_tree_walker
to see if there are anyVar
nodes in the expression. If there are, then it's not a DEFAULT expression.I think this logic starts becoming complex enough that we don't want to copy paste it into every vendored in ruleutils file, but instead want to create a small helper function in
pgduckdb_ruleutils.cpp
that we can call from those vendored files. Just like we callpgduckdb_function_name
.
Wow, I'm stuck with these two CONST's! The method you give is really a good idea. I will follow your method to try to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?
I believe the invariant is that calling
ToSQL()
on any querytree should yield correct SQL. However,rewriteTargetListIU
seems to break this invariant in Postgres sinceToSQL()
produces incorrect SQL on its output querytree
I have tried to run it on PG and there seems to be no problem. Can you give me the abnormal test SQL and let me run it.
postgres=# INSERT INTO tb (a) VALUES (123), (456);
INFO: INSERT INTO tb (a) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (b) VALUES (123), (456);
INFO: INSERT INTO tb (b) VALUES (123), (456)
INSERT 0 2
postgres=# INSERT INTO tb (c) VALUES ('ta'), ('tb');
INFO: INSERT INTO tb (c) VALUES ('ta'::character varying), ('tb'::character varying)
INSERT 0 2
postgres=# INSERT INTO tb (a) SELECT 789;
INFO: INSERT INTO tb (a) SELECT 789
INSERT 0 1
postgres=# INSERT INTO tb (b) SELECT 789;
INFO: INSERT INTO tb (b) SELECT 789
INSERT 0 1
postgres=# SELECT * FROM tb;
INFO: SELECT a,
b,
c
FROM tb
a | b | c
-----+-----+-----------
123 | | pg_duckdb
456 | | pg_duckdb
3 | 123 | pg_duckdb
3 | 456 | pg_duckdb
3 | | ta
3 | | tb
789 | | pg_duckdb
3 | 789 | pg_duckdb
(8 rows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we eventually plan to fix the bug upstream in Postgres rather than just in pg_duckdb?
I think once we have something that works it's worth trying to upstream it. But I think there's a good chance that it would not be accepted. In postgres pg_get_querydef
is only used on Query structures that have not been passed through the rewriter, so this problem isn't exposed there. But they might very well accept it, as there is an argument that it's useful for extensions.
and Var nodes to handle the more complex DEFAULT clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're almost there, but it's not handling everything yet.
If it's an INSERT ... SELECT or multi-VALUES INSERT, then
rewriteTargetListIU
changes the query to put theDEFAULT
expression into the targetlist of INSERT command. Sincepg_get_querydef
normally doesn't have to deal with rewritten queries, it generates a wrong query because of this. So we need to check if the expression is coming from theDEFAULT
clause or the actual query. There's a bunch of different cases that this code needs to handle which are all covered in the tests.Fixes #445