Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csvq complains about ambiguous column name in LATERAL JOIN query. #30

Closed
derekmahar opened this issue May 21, 2020 · 6 comments
Closed

Comments

@derekmahar
Copy link
Contributor

derekmahar commented May 21, 2020

As I stated very briefly at the start of my LATERAL JOIN test, my test slightly modified the example queries in PostgreSQL’s LATERAL JOIN. Here, I repeat my test using the original and unaltered queries:

$ cat orders.csv | csvq "SELECT * FROM STDIN"
+----+---------+----------------------------+
| id | user_id |         created_at         |
+----+---------+----------------------------+
| 1  | 1       | 2017-06-20 04:35:03.582895 |
| 2  | 2       | 2017-06-20 04:35:07.564973 |
| 3  | 3       | 2017-06-20 04:35:10.986712 |
| 4  | 1       | 2017-06-20 04:58:10.137503 |
| 5  | 3       | 2017-06-20 04:58:17.905277 |
| 6  | 3       | 2017-06-20 04:58:25.289122 |
+----+---------+----------------------------+
-- lateral_query_original1.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  LEFT JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders
   WHERE user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
-- lateral_query_original2.sql
SELECT user_id, first_order_time, next_order_time, id FROM
  (SELECT user_id, min(created_at) AS first_order_time FROM orders GROUP BY user_id) o1
  INNER JOIN LATERAL
  (SELECT id, created_at AS next_order_time
   FROM orders
   WHERE user_id = o1.user_id AND created_at > o1.first_order_time
   ORDER BY created_at ASC LIMIT 1)
   o2 ON true;
$ cat orders.csv | csvq --source lateral_query_original1.sql
lateral_query_original1.sql [L:6 C:10] field user_id is ambiguous
$ cat orders.csv | csvq --source lateral_query_original2.sql
lateral_query_original2.sql [L:6 C:10] field user_id is ambiguous

In both cases, csvq complains that the non-scoped or unaliased reference to user_id in the second or lateral subquery in the FROM clause is ambiguous. Assuming, possibly incorrectly, that the PostgreSQL SQL interpreter follows the SQL standard and the csvq error is incorrect, in the presence of the same column reference in multiple adjacent subqueries in the same FROM clause, when resolving the non-scoped reference to user_id in the second subquery, csvq should choose the instance from within the immediate enclosing scope.

@mithrandie
Copy link
Owner

Yes, I’m aware of the difference in behavior between csvq and postgresql.

Regardless of what the specification allows, I think that table name modifiers should be used when specifying column names, and there are many minor differences between csvq and standard sql or other DBMS.
This is one of them.

@mithrandie
Copy link
Owner

Wait, I may now have found an easy way to implement it. If it's feasible, I'll do it later.

@derekmahar
Copy link
Contributor Author

derekmahar commented May 21, 2020

Regardless of what the specification allows, I think that table name modifiers should be used when specifying column names, and there are many minor differences between csvq and standard sql or other DBMS.

I agree with you that it is a good practise to specify table name modifiers explicitly in order to avoid column name ambiguity. I presume the SQL standard allows queries to omit table name modifiers on ambiguous column name references for the benefit of query concision.

mithrandie added a commit that referenced this issue May 22, 2020
Change reference scope in lateral subquery. ([Github #30](#30))
@mithrandie
Copy link
Owner

Fixed the scope in LATERAL subquery and released in version 1.13.1.

@derekmahar
Copy link
Contributor Author

csvq 1.13.1 didn't report any error when I repeated the test:

$ csvq --version
csvq version 1.13.1
$ csvq --source lateral_query_original1.sql
+---------+----------------------------+-----------------------------+------+
| user_id |      first_order_time      |       next_order_time       |  id  |
+---------+----------------------------+-----------------------------+------+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4    |
| 2       | 2017-06-20 04:35:07.564973 |            NULL             | NULL |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5    |
+---------+----------------------------+-----------------------------+------+
$ csvq --source lateral_query_original2.sql
+---------+----------------------------+-----------------------------+----+
| user_id |      first_order_time      |       next_order_time       | id |
+---------+----------------------------+-----------------------------+----+
| 1       | 2017-06-20 04:35:03.582895 | 2017-06-20 04:58:10.137503  | 4  |
| 3       | 2017-06-20 04:35:10.986712 | 2017-06-20 04:58:17.905277  | 5  |
+---------+----------------------------+-----------------------------+----+

@derekmahar
Copy link
Contributor Author

Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants