Compiler outputs SELECT DISTINCT even when not grouping by all columns #944

Closed · vishnumenon opened this issue Aug 31, 2022 · 1 comment
Labels: bug (Invalid compiler output or panic), compiler

@vishnumenon

As described in the PRQL Language Book, the expected behavior is for

from employees
select department
group department (
  take 1
)

to compile to

SELECT
  DISTINCT department
FROM
  employees

This functions as expected. However,

from employees
group department (
  take 1
)

currently also produces output that uses SELECT DISTINCT, i.e.

SELECT
  DISTINCT employees.*
FROM
  employees

The expected output is instead something like:

WITH table_0 AS (
  SELECT employees.*, ROW_NUMBER() OVER (PARTITION BY department) AS _rn_81 FROM employees)
SELECT table_0.* FROM table_0 WHERE _rn_81 <= 1
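To illustrate why the two outputs differ, here is a small Rust model of both strategies on made-up rows. The Employee struct, its values and the helper names are illustrative assumptions, not part of PRQL. It treats SELECT DISTINCT employees.* as collapsing only rows that are equal in every column, and the ROW_NUMBER() filter as keeping the first row per department. SQL gives no ordering guarantee for ROW_NUMBER() without ORDER BY, so "first" here just means first in input order.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative row type; the fields are assumptions, not a real schema.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Employee {
    name: &'static str,
    department: &'static str,
}

fn sample_rows() -> Vec<Employee> {
    vec![
        Employee { name: "ana", department: "eng" },
        Employee { name: "bo", department: "eng" },
        Employee { name: "cy", department: "ops" },
    ]
}

/// Models `SELECT DISTINCT employees.*`: only rows equal in every column collapse.
fn distinct_all(rows: &[Employee]) -> Vec<Employee> {
    let mut seen: HashSet<&Employee> = HashSet::new();
    let mut out = Vec::new();
    for row in rows {
        // `insert` returns false for a row that was already seen.
        if seen.insert(row) {
            out.push(row.clone());
        }
    }
    out
}

/// Models `ROW_NUMBER() OVER (PARTITION BY department) ... WHERE _rn_81 <= 1`:
/// numbers rows within each department (in input order) and keeps row 1.
fn first_per_department(rows: &[Employee]) -> Vec<Employee> {
    let mut row_numbers: HashMap<&str, usize> = HashMap::new();
    let mut out = Vec::new();
    for row in rows {
        let rn = row_numbers.entry(row.department).or_insert(0);
        *rn += 1;
        if *rn <= 1 {
            out.push(row.clone());
        }
    }
    out
}

fn main() {
    let rows = sample_rows();
    // Both "eng" rows survive DISTINCT because their names differ.
    println!("DISTINCT employees.*: {:?}", distinct_all(&rows));
    // The row-number filter keeps one row per department.
    println!("row-number filter:    {:?}", first_per_department(&rows));
}
```

With these rows, DISTINCT returns all three employees, while the row-number filter returns one per department (ana and cy), which is what group department (take 1) asks for.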

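More generally, pipelines that include group x (take 1) seem to compile to SELECT DISTINCT even when x is not the only selected column. This is incorrect: SELECT DISTINCT removes only rows that are duplicates across every selected column, so when other columns are selected it can return several rows per value of x instead of one.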
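A sketch of the condition the DISTINCT rewrite would need to check, assuming the compiler can see the list of projected columns and the group keys. can_use_distinct is a hypothetical helper, not the actual code in distinct.rs. It allows the rewrite only for take 1 when the projected columns are exactly the group keys, and refuses it for a wildcard such as employees.*, whose column set the check cannot see.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the code in distinct.rs): may `group <keys> (take n)`
/// be compiled to `SELECT DISTINCT <selected>`?
fn can_use_distinct(selected: &[&str], group_keys: &[&str], take: u64) -> bool {
    // DISTINCT keeps at most one row per distinct value, so only `take 1` matches.
    if take != 1 {
        return false;
    }
    // A wildcard such as `employees.*` stands for columns this check cannot see.
    if selected.iter().any(|col| col.ends_with('*')) {
        return false;
    }
    // Extra (or missing) columns change what DISTINCT deduplicates on.
    let selected: HashSet<&str> = selected.iter().copied().collect();
    let keys: HashSet<&str> = group_keys.iter().copied().collect();
    selected == keys
}

fn main() {
    // from employees | select department | group department (take 1)
    println!("{}", can_use_distinct(&["department"], &["department"], 1));
    // from employees | group department (take 1)
    println!("{}", can_use_distinct(&["employees.*"], &["department"], 1));
}
```

Comparing the columns as sets means column order does not matter; any other projection would fall back to the ROW_NUMBER() form shown above.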

@aljazerzen traced the source of the issue to this line: https://github.com/prql/prql/blob/b754c0a65bb8ab619a9001d00b9b451dbaa3d02d/prql-compiler/src/sql/distinct.rs#L36

@max-sixty
Member

Closed by #2109
