Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour of group transform without aggregation #1749

Closed
RalfNorthman opened this issue Feb 9, 2023 · 1 comment
Closed

Unexpected behaviour of group transform without aggregation #1749

RalfNorthman opened this issue Feb 9, 2023 · 1 comment

Comments

@RalfNorthman
Copy link
Contributor

Maybe related to #1748

The problem

Maybe my intuition regarding this is lacking but with this query:

from alb = albums
join t = tracks [t.album_id==alb.album_id]
join art = artists [art.artist_id==alb.artist_id]

# Songs shorter than 4 minutes
filter t.milliseconds < 4 * 60 * 1000

group [alb.album_id] (
    take 1
)

take 10

I expect the result to have just one row per album but that is not what happens as of version 0.5.0. The resulting sql have SELECT DISTINCT on all columns:

WITH table_1 AS (
  SELECT
    DISTINCT alb.*,
    t.*,
    art.*
  FROM
    albums AS alb
    JOIN tracks AS t ON t.album_id = alb.album_id
    JOIN artists AS art ON art.artist_id = alb.artist_id
  WHERE
    t.milliseconds < 240000
)
SELECT
  *,
  *,
  *
FROM
  table_1
LIMIT
  10

-- Generated by PRQL compiler version:0.5.0 (https://prql-lang.org)

In contrast

If I add a sort transform inside the group I get a reasonable result (but I don't think I should have to sort to get the behavior that I'm after):

from alb = albums
join t = tracks [t.album_id==alb.album_id]
join art = artists [art.artist_id==alb.artist_id]

# Songs shorter than 4 minutes
filter t.milliseconds < 4 * 60 * 1000

group [alb.album_id] (
    sort alb.album_id
    take 1
)

take 10

Similar query, but with aggregation also behaves as expected:

from alb = albums
join t = tracks [t.album_id==alb.album_id]
join art = artists [art.artist_id==alb.artist_id]

# Songs shorter than 4 minutes
filter t.milliseconds < 4 * 60 * 1000

group [alb.album_id] (
  aggregate [
    number_tracks = count
  ]
)

take 10
@aljazerzen
Copy link
Member

I believe this is duplicate of #944. Thanks for report anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants