Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple order aware aggregate functions in a query #8582

Open
Tracked by #8583
alamb opened this issue Dec 19, 2023 · 3 comments
Open
Tracked by #8583

Support multiple order aware aggregate functions in a query #8582

alamb opened this issue Dec 19, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Dec 19, 2023

Is your feature request related to a problem or challenge?

Today DataFusion supports three aggregate functions that can be "order aware": ARRAY_AGG, FIRST_VALUE and LAST_VALUE. This means that you can supply a ORDER BY clause to their argument, for example FIRST_VALUE(x ORDER BY time).

Today, there be only one single order specified across ALL order aware aggregate functions

For example

❯ create table t(x int, y int) as values (1, 1), (1, 2), (1, 1), (1, 4), (2, 20), (2, 10);;
0 rows in set. Query took 0.003 seconds.

❯ select x, first_value(x ORDER BY y) from t GROUP BY x;
+---+------------------+
| x | FIRST_VALUE(t.x) |
+---+------------------+
| 2 | 2                |
| 1 | 1                |
+---+------------------+
2 rows in set. Query took 0.004 seconds.

❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC) from t GROUP BY x;
+---+------------------+-----------------+
| x | FIRST_VALUE(t.x) | LAST_VALUE(t.x) |
+---+------------------+-----------------+
| 1 | 1                | 1               |
| 2 | 2                | 2               |
+---+------------------+-----------------+
2 rows in set. Query took 0.004 seconds.

❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC NULLS LAST) from t GROUP BY x;
This feature is not implemented: Conflicting ordering requirements in aggregate functions is not supported

Describe the solution you'd like

There are a few designs proposed here: #8558 (comment)

We are working on a more detailed proposal

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Copy link
Contributor Author

alamb commented Dec 20, 2023

Here is a design document @mustafasrepo and I are working on: https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4/edit

@alamb
Copy link
Contributor Author

alamb commented Dec 21, 2023

@mustafasrepo and @ozankabak -- I went over and added a third design option (to move the order awareness into the aggregators), which I would like to consider as well. I think it is likely to perform signfiicantly faster for many queries as well as keep the HashAggregateExec simpler (though it makes the aggregators themselves potentially more complicated)

Can you review https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4 and let me know what you think?

@alamb
Copy link
Contributor Author

alamb commented Jan 7, 2024

FYI @mustafasrepo and @ozankabak -- I filed #8777, about reusing CTEs, as it came up in other context, and also could potentially be used for solution 2 in the the design document https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4/edit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant