Is your feature request related to a problem or challenge?
Today DataFusion supports three aggregate functions that can be "order aware": ARRAY_AGG, FIRST_VALUE and LAST_VALUE. This means that you can supply an ORDER BY clause to their argument, for example FIRST_VALUE(x ORDER BY time).
However, only a single ordering can be specified across ALL order-aware aggregate functions in a query.
For example:
❯ create table t(x int, y int) as values (1, 1), (1, 2), (1, 1), (1, 4), (2, 20), (2, 10);
0 rows in set. Query took 0.003 seconds.
❯ select x, first_value(x ORDER BY y) from t GROUP BY x;
+---+------------------+
| x | FIRST_VALUE(t.x) |
+---+------------------+
| 2 | 2                |
| 1 | 1                |
+---+------------------+
2 rows in set. Query took 0.004 seconds.
❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC) from t GROUP BY x;
+---+------------------+-----------------+
| x | FIRST_VALUE(t.x) | LAST_VALUE(t.x) |
+---+------------------+-----------------+
| 1 | 1                | 1               |
| 2 | 2                | 2               |
+---+------------------+-----------------+
2 rows in set. Query took 0.004 seconds.
❯ select x, first_value(x ORDER BY y), first_value(x ORDER BY y DESC NULLS LAST) from t GROUP BY x;
This feature is not implemented: Conflicting ordering requirements in aggregate functions is not supported
@mustafasrepo and @ozankabak -- I went over and added a third design option (to move the order awareness into the aggregators), which I would like us to consider as well. I think it is likely to perform significantly faster for many queries, and it keeps the HashAggregateExec simpler (though it potentially makes the aggregators themselves more complicated).
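To make the "order awareness inside the aggregators" idea concrete, here is a minimal Rust sketch. It is not DataFusion's implementation and does not use its API: `FirstValueAcc` and `first_values` are hypothetical names invented for illustration. Each accumulator tracks the best sort key it has seen under its own ordering, so two aggregates with different ORDER BY clauses can run over the same unsorted input. As an assumption for the example, it aggregates `y` rather than `x` (so the two orderings give visibly different results) and it ignores NULL handling such as NULLS LAST.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical order-aware FIRST_VALUE accumulator (illustration only).
/// It keeps the value whose sort key comes first under its own ordering,
/// so the input does not need to be pre-sorted.
struct FirstValueAcc {
    descending: bool,
    best: Option<(i64, i64)>, // (sort key, value)
}

impl FirstValueAcc {
    fn new(descending: bool) -> Self {
        Self { descending, best: None }
    }

    fn update(&mut self, key: i64, value: i64) {
        let replace = match self.best {
            None => true,
            Some((best_key, _)) => {
                let ord = key.cmp(&best_key);
                // Strict comparison: on ties the earliest row seen is kept.
                if self.descending {
                    ord == Ordering::Greater
                } else {
                    ord == Ordering::Less
                }
            }
        };
        if replace {
            self.best = Some((key, value));
        }
    }

    fn evaluate(&self) -> Option<i64> {
        self.best.map(|(_, value)| value)
    }
}

/// Groups rows `(x, y)` by `x` and computes, per group, the analogue of
/// `first_value(y ORDER BY y)` and `first_value(y ORDER BY y DESC)`.
fn first_values(rows: &[(i64, i64)]) -> BTreeMap<i64, (Option<i64>, Option<i64>)> {
    let mut groups: BTreeMap<i64, (FirstValueAcc, FirstValueAcc)> = BTreeMap::new();
    for &(x, y) in rows {
        let (asc, desc) = groups
            .entry(x)
            .or_insert_with(|| (FirstValueAcc::new(false), FirstValueAcc::new(true)));
        // Each accumulator applies its own ordering to the same input row.
        asc.update(y, y);
        desc.update(y, y);
    }
    groups
        .into_iter()
        .map(|(x, (asc, desc))| (x, (asc.evaluate(), desc.evaluate())))
        .collect()
}
```

Calling `first_values` on the rows of table `t` from the example gives `(Some(1), Some(4))` for `x = 1` and `(Some(10), Some(20))` for `x = 2`. The trade-off matches the one described in the comment: the grouping loop stays unaware of orderings, while each accumulator carries the extra comparison logic.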
Describe the solution you'd like
There are a few designs proposed here: #8558 (comment)
We are working on a more detailed proposal
Describe alternatives you've considered
No response
Additional context
No response