Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PREV/NEXT in PyDough #158

Open
knassre-bodo opened this issue Dec 18, 2024 · 0 comments · May be fixed by #273
Open

Support PREV/NEXT in PyDough #158

knassre-bodo opened this issue Dec 18, 2024 · 0 comments · May be fixed by #273
Labels
effort - high major issue that will require multiple steps or complex design enhancement New feature or request user feature Adding a new user-facing feature/functionality

Comments

@knassre-bodo
Copy link
Contributor

knassre-bodo commented Dec 18, 2024

Issue #157 is a pre-requisite for this issue since it starts the handling of some of the window aspects and has overlapping logic. The idea is that PREV accesses a singular collection corresponding to the previous record of the current collection, and can be used in a similar manner as BACK. By default it takes in no arguments, but can have the following:

  • Positional argument n: for PREV, this is how many records to go back (e.g. PREV(n).x is equivalent to LAG(x, n)). The default is 1. NEXT(n) is equivalent to PREV(-n).
  • by: see RANKING (order keys)
  • level: see RANKING (partition keys)

The implementation will have components in the unqualified node stages, qualification, QDAG, hybrid nodes, hybrid conversion, relational nodes, and relational conversion, but ideally much of what was done for RANKING can be re-used. For this issue, anything other than references to expressions from prev/next can be forbidden (e.g. calling HAS or accessing subcollections can be left for a followup issue, which should be opened if they are not handled in the original implementation).

Example usages:

# Finds the average time difference between all orders of parts meeting certain criteria
selected_lines = lineitems.where(CONTAINS(part.name, 'green') & CONTAINS(part.part_type, 'STEEL'))
time_differences = selected_lines.ORDER_BY(ship_date.ASC())(delta=DATEDIFF('day', ship_date, PREV(1).ship_date))
TPCH(avg_delta=AVG(time_differences.delta))

# Finds the average time difference between all orders made per-customer
time_differences = orders(delta=DATEDIFF('day', order_date, PREV(1, by=order_date.ASC(), levels=1).order_date))
customers(name, avg_order_gap=AVG(time_differences.delta))
@knassre-bodo knassre-bodo added enhancement New feature or request user feature Adding a new user-facing feature/functionality effort - high major issue that will require multiple steps or complex design labels Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort - high major issue that will require multiple steps or complex design enhancement New feature or request user feature Adding a new user-facing feature/functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant