feat: Support use of is_between range predicate with IEJoin operations (join_where) #19547

Merged: 5 commits into pola-rs:main on Oct 31, 2024

@alexander-beedie (Collaborator) commented Oct 31, 2024

Ref: #18365

Range joins are one of the most useful applications of join_where, and the is_between expression is a natural way to express them. This PR allows is_between to be used in the list of join predicates, streamlining such queries.

Also improves several related error messages, which now give more detail when a predicate is not suitable for use with join_where (for example, by including the offending predicate in the message).

Example

Setup:

from datetime import date

import polars as pl

df1 = pl.DataFrame({
    "id": ["aa", "bb", "cc"],
    "start": [date(2020,1,1), date(2022,10,10), date(2024,7,5)],
    "end": [date(2022,12,31), date(2024,10,1), date(2024,12,31)],
})
df2 = pl.DataFrame({
    "id": ["aa", "cc", "bb"],
    "dt": [date(2022,12,31), date(2024,2,21), date(2024,8,8)],
    "price": [100, 200, 300],
})

The following two formulations are now equivalent:

df1.join_where(
    df2,
    pl.col("id") == pl.col("id_right"),
    pl.col("dt") >= pl.col("start"),
    pl.col("dt") <= pl.col("end")
)

df1.join_where(
    df2,
    pl.col("id") == pl.col("id_right"),
    pl.col("dt").is_between("start", "end")
)

shape: (2, 5)
┌─────┬────────────┬────────────┬────────────┬───────┐
│ id  ┆ start      ┆ end        ┆ dt         ┆ price │
│ --- ┆ ---        ┆ ---        ┆ ---        ┆ ---   │
│ str ┆ date       ┆ date       ┆ date       ┆ i64   │
╞═════╪════════════╪════════════╪════════════╪═══════╡
│ aa  ┆ 2020-01-01 ┆ 2022-12-31 ┆ 2022-12-31 ┆ 100   │
│ bb  ┆ 2022-10-10 ┆ 2024-10-01 ┆ 2024-08-08 ┆ 300   │
└─────┴────────────┴────────────┴────────────┴───────┘
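
For anyone who wants to sanity-check the expected rows by hand, here is a minimal plain-Python sketch of what the predicates select. It is a naive nested-loop reference, not how Polars implements IEJoin, and it assumes the inclusive bounds shown in the first formulation (>= and <=). The names left, right and range_join are hypothetical and exist only for this illustration.

```python
from datetime import date

# Same data as the df1 / df2 frames in the example, as plain dicts.
left = [
    {"id": "aa", "start": date(2020, 1, 1), "end": date(2022, 12, 31)},
    {"id": "bb", "start": date(2022, 10, 10), "end": date(2024, 10, 1)},
    {"id": "cc", "start": date(2024, 7, 5), "end": date(2024, 12, 31)},
]
right = [
    {"id": "aa", "dt": date(2022, 12, 31), "price": 100},
    {"id": "cc", "dt": date(2024, 2, 21), "price": 200},
    {"id": "bb", "dt": date(2024, 8, 8), "price": 300},
]


def range_join(left_rows, right_rows):
    """Naive O(n*m) reference: keep pairs with equal ids and start <= dt <= end."""
    out = []
    for lrow in left_rows:
        for rrow in right_rows:
            same_id = lrow["id"] == rrow["id"]
            # Inclusive on both ends, mirroring the >= / <= predicates.
            in_range = lrow["start"] <= rrow["dt"] <= lrow["end"]
            if same_id and in_range:
                out.append({**lrow, "dt": rrow["dt"], "price": rrow["price"]})
    return out


matches = range_join(left, right)
for row in matches:
    print(row["id"], row["start"], row["end"], row["dt"], row["price"])
```

The "cc" row drops out because its dt (2024-02-21) falls before its start (2024-07-05), while the "aa" row is kept only because the upper bound is inclusive.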

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Oct 31, 2024
@alexander-beedie alexander-beedie force-pushed the improve-range-join-syntax branch from da5cfee to 39bf10b Compare October 31, 2024 10:38
@ritchie46 ritchie46 merged commit 519ccb3 into pola-rs:main Oct 31, 2024
24 of 25 checks passed
@alexander-beedie alexander-beedie deleted the improve-range-join-syntax branch October 31, 2024 13:57