is_between in a filter for datetime gives incorrect results #16956
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), P-high (Priority: high), rust (Related to Rust Polars)
Comments
daBlesr added the bug, needs triage, and rust labels on Jun 14, 2024.
daBlesr changed the title from "is_between for datetime is not working" to "is_between in a filter for datetime is not working" on Jun 14, 2024.
daBlesr changed the title from "is_between in a filter for datetime is not working" to "is_between in a filter for datetime gives incorrect results" on Jun 14, 2024.
It seems to be an optimizer issue.
>>> out.collect()
# shape: (3, 3)
# ┌────────────────────────────────┬────────────────────────────────┬────────────┐
# │ start_datetime                 ┆ end_datetime_right             ┆ is_between │
# │ ---                            ┆ ---                            ┆ ---        │
# │ datetime[ms, Europe/Amsterdam] ┆ datetime[ms, Europe/Amsterdam] ┆ bool       │
# ╞════════════════════════════════╪════════════════════════════════╪════════════╡
# │ 2024-06-11 08:00:00 CEST       ┆ 2024-06-11 16:00:00 CEST       ┆ true       │
# │ 2024-06-11 08:00:00 CEST       ┆ 2024-06-12 16:00:00 CEST       ┆ true       │
# │ 2024-06-11 08:00:00 CEST       ┆ 2024-06-19 16:00:00 CEST       ┆ false      │
# └────────────────────────────────┴────────────────────────────────┴────────────┘
Turning off predicate pushdown gives the correct result:
>>> out.collect(predicate_pushdown=False)
# shape: (2, 3)
# ┌────────────────────────────────┬────────────────────────────────┬────────────┐
# │ start_datetime                 ┆ end_datetime_right             ┆ is_between │
# │ ---                            ┆ ---                            ┆ ---        │
# │ datetime[ms, Europe/Amsterdam] ┆ datetime[ms, Europe/Amsterdam] ┆ bool       │
# ╞════════════════════════════════╪════════════════════════════════╪════════════╡
# │ 2024-06-11 08:00:00 CEST       ┆ 2024-06-11 16:00:00 CEST       ┆ true       │
# │ 2024-06-11 08:00:00 CEST       ┆ 2024-06-12 16:00:00 CEST       ┆ true       │
# └────────────────────────────────┴────────────────────────────────┴────────────┘
Python repro:
import polars as pl
lf = pl.LazyFrame(
    [
        [1718085600000, 1718172000000, 1718776800000],
        [1718114400000, 1718200800000, 1718805600000]
    ],
    schema=["start_datetime", "end_datetime"]
).cast(pl.Datetime("ms", "Europe/Amsterdam"))
out = (
    lf.join(lf, how="cross")
    .filter(
        pl.col.end_datetime_right.is_between(
            pl.col.start_datetime,
            pl.col.start_datetime.dt.offset_by("132h")
        )
    )
    .with_columns(
        pl.col.end_datetime_right.is_between(
            pl.col.start_datetime,
            pl.col.start_datetime.dt.offset_by("132h")
        )
        .alias("is_between")
    )
    .filter(pl.col.start_datetime.dt.date().str.contains("2024-06-11"))
    .select("start_datetime", "end_datetime_right", "is_between")
)
The correct rows are there if the final
This is probably one for @ritchie46
Slightly more minimal:
import polars as pl
lf = pl.LazyFrame(
    [
        [1718085600000, 1718172000000, 1718776800000],
        [1718114400000, 1718200800000, 1718805600000]
    ],
    schema=["start_datetime", "end_datetime"]
).cast(pl.Datetime("ms", "Europe/Amsterdam"))
out = (
    lf.join(lf, how="cross")
    .filter(
        pl.col.end_datetime_right.is_between(
            pl.col.start_datetime,
            pl.col.start_datetime.dt.offset_by("132h")
        )
    )
    .select("start_datetime", "end_datetime_right")
).collect(predicate_pushdown=True)
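For anyone who wants a reference result that does not depend on the Polars optimizer at all, here is a minimal standard-library sketch that evaluates the same predicate over the nine cross-joined pairs. It is not Polars code: the helper names (`to_cest`, `is_between`, `expected_rows`) are made up for illustration, the interval is assumed to be closed on both ends, and a fixed UTC+02:00 offset is assumed in place of the Europe/Amsterdam zone (valid for June 2024, when no DST change occurs).

```python
from datetime import datetime, timedelta, timezone
from itertools import product

# Epoch-millisecond values copied from the repro in this thread.
START_MS = [1718085600000, 1718172000000, 1718776800000]
END_MS = [1718114400000, 1718200800000, 1718805600000]

# "132h" as a fixed duration in milliseconds. Assumption: no DST transition
# falls inside any of the windows, which holds for these June 2024 dates.
OFFSET_MS = int(timedelta(hours=132).total_seconds() * 1000)

# Assumption: a fixed UTC+02:00 offset stands in for Europe/Amsterdam (CEST),
# so the sketch does not need a time zone database.
CEST = timezone(timedelta(hours=2), "CEST")


def to_cest(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware datetime in the assumed CEST offset."""
    return datetime.fromtimestamp(ms / 1000, tz=CEST)


def is_between(value: int, lower: int, upper: int) -> bool:
    """Hypothetical helper: interval check, closed on both ends."""
    return lower <= value <= upper


def expected_rows() -> list[tuple[int, int]]:
    """Cross join starts with ends; keep pairs where end is in [start, start + 132h]."""
    return [
        (start, end)
        for start, end in product(START_MS, END_MS)
        if is_between(end, start, start + OFFSET_MS)
    ]
```

Printing `to_cest(start)` and `to_cest(end)` for each pair from `expected_rows()` gives rows that can be compared by eye with the `collect()` output. Under the stated assumptions, the pair with start 2024-06-11 08:00 and end 2024-06-19 16:00 is not among them, which agrees with the `predicate_pushdown=False` result.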
Checks
Reproducible example
There is a problem where the is_between filter gives incorrect results. In the code below, I have applied exactly the same is_between expression in a filter and in a with_columns statement. The resulting dataframe contains false values in this column, which should not be possible. See the example below.
This returns:
Log output
No response
Issue description
The filter method does not work for some reason in this specific case. This is the smallest reproducible example I could produce (the original query was bigger).
Expected behavior
The last row in the example, where the is_between column reports false, should not be present.
Installed versions