feat: Add a dedicated remove method for DataFrame and LazyFrame #21259

Merged
merged 3 commits into from
Feb 18, 2025

Conversation

alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Feb 14, 2025

We currently have frame-level filter, but after adding "DELETE" support to the SQL engine it felt like we should really offer a dedicated remove method too:

  • Aside from improved semantics (e.g. it is much clearer to say "remove rows that match this" than to use filter to say "don't remove rows that don't match this"), there is also a subtlety: remove(predicate) is not the same thing as filter(~predicate). The difference comes from null handling: where the predicate evaluates to null, its negation is also null, and filter drops that row.

    For filter and remove to share consistent semantics (the predicate must evaluate to True for a row to be acted on), the correct conversion is actually filter(predicate.ne_missing(True)). This is not immediately obvious, so handling it correctly in a clearly named method is good UX.

  • Common code was factored out, and no new query/graph nodes are required (remove guarantees that the correct predicate conversion is done, re-using filter internally).

  • Plenty of new tests added, various docstrings tidied-up / clarified, and SQL "DELETE" support was repointed to the new remove method.
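
The null subtlety from the first point can be modelled in plain Python, with None standing in for null. This is only an illustrative sketch of the three-valued logic, not how Polars implements it; the helper names (kleene_not, filter_rows, remove_rows) are hypothetical.

```python
from typing import List, Optional

def kleene_not(value: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: a null (None) input stays null."""
    return None if value is None else not value

def filter_rows(rows: List[str], mask: List[Optional[bool]]) -> List[str]:
    # "filter" keeps a row only when its mask value is exactly True.
    return [r for r, m in zip(rows, mask) if m is True]

def remove_rows(rows: List[str], mask: List[Optional[bool]]) -> List[str]:
    # "remove" drops a row only when its mask value is exactly True,
    # so rows whose mask is False or null are kept.
    return [r for r, m in zip(rows, mask) if m is not True]

rows = ["a", "b", "c"]
mask = [True, None, False]  # predicate result per row

naive = filter_rows(rows, [kleene_not(m) for m in mask])
removed = remove_rows(rows, mask)

print(naive)    # ['c']       (the null row "b" is lost)
print(removed)  # ['b', 'c']  (the null row "b" is kept)
```

The two results differ only in the row whose predicate is null, which is the case the dedicated method is meant to handle.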

Example

import polars as pl

df = pl.DataFrame({
    "lbl": ["xx", "zz", "yy", "xx"],
    "ccy": ["USD", "EUR", "USD", "JPY"],
    "year": [2021, 2022, 2023, 2023],
    "total": [3245, None, -6680, 25000],
})

Note where total > 0 evaluates as True...

df.with_columns(pl.col("total") > 0)
# shape: (4, 4)
# ┌─────┬─────┬──────┬───────┐
# │ lbl ┆ ccy ┆ year ┆ total │
# │ --- ┆ --- ┆ ---  ┆ ---   │
# │ str ┆ str ┆ i64  ┆ bool  │
# ╞═════╪═════╪══════╪═══════╡
# │ xx  ┆ USD ┆ 2021 ┆ true  │
# │ zz  ┆ EUR ┆ 2022 ┆ null  │
# │ yy  ┆ USD ┆ 2023 ┆ false │
# │ xx  ┆ JPY ┆ 2023 ┆ true  │
# └─────┴─────┴──────┴───────┘

...and those are the rows that remove discards:

df.remove(pl.col("total") > 0)
# shape: (2, 4)
# ┌─────┬─────┬──────┬───────┐
# │ lbl ┆ ccy ┆ year ┆ total │
# │ --- ┆ --- ┆ ---  ┆ ---   │
# │ str ┆ str ┆ i64  ┆ i64   │
# ╞═════╪═════╪══════╪═══════╡
# │ zz  ┆ EUR ┆ 2022 ┆ null  │
# │ yy  ┆ USD ┆ 2023 ┆ -6680 │
# └─────┴─────┴──────┴───────┘

If we had naïvely inverted the predicate using filter, we would also have incorrectly dropped the row with a null total:

df.filter(~(pl.col("total") > 0))
# shape: (1, 4)
# ┌─────┬─────┬──────┬───────┐
# │ lbl ┆ ccy ┆ year ┆ total │
# │ --- ┆ --- ┆ ---  ┆ ---   │
# │ str ┆ str ┆ i64  ┆ i64   │
# ╞═════╪═════╪══════╪═══════╡
# │ yy  ┆ USD ┆ 2023 ┆ -6680 │
# └─────┴─────┴──────┴───────┘

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Feb 14, 2025
@alexander-beedie alexander-beedie changed the title feat(python): Add a dedicated remove method for DataFrame and LazyFrame feat: Add a dedicated remove method for DataFrame and LazyFrame Feb 14, 2025
@alexander-beedie alexander-beedie added the rust Related to Rust Polars label Feb 14, 2025

codecov bot commented Feb 14, 2025

Codecov Report

Attention: Patch coverage is 83.01887% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.87%. Comparing base (515c1b8) to head (31e9197).
Report is 31 commits behind head on main.

Files with missing lines | Patch % | Lines
py-polars/polars/lazyframe/frame.py | 78.04% | 5 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21259      +/-   ##
==========================================
+ Coverage   79.80%   79.87%   +0.07%     
==========================================
  Files        1596     1596              
  Lines      228468   228493      +25     
  Branches     2607     2614       +7     
==========================================
+ Hits       182318   182507     +189     
+ Misses      45554    45388     -166     
- Partials      596      598       +2     


@ritchie46
Member

ritchie46 commented Feb 18, 2025

Agreed, that is a nice QoL improvement. 👍 Nice one.

@ritchie46 ritchie46 merged commit bb8efc5 into pola-rs:main Feb 18, 2025
39 checks passed
@alexander-beedie alexander-beedie deleted the frame-remove-method branch February 18, 2025 07:29
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars

2 participants