fix: Categorical min/max returning String dtype rather than Categorical #21232

Merged (4 commits) on Feb 15, 2025

lukemanley
Contributor

Min/max aggregations on Categorical columns currently return the String dtype. This PR changes the output to preserve the Categorical dtype, as is already done for Enum. It also brings the collected output in line with the expected schema, which already reported Categorical.

import polars as pl

lf = pl.LazyFrame(
    {
        "a": ["foo", "bar"],
        "b": ["foo", "bar"],
        "c": ["foo", "bar"],
    },
    schema={
        "a": pl.Categorical("physical"),
        "b": pl.Categorical("lexical"),
        "c": pl.Enum(["foo", "bar"]),
    },
)

q = lf.select(pl.all().max())

# expected schema is already correct:
print(q.collect_schema())
# Schema({'a': Categorical(ordering='physical'), 'b': Categorical(ordering='lexical'), 'c': Enum(categories=['foo', 'bar'])})

print(q.collect())

# BEFORE:

# ┌─────┬─────┬──────┐
# │ a   ┆ b   ┆ c    │
# │ --- ┆ --- ┆ ---  │
# │ str ┆ str ┆ enum │
# ╞═════╪═════╪══════╡
# │ foo ┆ bar ┆ foo  │
# └─────┴─────┴──────┘

# AFTER:

# ┌─────┬─────┬──────┐
# │ a   ┆ b   ┆ c    │
# │ --- ┆ --- ┆ ---  │
# │ cat ┆ cat ┆ enum │
# ╞═════╪═════╪══════╡
# │ foo ┆ bar ┆ foo  │
# └─────┴─────┴──────┘

@github-actions bot added the fix (Bug fix), python (Related to Python Polars), and rust (Related to Rust Polars) labels on Feb 13, 2025
@lukemanley changed the title from "fix: Categorical min/max returning string dtype rather than Categorical" to "fix: Categorical min/max returning String dtype rather than Categorical" on Feb 13, 2025

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 95.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.89%. Comparing base (947dc07) to head (1e288f9).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...polars-core/src/chunked_array/ops/aggregate/mod.rs 95.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21232      +/-   ##
==========================================
+ Coverage   79.82%   79.89%   +0.07%     
==========================================
  Files        1596     1596              
  Lines      228517   228552      +35     
  Branches     2608     2608              
==========================================
+ Hits       182413   182604     +191     
+ Misses      45508    45352     -156     
  Partials      596      596              


Member

@ritchie46 left a comment

Nice, the failures should be fixed on main. Can you rebase?

q = lf.select(pl.all().min())
result = q.collect()
assert q.collect_schema() == lf.collect_schema()
assert result.schema == lf.collect_schema()
Member

Can we also check that the schema contains categoricals and enums?

Contributor Author

Added a more explicit check.

@ritchie46
Member

Thanks!

@ritchie46 merged commit 4eb4fb7 into pola-rs:main on Feb 15, 2025
26 checks passed