Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Increase categorical test coverage #20514

Merged
merged 3 commits into from
Jan 3, 2025

Conversation

mcrumiller
Copy link
Contributor

@mcrumiller mcrumiller commented Dec 31, 2024

This PR adds a fixture (well--moved from a previous PR that only used it in a few specific tests) for use in categorical testing. When used via @pytest.mark.usefixtures("test_global_and_local"), it runs the test in both a local and global context. When the global context is used, the global cache is populated by a few strings so that the physical values do not match the local values.

There were a few tests that used @StringCache() but had no local testing, so I converted these to use the updated fixture, which tests both. I avoided touching tests that manually tested for local or global conditions.

In the process, I uncovered three tests that failed in a global context, and marked these with a # TODO: fails with global string cache and disabled the fixture for these tests. One of these tests (the third below) I enhanced slightly in order to trigger the error (which required testing without maintain_order=True). I've listed them below with repros so that we can open new issues for these.

test_compare_categorical_single

import polars as pl
pl.enable_string_cache()
pl.Series(["aaaa", "bbbb", "cccc"], dtype=pl.Categorical)  # pre-fill global cache

s = pl.Series([None, "a", "b", "c", "b", "a"], dtype=pl.Categorical)
s < "b"   # pyo3_runtime.PanicException: assertion failed: i < self.len()
s <= "b"  # pyo3_runtime.PanicException: assertion failed: i < self.len()
s > "b"   # pyo3_runtime.PanicException: assertion failed: i < self.len()
s >= "b"  # pyo3_runtime.PanicException: assertion failed: i < self.len()

test_fast_unique_flag_from_arrow

import polars as pl
pl.enable_string_cache()
pl.Series(["aaaa", "bbbb", "cccc"], dtype=pl.Categorical)  # pre-fill global cache

df = pl.DataFrame(
    {
        "colB": ["1", "2", "3", "4", "5", "5", "5", "5"],
    }
).with_columns([pl.col("colB").cast(pl.Categorical)])

filtered = df.to_arrow().filter([True, False, True, True, False, True, True, True])
pl.from_arrow(filtered).select(pl.col("colB").n_unique())
# pyo3_runtime.PanicException: assertion failed: TotalOrd::tot_le(v, self.range.end())

test_unique_categorical

This is open in issue #20484 #20528.

@github-actions github-actions bot added internal An internal refactor or improvement python Related to Python Polars rust Related to Rust Polars labels Dec 31, 2024
@mcrumiller mcrumiller force-pushed the cat-fixture branch 3 times, most recently from 06c62f9 to ecd05cc Compare December 31, 2024 21:47
Copy link

codecov bot commented Dec 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.00%. Comparing base (1517599) to head (93e85be).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20514      +/-   ##
==========================================
+ Coverage   78.99%   79.00%   +0.01%     
==========================================
  Files        1564     1564              
  Lines      220618   220618              
  Branches     2502     2502              
==========================================
+ Hits       174281   174305      +24     
+ Misses      45763    45739      -24     
  Partials      574      574              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mcrumiller
Copy link
Contributor Author

This also fixes the flaky doctests but I see there's already a PR for this: #20516.

@mcrumiller mcrumiller marked this pull request as ready for review December 31, 2024 23:29
@ritchie46
Copy link
Member

Can you rebase and see if all is fixed now?

@mcrumiller
Copy link
Contributor Author

@ritchie46 rebased. Still seeing errors in test_compare_categorical_single and test_unique_categorical when global cache is enabled. Example repros:

import polars as pl
pl.enable_string_cache()
pl.Series(["aaa", "bbb", "ccc"], dtype=pl.Categorical)  # pre-fill cache

# test_compare_categorical_single
pl.Series(["a"], dtype=pl.Categorical) < "a"
# pyo3_runtime.PanicException: assertion failed: i < self.len()

# test_unique_categorical
pl.Series(["a"], dtype=pl.Categorical).unique()
# pyo3_runtime.PanicException: assertion failed: TotalOrd::tot_le(v, self.range.end())

@ritchie46
Copy link
Member

Alright, fixing those.

@mcrumiller
Copy link
Contributor Author

@ritchie46 this one is now all up-to-date. I've removed all of the TODO comments and enabled the local/global fixture on all of the new tests too, and everything passes.

@ritchie46 ritchie46 merged commit 409f091 into pola-rs:main Jan 3, 2025
12 checks passed
@mcrumiller mcrumiller deleted the cat-fixture branch January 3, 2025 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants