
assert_series_equal and assert_frame_equal are inconsistent #18389

Open
2 tasks done
jcmuel opened this issue Aug 26, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments


jcmuel commented Aug 26, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from polars.testing import assert_series_not_equal, assert_frame_not_equal


def test_assert_series_equal_is_consistent() -> None:
    sr1 = pl.Series([{'a': True}, {'a': None}, None])
    sr2 = pl.Series([{'a': True}, {'a': None}, {'a': None}])
    assert sr1.dtype == sr2.dtype and sr1.dtype == pl.Struct({'a': pl.Boolean})

    # The following assertion passes and confirms that the two frames below differ.
    sr1_in_list = pl.DataFrame([{'c': sr1}])
    sr2_in_list = pl.DataFrame([{'c': sr2}])
    assert_frame_not_equal(sr1_in_list, sr2_in_list)

    # This assertion fails in polars 1.5.0, even though the series differ.
    assert_series_not_equal(sr1, sr2)

Log output

Launching pytest with arguments unit_tests/test_polars/test_testing.py::test_assert_series_equal_is_consistent --no-header --no-summary -q in test/python

============================= test session starts ==============================
collecting ... collected 1 item

unit_tests/test_polars/test_testing.py::test_assert_series_equal_is_consistent 

======================== 1 failed, 4 warnings in 0.17s =========================
FAILED [100%]
test/python/unit_tests/test_polars/test_testing.py:4 (test_assert_series_equal_is_consistent)
def test_assert_series_equal_is_consistent() -> None:
        sr1 = pl.Series([{'a': True}, {'a': None}, None])
        sr2 = pl.Series([{'a': True}, {'a': None}, {'a': None}])
        assert sr1.dtype == sr2.dtype and sr1.dtype == pl.Struct({'a': pl.Boolean})
    
        # The following assertion passes and confirms that the two frames below differ.
        sr1_in_list = pl.DataFrame([{'c': sr1}])
        sr2_in_list = pl.DataFrame([{'c': sr2}])
        assert_frame_not_equal(sr1_in_list, sr2_in_list)
    
        # This assertion fails in polars 1.5.0, even though the series differ.
>       assert_series_not_equal(sr1, sr2)

unit_tests/test_polars/test_testing.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (shape: (3,)
Series: '' [struct[1]]
[
	{true}
	{null}
	null
], shape: (3,)
Series: '' [struct[1]]
[
	{true}
	{null}
	{null}
])
kwargs = {}

    @wraps(function)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
        _rename_keyword_argument(
            old_name, new_name, kwargs, function.__qualname__, version
        )
>       return function(*args, **kwargs)
E       AssertionError: Series are equal

python3.12/site-packages/polars/_utils/deprecation.py:91: AssertionError

Process finished with exit code 1

Issue description

The test first creates two DataFrames. Each contains a single row and a single column of dtype list of struct. At the same position, one list contains a null element, whereas the other contains a struct whose fields are all null. All other elements are identical.

  • assert_frame_not_equal correctly confirms that the two DataFrames differ.
  • assert_series_not_equal incorrectly reports that the Series used to build the DataFrames are equal.

This is a contradiction.
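The contradiction can be reproduced with a small plain-Python model. This is a hypothetical sketch of the reported behavior, not Polars' implementation: structs are modeled as dicts and nulls as None, `series_equal_lenient` mimics the top-level Series comparison (which treats a struct with all-null fields as equal to null), and `frame_equal_strict` mimics the list-column comparison (which applies no such normalization inside the lists). Both function names are invented for illustration.

```python
from typing import Any, Optional

# Hypothetical model: a struct value is a dict, a null value is None.
Struct = Optional[dict[str, Any]]


def normalize(value: Struct) -> Struct:
    """Collapse a struct whose fields are all null into a plain null."""
    if value is not None and all(v is None for v in value.values()):
        return None
    return value


def series_equal_lenient(left: list[Struct], right: list[Struct]) -> bool:
    """Models the reported Series comparison, where null == {null}."""
    return [normalize(v) for v in left] == [normalize(v) for v in right]


def frame_equal_strict(left: list[list[Struct]], right: list[list[Struct]]) -> bool:
    """Models the reported list-column comparison, where null != {null}."""
    return left == right


sr1 = [{"a": True}, {"a": None}, None]
sr2 = [{"a": True}, {"a": None}, {"a": None}]

# Same data, opposite verdicts depending on the nesting level.
print(series_equal_lenient(sr1, sr2))    # True
print(frame_equal_strict([sr1], [sr2]))  # False
```

In this model the only difference between the two checks is whether the "all-null struct equals null" rule is applied, which matches the inconsistency described above.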

Expected behavior

The most consistent behavior would be to treat a null value and a struct whose fields are all null as two different values, as assert_frame_not_equal already does.

The test should pass.
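The expected behavior can be sketched as a single recursive comparison that applies the same rule at every nesting level. This is a hypothetical plain-Python model, not Polars code: structs are dicts, lists are Python lists, nulls are None, and `values_equal` is an invented name. With one rule everywhere, the Series comparison and the list-column comparison cannot disagree.

```python
from typing import Any


def values_equal(left: Any, right: Any) -> bool:
    """Recursively compare values, treating None as distinct from a struct of nulls.

    Hypothetical model: structs are dicts, lists are lists, nulls are None.
    """
    if left is None or right is None:
        # A null only equals another null, never {"a": None}.
        return left is None and right is None
    if isinstance(left, dict) and isinstance(right, dict):
        # Structs must have the same fields and equal field values.
        return left.keys() == right.keys() and all(
            values_equal(left[k], right[k]) for k in left
        )
    if isinstance(left, list) and isinstance(right, list):
        # Lists must have the same length and pairwise-equal elements.
        return len(left) == len(right) and all(
            values_equal(a, b) for a, b in zip(left, right)
        )
    return left == right


sr1 = [{"a": True}, {"a": None}, None]
sr2 = [{"a": True}, {"a": None}, {"a": None}]

print(values_equal(sr1, sr2))      # False: the Series differ
print(values_equal([sr1], [sr2]))  # False: the list column differs too
```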

Installed versions

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             macOS-14.6.1-arm64-arm-64bit
Python:               3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.2
pydantic:             2.6.4
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

jcmuel commented Aug 26, 2024

In this issue (#18389), I combined the examples from #18119 and #18230 to show that assert_frame_equal and assert_series_equal treat null structs and structs with all-null fields inconsistently.

In #18230, @cmdlineluser suggested that the new behavior of list.drop_nulls() should now be considered correct, and that function treats null and {null} as two different values.

In #18119, however, the conclusion seems to have been that assert_series_equal should treat null and {null} as equal.

This creates inconsistencies, and this bug report identifies one of them.
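Why the two conclusions conflict can be sketched in plain Python. Assuming, as described for #18230, that list.drop_nulls() removes a null but keeps a struct whose fields are all null, two inputs that the lenient equality from #18119 considers equal produce outputs of different lengths. The helpers `drop_nulls` and `lenient_equal` below are hypothetical models, not Polars functions; structs are dicts and nulls are None.

```python
from typing import Any, Optional

# Hypothetical model: a struct value is a dict, a null value is None.
Struct = Optional[dict[str, Any]]


def drop_nulls(values: list[Struct]) -> list[Struct]:
    """Model of list.drop_nulls() as described in #18230: only outer nulls are removed."""
    return [v for v in values if v is not None]


def lenient_equal(left: list[Struct], right: list[Struct]) -> bool:
    """Model of the #18119 conclusion: a struct with all-null fields equals null."""

    def norm(v: Struct) -> Struct:
        # Collapse {"a": None, ...} into None; leave everything else unchanged.
        if v is not None and all(x is None for x in v.values()):
            return None
        return v

    return [norm(v) for v in left] == [norm(v) for v in right]


a = [None]
b = [{"a": None}]

print(lenient_equal(a, b))                     # True
print(len(drop_nulls(a)), len(drop_nulls(b)))  # 0 1
```

Under the lenient equality, inputs that compare equal map to outputs that do not, so that notion of equality is not preserved by drop_nulls(). Treating null and {null} as distinct everywhere avoids this.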
