Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST/API: test column indexing copy/view semantics #35906

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pandas/tests/indexing/test_iloc.py
Original file line number Diff line number Diff line change
Expand Up @@ -706,6 +706,15 @@ def test_iloc_setitem_categorical_updates_inplace(self):
expected = pd.Categorical(["C", "B", "A"])
tm.assert_categorical_equal(cat, expected)

# __setitem__ under the other hand does not work in-place
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this another test?

cat = pd.Categorical(["A", "B", "C"])
df = pd.DataFrame({1: cat, 2: [1, 2, 3]})

df[1] = cat[::-1]

expected = pd.Categorical(["A", "B", "C"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check whether the array that ends up in df[1] is the same object on the RHS of L713 (Block.setitem is weird)

tm.assert_categorical_equal(cat, expected)

def test_iloc_with_boolean_operation(self):
# GH 20627
result = DataFrame([[0, 1], [2, 3], [4, 5], [6, np.nan]])
Expand Down
74 changes: 74 additions & 0 deletions pandas/tests/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1110,3 +1110,77 @@ def test_setitem_categorical():
{"h": pd.Categorical(["m", "n"]).reorder_categories(["n", "m"])}
)
tm.assert_frame_equal(df, expected)


def test_setitem_EA_column_update():
# https://github.com/pandas-dev/pandas/issues/33457

df = pd.DataFrame(
{
"int": [1, 2, 3],
"int2": [3, 4, 5],
"float": [0.1, 0.2, 0.3],
"EA": pd.array([1, 2, None], dtype="Int64"),
}
)
original_arr = df.EA.array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use bracket notation instead of dot?


# overwrite column with new array
df["EA"] = pd.array([1, 2, 3], dtype="Int64")
# ensure original array was not modified
assert original_arr is not df.EA.array
expected = pd.array([1, 2, None], dtype="Int64")
tm.assert_extension_array_equal(original_arr, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check whether df["EA"] views the same data on the RHS of L1129 vs a copy



def test_getitem_EA_no_copy():
# ensure we don't copy the EA when taking a subset

df = pd.DataFrame(
{
"int": [1, 2, 3],
"int2": [3, 4, 5],
"float": [0.1, 0.2, 0.3],
"EA": pd.array([1, 2, None], dtype="Int64"),
}
)
original_arr = df.EA.array
subset = df[["int", "EA"]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe parametrizing over several equivalent forms of doing this selection?

assert subset.EA.array is original_arr
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On pandas 1.0 and older, this was apparently taking a copy of the EA (we noticed this in a geopandas test where we were testing against multiple pandas versions)

# check that they view the same data by modifying
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make a helper for this

df["EA"].array[0] = 10
expected = pd.array([10, 2, None], dtype="Int64")
tm.assert_extension_array_equal(subset["EA"].array, expected)

# TODO this way it doesn't modify subset - is this expected?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure here. The (or an) issue is that we allow setting to change the dtype, which for EAs means changing the type. So there's no way for

s = pd.Series(pd.array([1, 2]))
s.iloc[0] = 0.5

to mutate the array inplace, and so I wouldn't expect your subset to be mutated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example currently raises an error, because mutating inplace for EAs doesn't allow changing the dtype (or at least for the nullable EAs, and which is a difference with normal dtypes that we should actually decide if this is desired (I think so) and if so, test and document as a difference)

# df.iloc[0, 3] = 10
# expected = pd.array([10, 2, None], dtype="Int64")
# tm.assert_extension_array_equal(subset['EA'].array, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u make this a separate test and xfail (agree we should prob allow this)



def test_getitem_column_view():
# test that getting a single column is a view on the data

df = pd.DataFrame(
{
"int": [1, 2, 3],
"int2": [3, 4, 5],
"float": [0.1, 0.2, 0.3],
"EA": pd.array([1, 2, None], dtype="Int64"),
}
)

# getitem with ExtensionArray
original_arr = df._mgr.blocks[2].values
col = df["EA"]
assert col.array is original_arr
col[0] = 10
expected = pd.array([10, 2, None], dtype="Int64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this mutates the original? can u assert that as well

tm.assert_extension_array_equal(df["EA"].array, expected)

# getitem from consolidated block
col = df["int"]
with pd.option_context("chained_assignment", "warn"):
with tm.assert_produces_warning(pd.core.common.SettingWithCopyWarning):
col[0] = 10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights a difference between a column backed by ExtensionBlock or consolidated block -> with EA, we don't raise a SettingWithCopyWarning for this case.

(I am personally actually fine with how it behaves for EA)

tm.assert_equal(df["int"], pd.Series([10, 2, 3], name="int"))