DEPR/API: disallow lists within list for set_index #24697

Closed
23 commits
bf0eee0
DEPR/API: disallow lists within list for set_index
h-vetinari Jan 10, 2019
ed0de1f
Add deprecation and whatsnew
h-vetinari Jan 10, 2019
dc274e3
restore test for list-of-scalars interpreted as keys
h-vetinari Jan 10, 2019
623fc9a
Small doc fixes
h-vetinari Jan 10, 2019
5f6e303
Improve docstring; small fixes
h-vetinari Jan 10, 2019
13d4e40
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 10, 2019
813b4fc
Remove last mention of "list-like"
h-vetinari Jan 10, 2019
4c130ee
rephrase "illegal"
h-vetinari Jan 10, 2019
8731834
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 10, 2019
29fbc6a
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 14, 2019
cc04a64
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 14, 2019
e1d999b
Improve warning message (review TomAugspurger)
h-vetinari Jan 14, 2019
726ef1c
typo
h-vetinari Jan 14, 2019
0e1f709
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 16, 2019
b0b326f
Tuples always considered keys; KeyError, not ValueError if missing
h-vetinari Jan 16, 2019
6cbcc47
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Jan 20, 2019
c881aaa
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Feb 24, 2019
0214801
Actually commit fix for conflict, duh
h-vetinari Feb 24, 2019
61c511d
Move whatsnew to 0.25
h-vetinari Feb 24, 2019
7381245
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Mar 1, 2019
5fa544c
Add deprecation-section (review jreback)
h-vetinari Mar 1, 2019
a016bf0
Merge remote-tracking branch 'upstream/master' into depr_LL_set_index
h-vetinari Mar 3, 2019
0c65876
Fix doc fails
h-vetinari Mar 3, 2019
28 changes: 28 additions & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -93,6 +93,34 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

+**Lists as arrays in :meth:`DataFrame.set_index`**
+
+Currently, :meth:`DataFrame.set_index` accepts lists with two different meanings: as a list of column labels, and as an array-like collection of values.
+The ambiguity is resolved in favor of the list of labels, but nested lists are interpreted as arrays:
+
+.. ipython:: python
+    :okwarning:
+
+    df = pd.DataFrame(np.reshape(np.arange(12), (3, 4)),
+                      columns=['a', 'b', 'c', 'd'])
+    df.set_index(['a', 'b', 'c'])
+    df.set_index([['a', 'b', 'c']])
+
+The latter case is now deprecated and will be removed in a future version. As a replacement,
+wrap the list in a :class:`Series`, :class:`Index`, ``np.array`` or an iterator.
+
+.. ipython:: python
+
+    df.set_index(pd.Series(['a', 'b', 'c']))
+
+Lists can still be used to collect several column keys or arrays that form the levels of a :class:`MultiIndex`.
+
+.. ipython:: python
+
+    df.set_index(['a', pd.Series(['a', 'b', 'c'])])
+
+**Other deprecations**
+
 - Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)

 .. _whatsnew_0250.prior_deprecations:
14 changes: 13 additions & 1 deletion pandas/core/frame.py
@@ -4033,6 +4033,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
             arbitrary combination of column keys and arrays. Here, "array"
             encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and
             instances of :class:`abc.Iterator`.
+            Lists (in the sense of a sequence of values, not column labels)
+            have been deprecated, and will be removed in a future version.
         drop : bool, default True
             Delete columns to be used as the new index.
         append : bool, default False
@@ -4116,13 +4118,16 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
                    'one-dimensional arrays.')

         missing = []
+        depr_warn = False
         for col in keys:
             if isinstance(col, (ABCIndexClass, ABCSeries, np.ndarray,
-                                list, Iterator)):
+                                Iterator)):
                 # arrays are fine as long as they are one-dimensional
                 # iterators get converted to list below
                 if getattr(col, 'ndim', 1) != 1:
                     raise ValueError(err_msg)
+            elif isinstance(col, list):
+                depr_warn = True
             else:
                 # everything else gets tried as a key; see GH 24969
                 try:
@@ -4136,6 +4141,13 @@ def set_index(self, keys, drop=True, append=False, inplace=False,

         if missing:
             raise KeyError('None of {} are in the columns'.format(missing))
+        if depr_warn:
+            msg = ('Passing lists within a list to the parameter "keys" is '
+                   'deprecated and will be removed in a future version. To '
+                   'silence this warning, wrap the lists in a Series / Index '
+                   'or np.ndarray. E.g. df.set_index(["A", [1, 2, 3]]) should '
+                   'be passed as df.set_index(["A", pd.Series([1, 2, 3])]).')
+            warnings.warn(msg, FutureWarning, stacklevel=2)

         if inplace:
             frame = self
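To make the control flow of the `frame.py` change easier to follow in isolation, here is a minimal, standard-library-only sketch. It is not pandas code: `classify_keys` and its `columns` argument are hypothetical names, `collections.abc.Iterator` stands in for the whole Index / Series / ndarray / Iterator branch, and the warning text is shortened.

```python
import warnings
from collections.abc import Iterator


def classify_keys(keys, columns):
    """Hypothetical stand-in for the key-checking loop in set_index.

    Returns True if a nested list was seen (and a FutureWarning emitted).
    """
    missing = []
    found_list = False
    for col in keys:
        if isinstance(col, Iterator):
            # Assumption: Iterator stands in for all accepted array types.
            continue
        elif isinstance(col, list):
            # Nested lists still work, but are flagged as deprecated.
            found_list = True
        elif col not in columns:
            # Everything else is treated as a column key.
            missing.append(col)

    # As in the diff, missing keys are reported before any warning is issued.
    if missing:
        raise KeyError('None of {} are in the columns'.format(missing))
    if found_list:
        warnings.warn('Passing lists within a list to "keys" is deprecated.',
                      FutureWarning, stacklevel=2)
    return found_list
```

Under these assumptions, `classify_keys(['A', [1, 2, 3]], ['A', 'B'])` warns, `classify_keys(['A', iter([1, 2, 3])], ['A', 'B'])` does not, and `classify_keys(['Z'], ['A', 'B'])` raises `KeyError`.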
33 changes: 24 additions & 9 deletions pandas/tests/frame/test_alter_axes.py
@@ -115,10 +115,8 @@ def test_set_index_after_mutation(self):
         tm.assert_frame_equal(result, expected)

     # MultiIndex constructor does not work directly on Series -> lambda
-    # Add list-of-list constructor because list is ambiguous -> lambda
     # also test index name if append=True (name is duplicate here for B)
-    @pytest.mark.parametrize('box', [Series, Index, np.array,
-                                     list, lambda x: [list(x)],
+    @pytest.mark.parametrize('box', [Series, Index, np.array, list,
                                      lambda x: MultiIndex.from_arrays([x])])
     @pytest.mark.parametrize('append, index_name', [(True, None),
                              (True, 'B'), (True, 'test'), (False, None)])
@@ -135,7 +133,7 @@ def test_set_index_pass_single_array(self, frame_of_index_cols,
             with pytest.raises(KeyError, match=msg):
                 df.set_index(key, drop=drop, append=append)
         else:
-            # np.array/list-of-list "forget" the name of B
+            # np.array "forgets" the name of B
             name_mi = getattr(key, 'names', None)
             name = [getattr(key, 'name', None)] if name_mi is None else name_mi

@@ -163,9 +161,13 @@ def test_set_index_pass_arrays(self, frame_of_index_cols,

         keys = ['A', box(df['B'])]
         # np.array/list "forget" the name of B
-        names = ['A', None if box in [np.array, list, tuple, iter] else 'B']
+        names = ['A', None if box in [np.array, list] else 'B']

-        result = df.set_index(keys, drop=drop, append=append)
+        if box == list:
+            with tm.assert_produces_warning(FutureWarning):
+                result = df.set_index(keys, drop=drop, append=append)
+        else:
+            result = df.set_index(keys, drop=drop, append=append)

         # only valid column keys are dropped
         # since B is always passed as array above, only A is dropped, if at all
@@ -193,7 +195,12 @@ def test_set_index_pass_arrays_duplicate(self, frame_of_index_cols, drop,
         df.index.name = index_name

         keys = [box1(df['A']), box2(df['A'])]
-        result = df.set_index(keys, drop=drop, append=append)
+
+        if box1 == list or box2 == list:
+            with tm.assert_produces_warning(FutureWarning):
+                result = df.set_index(keys, drop=drop, append=append)
+        else:
+            result = df.set_index(keys, drop=drop, append=append)

         # if either box is iter, it has been consumed; re-read
         keys = [box1(df['A']), box2(df['A'])]
@@ -206,8 +213,16 @@ def test_set_index_pass_arrays_duplicate(self, frame_of_index_cols, drop,
         # to test against already-tested behaviour, we add sequentially,
         # hence second append always True; must wrap keys in list, otherwise
         # box = list would be interpreted as keys
-        expected = df.set_index([keys[0]], drop=first_drop, append=append)
-        expected = expected.set_index([keys[1]], drop=drop, append=True)
+        if box1 == list or box2 == list:
+            with tm.assert_produces_warning(FutureWarning):
+                expected = df.set_index([keys[0]], drop=first_drop,
+                                        append=append)
+                expected = expected.set_index([keys[1]], drop=drop,
+                                              append=True)
+        else:
+            expected = df.set_index([keys[0]], drop=first_drop, append=append)
+            expected = expected.set_index([keys[1]], drop=drop, append=True)
+
         tm.assert_frame_equal(result, expected)

     @pytest.mark.parametrize('append', [True, False])
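The test changes repeat the same `if box == list: ... else: ...` branching around each `set_index` call. One possible way to fold that into a single conditional context manager is sketched below, using only the standard library. `expect_warning_if` is a hypothetical helper: it is not part of pandas or of this PR, which uses `tm.assert_produces_warning`.

```python
import contextlib
import warnings


@contextlib.contextmanager
def expect_warning_if(condition, category=FutureWarning):
    """Hypothetical helper: require a `category` warning only if `condition`."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once elsewhere.
        warnings.simplefilter("always")
        yield
    seen = any(issubclass(w.category, category) for w in caught)
    if condition and not seen:
        raise AssertionError("expected {}".format(category.__name__))
    if not condition and seen:
        raise AssertionError("unexpected {}".format(category.__name__))
```

With such a helper, a test body could in principle read `with expect_warning_if(box == list): result = df.set_index(keys, drop=drop, append=append)`, keeping the call under test in one place.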