Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Index.append() refactoring #16236

Merged
merged 5 commits into from
Aug 22, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 47 additions & 1 deletion pandas/core/dtypes/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
_TD_DTYPE)
from pandas.core.dtypes.generic import (
ABCDatetimeIndex, ABCTimedeltaIndex,
ABCPeriodIndex)
ABCPeriodIndex, ABCRangeIndex)


def get_dtype_kinds(l):
Expand All @@ -41,6 +41,8 @@ def get_dtype_kinds(l):
typ = 'category'
elif is_sparse(arr):
typ = 'sparse'
elif isinstance(arr, ABCRangeIndex):
typ = 'range'
elif is_datetimetz(arr):
# if to_concat contains different tz,
# the result must be object dtype
Expand Down Expand Up @@ -559,3 +561,47 @@ def convert_sparse(x, axis):
# coerce to object if needed
result = result.astype('object')
return result


def _concat_rangeindex_same_dtype(indexes):
"""
Concatenates multiple RangeIndex instances. All members of "indexes" must
be of type RangeIndex; result will be RangeIndex if possible, Int64Index
otherwise. E.g.:
indexes = [RangeIndex(3), RangeIndex(3, 6)] -> RangeIndex(6)
indexes = [RangeIndex(3), RangeIndex(4, 6)] -> Int64Index([0,1,2,4,5])
"""

start = step = next = None

for obj in indexes:
if not len(obj):
continue

if start is None:
# This is set by the first non-empty index
start = obj._start
if step is None and len(obj) > 1:
step = obj._step
elif step is None:
# First non-empty index had only one element
if obj._start == start:
return _concat_index_asobject(indexes)
step = obj._start - start

non_consecutive = ((step != obj._step and len(obj) > 1) or
(next is not None and obj._start != next))
if non_consecutive:
# Int64Index._append_same_dtype([ix.astype(int) for ix in indexes])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #17307

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged this, you can update this line in the other PR if we merge that

# would be preferred... but it currently resorts to
# _concat_index_asobject anyway.
return _concat_index_asobject(indexes)

if step is not None:
next = obj[-1] + step

if start is None:
start = obj._start
step = obj._step
stop = obj._stop if next is None else next
return indexes[0].__class__(start, stop, step)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly... but the only alternative I see (an import inside the function) is uglier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its ok here

Copy link
Member

@jorisvandenbossche jorisvandenbossche Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative would be is to return the (start, stop, step) and do the construction in RangeIndex._append_same_dtype (then also the rename there is not needed)

Ah no, this wouldn't play nice with the case when no range index is returned, ignore this

11 changes: 5 additions & 6 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1741,18 +1741,17 @@ def append(self, other):
names = set([obj.name for obj in to_concat])
name = None if len(names) > 1 else self.name

if self.is_categorical():
# if calling index is category, don't check dtype of others
from pandas.core.indexes.category import CategoricalIndex
return CategoricalIndex._append_same_dtype(self, to_concat, name)
return self._concat(to_concat, name)

def _concat(self, to_concat, name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call this _append ? (then it is more in line with _append_same_dtype)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it would make more sense to change _append_same_dtype to _concat_same_dtype (also in IntervalIndex, DatetimeIndex, CategoryIndex), since it already disregards self (it is conceptually a @classmethod). Shall I proceed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it is only use by append, I prefer using append in the name, but no strong feelings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that it's currently used only by append, but usually you expect x.append(y) to concatenate x to y or to elements of y; instead this only concatenates elements of y. So since you don't object I will go with my proposal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead this only concatenates elements of y

in the end it is used to concatenate both y to x, just that this is passed like that in append to this helper function. So it is still only used for append.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is still only used for append.

Sure, I don't object to that. We can agree it is a concat operation used to implement appending: the switch happens when append(self, other) does to_concat = [self] + list(other).


typs = _concat.get_dtype_kinds(to_concat)

if len(typs) == 1:
return self._append_same_dtype(to_concat, name=name)
return self._concat_same_dtype(to_concat, name=name)
return _concat._concat_index_asobject(to_concat, name=name)

def _append_same_dtype(self, to_concat, name):
def _concat_same_dtype(self, to_concat, name):
"""
Concatenate to_concat which has the same class
"""
Expand Down
6 changes: 5 additions & 1 deletion pandas/core/indexes/category.py
Original file line number Diff line number Diff line change
Expand Up @@ -633,7 +633,11 @@ def insert(self, loc, item):
codes = np.concatenate((codes[:loc], code, codes[loc:]))
return self._create_from_codes(codes)

def _append_same_dtype(self, to_concat, name):
def _concat(self, to_concat, name):
# if calling index is category, don't check dtype of others
return CategoricalIndex._concat_same_dtype(self, to_concat, name)

def _concat_same_dtype(self, to_concat, name):
"""
Concatenate to_concat which has the same class
ValueError if other is not in the categories
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -837,7 +837,7 @@ def summary(self, name=None):
result = result.replace("'", "")
return result

def _append_same_dtype(self, to_concat, name):
def _concat_same_dtype(self, to_concat, name):
"""
Concatenate to_concat which has the same class
"""
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/indexes/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -867,7 +867,7 @@ def _as_like_interval_index(self, other, error_msg):
raise ValueError(error_msg)
return other

def _append_same_dtype(self, to_concat, name):
def _concat_same_dtype(self, to_concat, name):
"""
assert that we all have the same .closed
we allow a 0-len index here as well
Expand All @@ -876,7 +876,7 @@ def _append_same_dtype(self, to_concat, name):
msg = ('can only append two IntervalIndex objects '
'that are closed on the same side')
raise ValueError(msg)
return super(IntervalIndex, self)._append_same_dtype(to_concat, name)
return super(IntervalIndex, self)._concat_same_dtype(to_concat, name)

@Appender(_index_shared_docs['take'] % _index_doc_kwargs)
def take(self, indices, axis=0, allow_fill=True,
Expand Down
59 changes: 3 additions & 56 deletions pandas/core/indexes/range.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
from pandas.compat.numpy import function as nv
from pandas.core.indexes.base import Index, _index_shared_docs
from pandas.util._decorators import Appender, cache_readonly
import pandas.core.dtypes.concat as _concat
import pandas.core.indexes.base as ibase

from pandas.core.indexes.numeric import Int64Index
Expand Down Expand Up @@ -443,62 +444,8 @@ def join(self, other, how='left', level=None, return_indexers=False,
return super(RangeIndex, self).join(other, how, level, return_indexers,
sort)

def append(self, other):
"""
Append a collection of Index options together

Parameters
----------
other : Index or list/tuple of indices

Returns
-------
appended : RangeIndex if all indexes are consecutive RangeIndexes,
otherwise Int64Index or Index
"""

to_concat = [self]

if isinstance(other, (list, tuple)):
to_concat = to_concat + list(other)
else:
to_concat.append(other)

if not all([isinstance(i, RangeIndex) for i in to_concat]):
return super(RangeIndex, self).append(other)

start = step = next = None

for obj in to_concat:
if not len(obj):
continue

if start is None:
# This is set by the first non-empty index
start = obj._start
if step is None and len(obj) > 1:
step = obj._step
elif step is None:
# First non-empty index had only one element
if obj._start == start:
return super(RangeIndex, self).append(other)
step = obj._start - start

non_consecutive = ((step != obj._step and len(obj) > 1) or
(next is not None and obj._start != next))
if non_consecutive:
return super(RangeIndex, self).append(other)

if step is not None:
next = obj[-1] + step

if start is None:
start = obj._start
step = obj._step
stop = obj._stop if next is None else next
names = set([obj.name for obj in to_concat])
name = None if len(names) > 1 else self.name
return RangeIndex(start, stop, step, name=name)
def _concat_same_dtype(self, indexes, name):
return _concat._concat_rangeindex_same_dtype(indexes).rename(name)

def __len__(self):
"""
Expand Down