REF: IntervalIndex[IntervalArray] (#20611)
Co-authored-by: Jeremy Schendel <jschendel@users.noreply.github.com>
TomAugspurger and jschendel committed Jul 13, 2018
1 parent 365eac4 commit 610a19a
Showing 26 changed files with 1,643 additions and 612 deletions.
23 changes: 18 additions & 5 deletions doc/source/basics.rst
@@ -1924,11 +1924,24 @@ untouched. If the data is modified, it is because you did so explicitly.
dtypes
------

The main types stored in pandas objects are ``float``, ``int``, ``bool``,
``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
``category`` and ``object``. In addition these dtypes have item sizes, e.g.
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
for more detail on ``datetime64[ns, tz]`` dtypes.
For the most part, pandas uses NumPy arrays and dtypes for Series or individual
columns of a DataFrame. The main types allowed in pandas objects are ``float``,
``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
timezone-aware datetimes).

In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
NumPy's type-system for a few cases.

* :ref:`Categorical <categorical>`
* :ref:`Datetime with Timezone <timeseries.timezone_series>`
* :ref:`Period <timeseries.periods>`
* :ref:`Interval <advanced.indexing.intervallindex>`

Pandas uses the ``object`` dtype for storing strings.

Finally, arbitrary objects may be stored using the ``object`` dtype, but should
be avoided to the extent possible (for performance and interoperability with
other libraries and methods). See :ref:`basics.object_conversion`.

A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
with the data type of each column.
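
As a rough illustration of the dtype list in this hunk, the sketch below builds a small DataFrame that mixes NumPy dtypes with the pandas extension types and inspects ``DataFrame.dtypes``. The column names and sample values are made up for the example, and it assumes a pandas version in which period and interval columns are stored as extension arrays (0.24 or later).

```python
import pandas as pd

# Hypothetical sample data: three NumPy-backed columns and four
# columns backed by pandas extension types.
df = pd.DataFrame({
    "floats": [1.0, 2.0],
    "ints": [1, 2],
    "bools": [True, False],
    "cat": pd.Categorical(["a", "b"]),
    "tz": pd.date_range("2018-01-01", periods=2, tz="UTC"),
    "period": pd.period_range("2018-01", periods=2, freq="M"),
    "interval": pd.interval_range(0, 2),
})

# One dtype per column, returned as a Series indexed by column name.
print(df.dtypes)
```

The extension-typed columns report pandas dtype objects (for example ``CategoricalDtype`` or ``IntervalDtype``) rather than NumPy dtypes.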
70 changes: 70 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -66,6 +66,36 @@ Current Behavior:

result


.. _whatsnew_0240.enhancements.interval:

Storing Interval Data in Series and DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Interval data may now be stored in a ``Series`` or ``DataFrame``, in addition to an
:class:`IntervalIndex` as before (:issue:`19453`).

.. ipython:: python

   ser = pd.Series(pd.interval_range(0, 5))
   ser
   ser.dtype

Previously, these would be cast to a NumPy array of ``Interval`` objects. In general,
the new interval dtype should perform better than an object-dtype array when storing
intervals in a :class:`Series`.

Note that the ``.values`` of a ``Series`` containing intervals is no longer a NumPy
array, but rather an ``ExtensionArray``:

.. ipython:: python

   ser.values

This is the same behavior as ``Series.values`` for categorical data. See
:ref:`whatsnew_0240.api_breaking.interval_values` for more.


.. _whatsnew_0240.enhancements.other:

Other Enhancements
@@ -91,6 +121,45 @@ Other Enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. _whatsnew_0240.api_breaking.interval_values:

``IntervalIndex.values`` is now an ``IntervalArray``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :attr:`~IntervalIndex.values` attribute of an :class:`IntervalIndex` now returns an
``IntervalArray``, rather than a NumPy array of :class:`Interval` objects (:issue:`19453`).

Previous Behavior:

.. code-block:: ipython

   In [1]: idx = pd.interval_range(0, 4)

   In [2]: idx.values
   Out[2]:
   array([Interval(0, 1, closed='right'), Interval(1, 2, closed='right'),
          Interval(2, 3, closed='right'), Interval(3, 4, closed='right')],
         dtype=object)

New Behavior:

.. ipython:: python

   idx = pd.interval_range(0, 4)
   idx.values

This mirrors ``CategoricalIndex.values``, which returns a ``Categorical``.

For situations where you need an ``ndarray`` of ``Interval`` objects, use
:func:`numpy.asarray` or ``idx.values.astype(object)``.

.. ipython:: python

   np.asarray(idx)
   idx.values.astype(object)
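
For readers who want to verify the two conversion routes, here is a minimal sketch. It assumes pandas 0.24 or later, and the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

idx = pd.interval_range(0, 4)

# New behavior: ``.values`` is an IntervalArray.
arr = idx.values

# Both routes produce an object-dtype ndarray of Interval scalars.
as_objects = np.asarray(idx)
also_objects = arr.astype(object)

print(type(arr).__name__, as_objects.dtype, also_objects.dtype)
```

Code that previously relied on ``idx.values`` being an ``ndarray`` can switch to either conversion without further changes.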


.. _whatsnew_0240.api.datetimelike.normalize:

Tick DateOffset Normalize Restrictions
@@ -350,6 +419,7 @@ Interval
^^^^^^^^

- Bug in the :class:`IntervalIndex` constructor where the ``closed`` parameter did not always override the inferred ``closed`` (:issue:`19370`)
- Bug in the ``IntervalIndex`` repr where a trailing comma was missing after the list of intervals (:issue:`20611`)
-
-

20 changes: 20 additions & 0 deletions pandas/_libs/interval.pyx
@@ -98,6 +98,26 @@ cdef class IntervalMixin(object):
msg = 'cannot compute length between {left!r} and {right!r}'
raise TypeError(msg.format(left=self.left, right=self.right))

    def _check_closed_matches(self, other, name='other'):
        """Check if the closed attribute of `other` matches.

        Note that 'left' and 'right' are considered different from 'both'.

        Parameters
        ----------
        other : Interval, IntervalIndex, IntervalArray
        name : str
            Name to use for 'other' in the error message.

        Raises
        ------
        ValueError
            When `other` is not closed exactly the same as self.
        """
        if self.closed != other.closed:
            msg = "'{}.closed' is '{}', expected '{}'."
            raise ValueError(msg.format(name, other.closed, self.closed))
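
The check above can be tried outside Cython with a plain-Python stand-in. ``ClosedOnly`` is a hypothetical class invented for this sketch and is not part of pandas; it carries only a ``closed`` attribute and reuses the method body from the diff.

```python
class ClosedOnly:
    """Hypothetical stand-in for IntervalMixin that only stores `closed`."""

    def __init__(self, closed):
        self.closed = closed

    def _check_closed_matches(self, other, name='other'):
        # Same logic as the method added in the diff.
        if self.closed != other.closed:
            msg = "'{}.closed' is '{}', expected '{}'."
            raise ValueError(msg.format(name, other.closed, self.closed))


left = ClosedOnly('left')

# Matching `closed` values pass silently.
left._check_closed_matches(ClosedOnly('left'))

# A mismatch raises, naming the offending argument.
try:
    left._check_closed_matches(ClosedOnly('both'), name='value')
except ValueError as err:
    message = str(err)

print(message)  # 'value.closed' is 'both', expected 'left'.
```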


cdef _interval_like(other):
return (hasattr(other, 'left')
1 change: 1 addition & 0 deletions pandas/core/arrays/__init__.py
@@ -2,5 +2,6 @@
ExtensionScalarOpsMixin)
from .categorical import Categorical # noqa
from .datetimes import DatetimeArrayMixin # noqa
from .interval import IntervalArray # noqa
from .period import PeriodArrayMixin # noqa
from .timedelta import TimedeltaArrayMixin # noqa
6 changes: 6 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -20,6 +20,7 @@
_ensure_int64,
_ensure_object,
_ensure_platform_int,
is_extension_array_dtype,
is_dtype_equal,
is_datetimelike,
is_datetime64_dtype,
@@ -1243,6 +1244,11 @@ def __array__(self, dtype=None):
ret = take_1d(self.categories.values, self._codes)
if dtype and not is_dtype_equal(dtype, self.categories.dtype):
return np.asarray(ret, dtype)
if is_extension_array_dtype(ret):
# When we're a Categorical[ExtensionArray], like Interval,
# we need to ensure __array__ gets all the way to an
# ndarray.
ret = np.asarray(ret)
return ret
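
The effect of this branch can be observed from user code. The sketch below uses ``pd.cut`` to obtain a ``Categorical`` whose categories are intervals; the sample values and bin edges are arbitrary, and the checks reflect the behavior the comment describes rather than a guarantee for every pandas version.

```python
import numpy as np
import pandas as pd

# pd.cut on a list returns a Categorical with interval categories.
cat = pd.cut([0.5, 1.5, 2.5], bins=[0, 1, 2, 3])

# np.asarray goes through Categorical.__array__ and should yield a
# plain object-dtype ndarray of Interval scalars, not an IntervalArray.
arr = np.asarray(cat)

print(type(arr).__name__, arr.dtype)
```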

def __setstate__(self, state):