ENH: add 'origin' and 'offset' arguments to 'resample' and 'pd.Grouper'
a

more work
hasB4K committed Feb 13, 2020
1 parent a9d2450 commit 3097767
Showing 13 changed files with 382 additions and 97 deletions.
9 changes: 3 additions & 6 deletions doc/source/user_guide/timeseries.rst
@@ -1563,19 +1563,16 @@ end of the interval is closed:
ts.resample('5Min', closed='left').mean()
Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
labels. ``label`` specifies whether the result is labeled with the beginning or
the end of the interval. ``loffset`` performs a time adjustment on the output
labels.
Parameters like ``label`` are used to manipulate the resulting labels.
``label`` specifies whether the result is labeled with the beginning or
the end of the interval.

.. ipython:: python
ts.resample('5Min').mean() # by default label='left'
ts.resample('5Min', label='left').mean()
ts.resample('5Min', label='left', loffset='1s').mean()
.. warning::

The default values for ``label`` and ``closed`` are '**left**' for all
Empty file modified doc/source/whatsnew/v1.0.0.rst
100755 → 100644
Empty file.
24 changes: 24 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -36,6 +36,30 @@ For example:
ser["2014"]
ser.loc["May 2015"]
.. _whatsnew_110.grouper_origin:

Grouper now supports the argument origin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Grouper` and :class:`DataFrame.resample` now support the argument `origin`: the timestamp on which to adjust the grouping. (:issue:`31809`)

By default, the bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like `30D`) or that divide a day evenly (like `90s` or `1min`), but it can create inconsistencies with frequencies that do not meet this criterion. To change this behavior, you can now specify a fixed timestamp with `origin`.

For example:

.. ipython:: python
start, end = "1/1/2000 00:00:00", "1/31/2000 00:00"
rng = pd.date_range(start, end, freq="1231min")
ts = pd.Series(np.arange(len(rng)), index=rng)
ts.groupby(pd.Grouper(freq="1399min")).agg("count")
ts.groupby(pd.Grouper(
freq="1399min",
origin=pd.Timestamp("1970-01-01"))
).agg("count")
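To see why anchoring bins to the start of the first day can be inconsistent for a frequency such as `1399min`, here is a minimal pure-Python sketch of the bin arithmetic. It is an illustration under stated assumptions, not the pandas implementation: `bin_start` is a hypothetical helper, and it assumes left-closed bins of fixed width counted from an anchor timestamp.

```python
from datetime import datetime, timedelta


def bin_start(ts, freq, anchor):
    """Left edge of the fixed-width bin containing ts (hypothetical helper).

    Assumes bins of the form [anchor + k*freq, anchor + (k+1)*freq)
    for integer k.
    """
    k = (ts - anchor) // freq  # timedelta // timedelta floors to an int
    return anchor + k * freq


freq = timedelta(minutes=1399)  # does not divide a day (1440 minutes)
ts = datetime(2000, 1, 10, 12, 0)

# Anchor at midnight of the first day of the data: the edge depends on
# where the series happens to start.
edge_a = bin_start(ts, freq, anchor=datetime(2000, 1, 1))
edge_b = bin_start(ts, freq, anchor=datetime(2000, 1, 2))

# Anchor at a fixed origin: the edge no longer depends on the first
# observation.
epoch = datetime(1970, 1, 1)
edge_fixed = bin_start(ts, freq, anchor=epoch)

print(edge_a, edge_b, edge_fixed)
```

Under these assumptions, the same timestamp falls into differently placed bins depending on the day the series starts, whereas a fixed anchor gives one stable grid. That is the inconsistency `origin` is meant to remove.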
..
.. _whatsnew_110.enhancements.other:

Other enhancements
36 changes: 34 additions & 2 deletions pandas/core/generic.py
@@ -7650,9 +7650,11 @@ def resample(
convention: str = "start",
kind: Optional[str] = None,
loffset=None,
base: int = 0,
base: Optional[int] = None,
on=None,
level=None,
origin=None,
offset=None,
) -> "Resampler":
"""
Resample time-series data.
@@ -7687,17 +7689,35 @@ def resample(
By default the input representation is retained.
loffset : timedelta, default None
Adjust the resampled time labels.
.. deprecated:: 1.1.0
You should add the loffset to the `df.index` after the resample,
like this:
``df.index = df.index.to_timestamp() + to_offset(loffset)``
(a more complete example is given below).
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
``df.resample("3s", base=2)``
becomes
``df.resample("3s", offset="2s")``
on : str, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. `level` must be datetime-like.
origin : pd.Timestamp, default None
The timestamp on which to adjust the grouping. If None is passed,
the first day of the time series at midnight is used.
offset : pd.Timedelta, default None
An offset timedelta added to the origin.
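As a hedged illustration of the `origin` and `offset` parameters described above, the sketch below compares the default anchoring with a fixed `origin` and with an `offset`. It assumes a pandas version that includes these arguments (1.1.0 or later). The series `ts` and its values are made up for the example.

```python
import pandas as pd

# Illustrative data: one observation every 7 minutes, starting at 23:30.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(range(len(rng)), index=rng)

# Default: bins are anchored at midnight of the first day of the series.
default = ts.resample("17min").sum()

# Fixed origin: bins are anchored at the given timestamp instead.
fixed = ts.resample("17min", origin=pd.Timestamp("1970-01-01")).sum()

# Offset: a timedelta added to the origin; for a minute-based frequency
# this plays the role that base=2 used to play.
shifted = ts.resample("17min", offset=pd.Timedelta(minutes=2)).sum()

print(default.index[0], fixed.index[0], shifted.index[0])
```

The three calls aggregate the same values. Only the position of the bin edges, and therefore the labels, differs.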
Returns
-------
@@ -7916,6 +7936,16 @@ def resample(
2000-01-02 22 140
2000-01-03 32 150
2000-01-04 36 90
To replace the use of the deprecated loffset argument:
>>> df.resample("3s", loffset="8H")
becomes:
>>> from pandas.tseries.frequencies import to_offset
>>> df = df.resample("3s").mean()
>>> df.index = df.index.to_timestamp() + to_offset("8H")
"""

from pandas.core.resample import get_resampler
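The `loffset` replacement described in the docstring example above can be sketched end to end as follows. This is a minimal, hedged example: the frame `df` and its values are invented, the index is assumed to be a `DatetimeIndex` (so no `to_timestamp()` conversion is applied), and `pd.Timedelta(hours=8)` is used in place of `to_offset("8H")` purely as a choice for this sketch.

```python
import pandas as pd

# Illustrative data: one observation per second (values are made up).
idx = pd.date_range("2000-01-01", periods=9, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"value": range(9)}, index=idx)

# Resample first, then shift the resulting labels by the amount that
# would previously have been passed as loffset.
out = df.resample("3s").mean()
out.index = out.index + pd.Timedelta(hours=8)

print(out)
```

Shifting the index after aggregation changes only the labels. The bins themselves are still computed from the original timestamps.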
@@ -7933,6 +7963,8 @@ def resample(
base=base,
key=on,
level=level,
origin=origin,
offset=offset,
)

def first(self: FrameOrSeries, offset) -> FrameOrSeries:
23 changes: 23 additions & 0 deletions pandas/core/groupby/grouper.py
@@ -68,9 +68,32 @@ class Grouper:
If grouper is PeriodIndex and `freq` parameter is passed.
base : int, default 0
Only when `freq` parameter is passed.
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
``df.resample("3s", base=2)``
becomes
``df.resample("3s", offset="2s")``
loffset : str, DateOffset, timedelta object
Only when `freq` parameter is passed.
.. deprecated:: 1.1.0
loffset only works for ``.resample(...)`` and not for
Grouper (:issue:`28302`).
However, loffset is also deprecated for ``.resample(...)``.
See: :class:`DataFrame.resample`
origin : Timestamp, default None
Only when `freq` parameter is passed.
The timestamp on which to adjust the grouping. If None is passed, the
first day of the time series at midnight is used.
offset : pd.Timedelta, default None
An offset timedelta added to the origin.
Returns
-------
A specification for a groupby instruction
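For the `Grouper` side of the change, the following hedged sketch passes the same `origin` to `pd.Grouper` and to `resample` and prints both sets of bin labels for comparison. It assumes pandas 1.1.0 or later. The series `ts` is invented for the example.

```python
import pandas as pd

# Illustrative data: one observation every 7 minutes, starting at 23:30.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(range(len(rng)), index=rng)

origin = pd.Timestamp("1970-01-01")

# Group with a Grouper anchored at a fixed origin...
grouped = ts.groupby(pd.Grouper(freq="17min", origin=origin)).sum()

# ...and resample with the same frequency and origin.
resampled = ts.resample("17min", origin=origin).sum()

print(list(grouped.index))
print(list(resampled.index))
```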