OverflowError in resample+aggregate for tz-aware index and list-like aggregation #22660

Closed
frexvahi opened this issue Sep 11, 2018 · 1 comment · Fixed by #25297
Labels
Bug, Resample, Timezones
frexvahi commented Sep 11, 2018

Code Sample, a copy-pastable example if possible

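import numpy as np
import pandas as pd

# Reported on pandas 0.23.4: tz-aware index plus a list passed to aggregate()
df = pd.DataFrame(np.random.rand(200, 1),
                  index=pd.DatetimeIndex(start='2017-01-01', freq='15min', periods=200, tz='Europe/Berlin'),
                  columns=['t2p'])
df.resample('1d').aggregate(['mean'])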

...

OverflowError [full traceback in 'details' below]
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: 't2p'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
1612 try:
-> 1613 return Index.get_loc(self, key, method, tolerance)
1614 except (KeyError, ValueError, TypeError):

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: 't2p'

During handling of the above exception, another exception occurred:

OverflowError Traceback (most recent call last)
in ()
2 index=pd.DatetimeIndex(start='2017-01-01', freq='1h', periods=100, tz='Europe/Berlin'),
3 columns=['t2p'])
----> 4 df.resample('1d').aggregate(['mean'])
5
6

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/resample.py in aggregate(self, arg, *args, **kwargs)
238
239 self._set_binner()
--> 240 result, how = self._aggregate(arg, *args, **kwargs)
241 if result is None:
242 result = self._groupby_and_aggregate(arg,

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
549 return self._aggregate_multiple_funcs(arg,
550 _level=_level,
--> 551 _axis=_axis), None
552 else:
553 result = None

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
594 try:
595 colg = self._gotitem(col, ndim=1,
--> 596 subset=obj.iloc[:, index])
597 results.append(colg.aggregate(arg))
598 keys.append(col)

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/resample.py in _gotitem(self, key, ndim, subset)
298 # try the key selection
299 try:
--> 300 return grouped[key]
301 except KeyError:
302 return grouped

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
264
265 else:
--> 266 if key not in self.obj:
267 raise KeyError("Column not found: {key}".format(key=key))
268 return self._gotitem(key, ndim=1)

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/generic.py in __contains__(self, key)
1520 def __contains__(self, key):
1521 """True if the key is in the info axis"""
-> 1522 return key in self._info_axis
1523
1524 @property

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __contains__(self, key)
379 def __contains__(self, key):
380 try:
--> 381 res = self.get_loc(key)
382 return (is_scalar(res) or isinstance(res, slice) or
383 (is_list_like(res) and len(res)))

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
1619
1620 try:
-> 1621 stamp = Timestamp(key, tz=self.tz)
1622 return Index.get_loc(self, stamp, method, tolerance)
1623 except KeyError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion._localize_pydatetime()

~/.conda/envs/everything/lib/python3.6/site-packages/pytz/tzinfo.py in localize(self, dt, is_dst)
321 possible_loc_dt = set()
322 for delta in [timedelta(days=-1), timedelta(days=1)]:
--> 323 loc_dt = dt + delta
324 idx = max(0, bisect_right(
325 self._utc_transition_times, loc_dt) - 1)

OverflowError: date value out of range
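
The last frame above fails on plain datetime arithmetic: `loc_dt = dt + delta` with `delta = timedelta(days=-1)`. A minimal standard-library sketch of that one step follows. The helper name `shift_back_one_day` and the year-1 input are hypothetical, chosen only to show that a naive datetime within a day of `datetime.min` cannot be shifted back. Whether the column label `'t2p'` was actually parsed into such a value is an inference from the frames, not something confirmed in this thread.

```python
from datetime import datetime, timedelta


def shift_back_one_day(dt):
    """Mimic the `loc_dt = dt + delta` step with delta = timedelta(days=-1)."""
    return dt + timedelta(days=-1)


# A mid-range date shifts without trouble.
ok = shift_back_one_day(datetime(2017, 1, 1))

# Hypothetical input: a naive datetime on the first representable day.
# Subtracting a day would fall before datetime.min, so Python raises.
try:
    shift_back_one_day(datetime(1, 1, 1, 14, 0))
    overflowed = False
except OverflowError:
    overflowed = True
```

If this reading is right, the timezone dependence in the report would follow from the fact that only the pytz `localize` path performs this day shift.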

Problem description

Here are some changes I have tried in order to work out which situations trigger the bug:

  • No error for tz-naive or UTC, error for 'Europe/Berlin' and 'America/New_York'
  • No error for column name 't2', 't2x', 't2q', 'T_2M', error for 't2p', 't2m', 't2m1', 'T2M'
  • The frequency of the DatetimeIndex and the resample period do not seem to matter
  • No error for .resample().mean() etc., the error only happens when using .resample().aggregate()
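
The last observation above points to a possible stopgap on affected versions: call the named method directly and relabel the columns afterwards. The sketch below assumes hourly data over 100 periods (as in the traceback) and uses deterministic values instead of random ones. The `MultiIndex.from_product` relabelling is an illustrative choice to mimic the two-level header of a list-like aggregation, not something proposed in this thread.

```python
import pandas as pd

# Assumed setup: 100 hourly rows on a tz-aware index, values 0.0 .. 99.0
idx = pd.date_range('2017-01-01', freq='1h', periods=100, tz='Europe/Berlin')
df = pd.DataFrame({'t2p': [float(i) for i in range(100)]}, index=idx)

# Call the named method directly instead of aggregate(['mean'])
out = df.resample('1D').mean()

# Rebuild the (column, function) header that aggregate(['mean']) would give
out.columns = pd.MultiIndex.from_product([out.columns, ['mean']])
```

This only covers the case where each function can be called by name; combining several functions would need one call per function and a `pd.concat` along the columns.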

Expected Output

                                 t2p
                               mean
2017-01-01 00:00:00+01:00  0.397067
2017-01-02 00:00:00+01:00  0.519352
2017-01-03 00:00:00+01:00  0.534746
2017-01-04 00:00:00+01:00  0.587625
2017-01-05 00:00:00+01:00  0.497514

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: 3.7.4
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.8
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung added the Bug and Resample labels Sep 11, 2018

gfyoung commented Sep 11, 2018

That does seem odd indeed! Investigation and PR are welcome!

mroeschke added the Timezones label Sep 23, 2018
mroeschke changed the title from "OverflowError in resample+aggregate for tz-aware index for columns called 't2p' or 't2m'" to "OverflowError in resample+aggregate for tz-aware index and list-like aggregation" Sep 23, 2018
jreback added this to the 0.25.0 milestone Feb 16, 2019