BUG: Out of bounds Timestamp does not raise exception #19382

Closed
cbertinato opened this issue Jan 24, 2018 · 5 comments · Fixed by #19529
Labels
Datetime (Datetime data dtype), Error Reporting (Incorrect or improved errors from pandas)

Comments

@cbertinato
Contributor

cbertinato commented Jan 24, 2018

An OutOfBoundsDatetime exception is raised if a datetime below the minimum datetime is specified, both as an integer number of nanoseconds:

>>> Timestamp(-9223372036854775000)
Timestamp('1677-09-21 00:12:43.145225')
>>> Timestamp(-9223372036854775001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 312, in pandas._libs.tslibs.conversion.convert_to_tsobject
    check_dts_bounds(&obj.dts)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 121, in pandas._libs.tslibs.np_datetime.check_dts_bounds
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

and as a string:

>>> Timestamp('1677-09-21 00:12:43.145225')
Timestamp('1677-09-21 00:12:43.145225')
>>> Timestamp('1677-09-21 00:12:43.145224')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 274, in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_str_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
  File "pandas/_libs/tslibs/conversion.pyx", line 482, in pandas._libs.tslibs.conversion.convert_str_to_tsobject
    return convert_to_tsobject(ts, tz, unit, dayfirst, yearfirst)
  File "pandas/_libs/tslibs/conversion.pyx", line 299, in pandas._libs.tslibs.conversion.convert_to_tsobject
    return convert_datetime_to_tsobject(ts, tz)
  File "pandas/_libs/tslibs/conversion.pyx", line 392, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
    check_dts_bounds(&obj.dts)
  File "pandas/_libs/tslibs/np_datetime.pyx", line 121, in pandas._libs.tslibs.np_datetime.check_dts_bounds
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

An OverflowError is raised when going beyond the maximum datetime while instantiating with an integer number of nanoseconds:

>>> Timestamp(9223372036854775807)
Timestamp('2262-04-11 23:47:16.854775807')
>>> Timestamp(9223372036854775808)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 581, in pandas._libs.tslibs.timestamps.Timestamp.__new__
    ts = convert_to_tsobject(ts_input, tz, unit, 0, 0)
  File "pandas/_libs/tslibs/conversion.pyx", line 289, in pandas._libs.tslibs.conversion.convert_to_tsobject
    obj.value = ts
OverflowError: Python int too large to convert to C long

but an incorrect Timestamp is returned if an out-of-bounds datetime string is specified:

>>> Timestamp('2262-04-11 23:47:16.854775807')
Timestamp('2262-04-11 23:47:16.854775807')
>>> Timestamp('2262-04-11 23:47:16.854775808')
Timestamp('2262-04-11 23:47:16.854775')

Verified that this is the case for DatetimeIndex as well.
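To make the boundary values above easier to check by hand, here is a minimal standard-library sketch that renders an integer nanosecond count as text with nine fractional digits. It does not use pandas; the names `EPOCH` and `ns_to_text` are made up for this illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # nanosecond counts are measured from the Unix epoch


def ns_to_text(ns: int) -> str:
    """Render integer nanoseconds since the epoch with 9 fractional digits."""
    # Floor division keeps extra_ns in 0..999, also for negative inputs.
    micros, extra_ns = divmod(ns, 1000)
    moment = EPOCH + timedelta(microseconds=micros)
    # datetime only carries microseconds, so the last three digits are appended.
    return moment.isoformat(sep=" ", timespec="microseconds") + f"{extra_ns:03d}"


print(ns_to_text(2**63 - 1))             # largest int64 value
print(ns_to_text(-9223372036854775000))  # lowest value accepted in the examples above
```

The largest int64 value maps to the last timestamp that was accepted as a string above. The incorrect result for `...854775808` is that same moment with its last three fractional digits missing.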

Expected Output

OutOfBoundsDatetime exception is raised
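
As a rough illustration of the expected behaviour, the sketch below parses a timestamp string into integer nanoseconds with exact integer arithmetic and raises instead of silently dropping digits. It is standard-library Python, not the pandas implementation: `OutOfBoundsSketch` is a hypothetical stand-in for OutOfBoundsDatetime, `text_to_ns` is an invented helper, and `MIN_NS` is an assumption taken from the lowest value accepted in the examples above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1              # largest int64 value
MIN_NS = -9223372036854775000   # assumption: lowest value accepted in this report


class OutOfBoundsSketch(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime."""


def text_to_ns(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]' into nanoseconds since the epoch."""
    whole, _, fraction = text.partition(".")
    if len(fraction) > 9:
        raise ValueError("more than nanosecond precision")
    moment = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    # Whole seconds as an exact integer; the fraction is padded to nanoseconds.
    seconds = (moment - EPOCH) // timedelta(seconds=1)
    ns = seconds * 10**9 + int(fraction.ljust(9, "0"))
    if not MIN_NS <= ns <= MAX_NS:
        raise OutOfBoundsSketch(f"Out of bounds nanosecond timestamp: {text}")
    return ns


print(text_to_ns("2262-04-11 23:47:16.854775807"))
try:
    text_to_ns("2262-04-11 23:47:16.854775808")
except OutOfBoundsSketch as exc:
    print("raised:", exc)
```

Because Python integers do not overflow, the bounds check runs on the exact value, so a string one nanosecond past the maximum is rejected rather than truncated.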

Output of pd.show_versions()

commit: b38dc41

python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+169.gb38dc4105
pytest: 3.3.1
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.1
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.15
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@cbertinato
Contributor Author

Has anybody seen an instance where a test will not raise an exception, but the same line in the Python interpreter does?

@chris-b1
Contributor

There's a reason (arguable whether it's a good one) for the difference between OutOfBoundsDatetime and OverflowError: some portion of the negative int64 space isn't valid as a time, while in the positive space we use every available int64 value, so a value that is one too big can't be cast to an int64 at all. (Mostly trivia.)
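
The distinction can be modelled in a few lines of plain Python. This is only a sketch of the behaviour reported in this thread, not pandas code: `classify` is an invented helper, and `LOWEST_ACCEPTED_NS` is an assumption copied from the integer examples in the original report.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
# Assumption: lowest integer accepted in the report above; values between
# INT64_MIN and this bound fit in an int64 but are rejected as timestamps.
LOWEST_ACCEPTED_NS = -9223372036854775000


def classify(value: int) -> str:
    """Name the outcome the report describes for an integer nanosecond input."""
    if not INT64_MIN <= value <= INT64_MAX:
        # The value cannot be stored in an int64 at all.
        return "OverflowError"
    if value < LOWEST_ACCEPTED_NS:
        # Storable as an int64, but below the smallest valid timestamp.
        return "OutOfBoundsDatetime"
    return "ok"


for value in (INT64_MAX, INT64_MAX + 1, LOWEST_ACCEPTED_NS, LOWEST_ACCEPTED_NS - 1):
    print(value, classify(value))
```

On the positive side there is no gap between the last valid timestamp and the int64 limit, so the bounds check is never reached there and only the overflow branch can fire.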

Your upper string example definitely looks buggy - PR would be welcome!

>>> Timestamp('2262-04-11 23:47:16.854775808')
Timestamp('2262-04-11 23:47:16.854775')

@chris-b1 chris-b1 added this to the Next Major Release milestone Jan 25, 2018
@cbertinato
Contributor Author

So should we normalize it so that any format generates an OutOfBoundsDatetime? I understand the difference, but is it necessary to indicate overflow in this case?

@chris-b1
Contributor

Yeah, I tend to think it'd be better if all of these raised an OutOfBoundsDatetime

@cbertinato
Contributor Author

cbertinato commented Jan 28, 2018

There is another variation for DatetimeIndex.

In [3]: pd.DatetimeIndex(np.array(['2262-04-11 23:47:16.854775808'], dtype='datetime64'))
Out[3]: DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

So it's not exactly silent, but it does differ from what happens for a Timestamp, which is the bug that we're talking about here. But suppose we address the upper bound for the Timestamp and make it, say, raise OutOfBoundsDatetime or OverflowError. The behavior of the DatetimeIndex would still be inconsistent, not only with that but also with its own behavior at the lower bound.

Can't an argument be made for returning NaT for all cases, Timestamp and DatetimeIndex?

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Feb 4, 2018
@jreback jreback added Error Reporting Incorrect or improved errors from pandas and removed Bug labels Feb 4, 2018
@jreback jreback modified the milestones: Next Major Release, 0.23.0 Feb 4, 2018
harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018

3 participants