BUG: In 1.5rc0 casting Series to datetime64 with specific but non-nanosecond units has no effect #48574
Comments
This is a (small) hack that works around changes in how pandas deals with casting Series to `datetime64` types that have units larger than seconds. We depended on the previous behavior. Not sure if it's something that will get fixed in pandas, but I made this issue and it's a breaking change, so hopefully: pandas-dev/pandas#48574
cc @jbrockmendel thoughts here?
Can fix for 1.5.1. For 2.0 we'll support [s, ms, us, ns], and I think astype to anything else will raise. Long-term the user should use .floor, I think.
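For readers following along, here is a minimal sketch of the `.floor` approach mentioned above. The sample timestamps and the names `ser` and `daily` are made up for illustration. As far as I know, `floor` only accepts fixed frequencies such as days or hours, so this sketch sticks to daily flooring and does not cover month or year.

```python
import pandas as pd

# Hypothetical sample data: two timestamps with a time-of-day component.
ser = pd.Series(pd.to_datetime(["2000-01-01 05:00", "2000-01-02 23:00"]))

# Floor each timestamp down to the start of its day.
daily = ser.dt.floor("D")
print(daily)
```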
Messing around with this a bit,
import numpy as np
import pandas as pd

df = (
    pd.DataFrame()
    .assign(
        hourly=pd.Series(np.arange("2000-01-01", "2003-01-01", dtype="datetime64[h]")),
        daily=lambda x: x.hourly.dt.floor("D"),
        monthly=lambda x: pd.Series(x["hourly"].to_numpy().astype("datetime64[M]")),
        yearly=lambda x: pd.Series(x["hourly"].to_numpy().astype("datetime64[Y]")),
    )
)
df.sample(10)
Good catch. I think we'd need something like For 1.5.0 your best bet is, like you've found, doing the astype directly on the underlying numpy arrays. For 1.5.1 we can restore that behavior.
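A small sketch of the numpy-level workaround described in this comment, under the assumption that the goal is year-start dates. The sample data and the names `ser`, `arr`, and `year_start` are illustrative only. The unit conversion happens entirely in numpy, and the result is cast back to nanoseconds before it is wrapped in a Series, so the sketch does not rely on how a given pandas version handles non-nanosecond units.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: month-start dates.
ser = pd.Series(pd.to_datetime(["2000-03-01", "2001-12-01"]))

# Do the unit conversion on the underlying numpy array, not on the Series.
arr = ser.to_numpy()
truncated = arr.astype("datetime64[Y]")  # truncates each value to its year

# Cast back to nanoseconds explicitly before handing the data to pandas.
year_start = pd.Series(truncated.astype("datetime64[ns]"), index=ser.index)
print(year_start)
```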
moving off 2.0 as
Is there a plan to restore
hey @lingyielia - this will continue to raise. Are you trying to floor to the beginning of the year? If so, you could do ser + pd.tseries.offsets.YearBegin() - pd.tseries.offsets.YearBegin(). In the future, it should be possible to do
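The offset suggestion above can be sketched as follows. The sample dates and the names `ser`, `year_begin`, and `year_start` are made up for illustration. Adding a `YearBegin` offset rolls every date forward to the next January 1 (including dates already on January 1), and subtracting it again lands on the start of the original year.

```python
import pandas as pd

# Hypothetical sample data: one date already on a year start, two that are not.
ser = pd.Series(pd.to_datetime(["2000-01-01", "2000-03-01", "2001-12-01"]))

year_begin = pd.tseries.offsets.YearBegin()

# Roll forward to the next year start, then back by one year start.
year_start = ser + year_begin - year_begin
print(year_start)
```

The sample values here are all at midnight; if the data carried a time of day, an extra `.dt.normalize()` step would likely be needed to reset it.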
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
In pandas 1.4.4 and earlier, it was possible to affect the contents of a datetime64[ns] type Series by casting to datetime64 with other non-nanosecond units, even though the resulting Series still had datetime64[ns] as its type. As of 1.5rc0 this behavior seems to have changed. Casting to datetime64 with other units no longer seems to have any effect.
Based on comments in PR #48555 (closing issue #47844) referring to the numpy unit conversion it seems like this might not be the intended behavior, and it's a breaking change (we were relying on this behavior to turn month-start dates into the corresponding year-start dates). Snippet from that PR:
Expected Behavior
I expected the dates in the series to be adjusted to be consistent with the frequency of the datetime64 type used in astype(), as illustrated in the example above.
Installed Versions
INSTALLED VERSIONS
commit : bbf17ea
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-47-generic
Version : #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.6.0.dev0+136.gbbf17ea692
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None