-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/DES: Non-Nanosecond Tracker #46587
Comments
Quick thoughts:
My gut is saying this should use/respect "unit" as it's not clear to me how "unit" and a reso-like keyword would cooperate?
Based on the GH issues I remember seeing in the past, generally people were hoping
Under the assumption |
Do keep in mind I don't have much experience using the date functionality even as-is.
For the "use/respect unit" option, I think this would mean that a user specifying Another option, if being able to specify the resolution is advantageous, would be to allow units such as
Is this value-dependent behavior? If everything "just works", then I don't have much of a concern. However, if getting back one resolution vs another can cause odd results in edge cases, this could be surprising and lead to hard to find bugs. If that is the case, I would lean toward raising and making the user specify the resolution. |
Not really, just an API change.
Yah, I've handled comparisons between scalars of different resolutions, but addition/subtraction (and division for Timedelta) it's not obvious what the right behavior is (i think numpy silently converts in ways that sometimes overflow, have examples/tests written down somewhere). I think we provide an .astype-like API to allow conversion and otherwise raise on potentially-ambiguous operations. |
The existing keyword is about "how to interpret the user input", and what we now want is "how to store the parsed value". Those two are not necessarily the same, and I personally don't think we can simply assume they are (e.g. I think it would be quite common use case that you want to convert values that represent for example hours, but you want to store then in microseconds for compatibility with other data). In addition, we allow more units here for this keyword than what I would implement as possible resolution of the dtype.
My answer would be: we don't support a resolution of days, so such a resulting value is simply not possible (and thus it would be impossible to always respect the (it's certainly unfortunate, though, that the
I think long term, the logical behaviour would be that any |
I'm reaching a point where making further progress depends on having non-nano DatetimeTZDtype. That is easy to implement, but would be an API change bc ATM passing non-nano to the DatetimeTZDtype constructor raises (which we test). Selfishly, it would be very convenient to break the no-changes-in-minor-versions rule and do this immediately. Since this is a pretty tiny corner case, I doubt anyone would complain. But the policy is there for a reason. Thoughts on a) whether this is reasonable or b) viable alternatives? cc @mroeschke @jreback @jorisvandenbossche |
Does it raise with a not-implemented-like error message? If so, I think the "API change" could be advertised as an "enhancement" |
ValueError |
If I'm not missing anything, the only user code that could "break" with what you propose is if the user has code creating second precision datetime data, which currently breaks, and after your changes would work. Personally I think:
If I'm not missing something, I'd personally add this directly without overcomplicating things. |
Indeed, enabling more keyword values is generally not seen as a breaking change, I would say. Of course, the situation is a bit different here as the keyword values already did exist elsewhere (in numpy), and so some users might have tried passing it. But relying on that passing it raises an error seems unlikely. So for the dtype constructor, I would also directly add the ability to create dtypes with other resolutions. |
Status Update 2022-06-20. The constructors point is the one the most pressing.
|
Support for non-nanosecond timedelta64, datetime64, and datetime64tz is coming along. The next big planned steps are to get the Timedelta and Timestamp scalars to support non-nano resolutions. There are a few design options that I'd like to get input on cc @pandas-dev/pandas-core
I'm doing Timedelta first mostly because that is more conducive to doing as a scoped PR with dedicated testing. Current plan is to make a dedicated constructor like Timedelta._from_value_and_reso (so i can write tests) that can can be removed once we decide on the public behavior. Which brings us to the questions:
pd.Timedelta(timedelta(days=106752))
currently raises, in the future will presumably come back with a 'us' reso. So what about currently non-raising cases likepd.Timedelta(timedelta(days=106751))
? Does it stay ns or become us?Other
Update 2023-01-16 Remaining TODO list
pd.Series([1, 2, 3], dtype="m8[s]")
to match numpy?pd.Timedelta(days=365*1000)
raisesThe text was updated successfully, but these errors were encountered: