ENH: Timedelta cannot span more than 293 years => implementation limitation #35687
Comments
I have removed criticism and proposed a solution. Please take a look!
Timedeltas are backed by an int64, which gives a reasonable tradeoff between resolution (ns) and time range. The same issue applies to Timestamp ranges, where there have been many discussions (search the repo). If someone wants to implement a … But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?
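For a sense of scale, the arithmetic behind that tradeoff fits in a few lines of plain Python. This is a back-of-the-envelope estimate, not pandas code: it assumes a signed 64-bit tick count and a mean Gregorian year of 365.2425 days, and the names `TICKS_PER_SECOND` and `max_span_years` are illustrative only.

```python
# Back-of-the-envelope span of a signed 64-bit tick counter.
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year

# Ticks per second for each resolution considered here.
TICKS_PER_SECOND = {"ns": 10**9, "us": 10**6, "ms": 10**3, "s": 1}


def max_span_years(unit: str) -> float:
    """Largest positive duration, in years, that fits in int64 ticks of `unit`."""
    return INT64_MAX / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: ~{max_span_years(unit):.3g} years")
```

At nanosecond resolution this comes out at roughly 292 years in each direction, the limit discussed in this thread; each coarser unit buys another factor of 1000 in span.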
I was doing automated carbon dating at an archeological site. For every item the machine collected, the computer computes its age by measuring the carbon-14 isotope, so the range can go from a few years to a few thousand years back from now ("now" differs between runs). Most people in the same field still use MS Excel to do the computation manually. Honestly, I do not mind that Timedelta has a limited span, but a span of only 292 years is really too small for a lot of scientific work.
IMO a long-term solution would be to move to a default 128-bit (or 96-bit) type, which could then have both very fine resolution and a long time span. This is likely even more challenging than what @jreback suggested. FWIW, this is not unprecedented: MATLAB has gone down this route.
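Extending the same estimate to wider counters suggests why a 96- or 128-bit type is attractive. The sketch below assumes a hypothetical signed integer tick counter of the given width (no such pandas dtype exists) and compares nanosecond ticks with femtosecond ticks; `span_years` is an illustrative helper, not a library function.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year


def span_years(bits: int, ticks_per_second: int) -> float:
    """Largest positive span, in years, of a hypothetical signed `bits`-wide tick counter."""
    max_ticks = 2 ** (bits - 1) - 1
    return max_ticks / ticks_per_second / SECONDS_PER_YEAR


for bits in (64, 96, 128):
    ns = span_years(bits, 10**9)   # nanosecond ticks
    fs = span_years(bits, 10**15)  # femtosecond ticks
    print(f"{bits:>3}-bit: ns -> {ns:.3g} years, fs -> {fs:.3g} years")
```

Under these assumptions, a 96-bit nanosecond counter already spans on the order of 10**12 years, and a 128-bit counter covers about 5 × 10**15 years even at femtosecond resolution.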
Astronomy, archeology, and geology are three places where the natural limit will be regularly hit.
I also run into this issue with my package staircase, which provides functionality for working with mathematical step functions. An example application is representing the number of "active objects" over time (e.g. buses, website users) with a step function.
In 2.0 we support non-nanosecond Timedeltas, though depending on how you call the constructor you may still need to explicitly cast to the lower resolution:
@xuancong84, does this handle your use case?
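A minimal sketch of the explicit-resolution approach, assuming pandas 2.0 or later (where `Timedelta.unit` and `Timedelta.as_unit` exist). The thread does not preserve the snippet that was originally posted, so this is an illustration rather than that code; the 5,000-year duration is arbitrary.

```python
import numpy as np
import pandas as pd

SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year (365.2425 days)

# The default nanosecond resolution tops out at about 292 years.
print(pd.Timedelta.max)

# A numpy timedelta64 in whole seconds keeps second resolution when passed
# to the Timedelta constructor (pandas >= 2.0), so 5,000 years fits.
five_millennia = pd.Timedelta(np.timedelta64(5_000 * SECONDS_PER_YEAR, "s"))
print(five_millennia.unit)

# Keyword construction produces a nanosecond Timedelta; cast it down
# explicitly so the sum below stays at second resolution.
ten_days = pd.Timedelta(days=10).as_unit("s")
print(five_millennia + ten_days)
```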
@jbrockmendel Yes, this handles the case. However, it is too mechanical and manual. A more elegant implementation would switch automatically among ns/us/ms/s depending on the time scale. That is why I recommend a high-precision floating-point implementation rather than a large integer: IEEE floating point is designed to handle exactly this kind of dynamic scaling. Without floating point, you have to handle the scaling manually, which is troublesome and bug-prone. Of course, the drawback is that when you add two durations of very different magnitudes, e.g. 1 nanosecond to 1000 years, the small increment is lost to rounding and the sum does not change. But such situations are rare: can anyone think of a case where you need to add a very small duration (such as 1 ns) to a very large one (>1000 years)? In conclusion, the most elegant solution in my opinion is to switch between a floating-point timedelta and a large-integer timedelta, instead of switching among fixed units such as ns/us/ms/s. Otherwise, what if you want to scale to trillions of years: is a unit of 1 second enough? And what if you want to work at the femtosecond level: can you even aggregate up to a few minutes?
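The rounding effect described above is easy to reproduce with ordinary float64 values. This sketch assumes durations stored as a float count of seconds (a hypothetical representation, not something pandas provides) and contrasts it with an exact integer nanosecond count.

```python
SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year (365.2425 days)

# 1000 years as a float64 number of seconds (~3.2e10).
millennium = 1000.0 * SECONDS_PER_YEAR
one_ns = 1e-9

# float64 keeps about 15-16 significant digits, so at this magnitude the gap
# between neighboring values is a few microseconds and 1 ns is rounded away.
print(millennium + one_ns == millennium)

# The same sum in integer nanoseconds is exact here because Python ints are
# unbounded; note that 1000 years in ns would overflow an int64.
millennium_ns = 1000 * SECONDS_PER_YEAR * 10**9
print(millennium_ns + 1 - millennium_ns)
```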
You're welcome to implement a float-based datetime dtype. I don't expect pandas to implement one internally. Closing as complete.
I would add that, with the changes made available in 2.0, it should be relatively straightforward to write an extension type that supports ns resolution plus a wider range as a stand-alone package. I agree that acceptance into pandas would be a long shot. That said, the only way forward for something like this would be to have it as a stand-alone package that could be shown to be very popular, and to let it mature outside pandas.
The pandas datetime arithmetic module is very useful, and it can be used for archeological research as well. Unfortunately, the limitation that a Timedelta object cannot span more than 293 years is a real blemish on this wonderful data science library. With this limitation, it is not possible to go back thousands of years, or even just 300 years, in history.
In practice, time resolution and time span always trade off against each other. Quantum physics often deals with time at the scale of nano- (10**-9), pico- (10**-12), or even femto- (10**-15) seconds, while archeology often deals with spans of thousands, millions, or even billions of years. If I remember correctly, pandas sets the base unit of its time counter to nanoseconds, so the span is short. The solution that caters for both high resolution and a large span is to use a floating-point number rather than a large integer as the time counter. Floating point will be slightly slower, but on modern CPUs (e.g. Intel architectures), floating-point arithmetic is almost as fast as integer arithmetic, especially when SIMD is used.
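To put numbers on the floating-point proposal, the sketch below uses Python's `math.ulp` (Python 3.9+) to show the gap between adjacent float64 values at several of the scales mentioned above, assuming durations are stored as float64 seconds. It illustrates that a float keeps a roughly constant relative precision, so its absolute resolution coarsens as the span grows.

```python
import math

SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year (365.2425 days)

# Durations, in seconds, spanning the scales discussed above.
scales = {
    "1 femtosecond": 1e-15,
    "1 second": 1.0,
    "300 years": 300 * SECONDS_PER_YEAR,
    "1 million years": 1e6 * SECONDS_PER_YEAR,
    "1 billion years": 1e9 * SECONDS_PER_YEAR,
}

for label, seconds in scales.items():
    # math.ulp gives the gap to the next representable float64, i.e. the
    # finest time step that can still be resolved at that magnitude.
    print(f"{label:>16}: resolution ~{math.ulp(seconds):.2e} s")
```

Under this assumption, a float64 resolves about 2 microseconds at 300 years and about 4 seconds at a billion years, which may or may not be fine enough depending on the application.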