Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve formatting of reference time units under pandas 2.0.0 #7441

Merged
merged 20 commits into from
Apr 1, 2023

Conversation

spencerkclark
Copy link
Member

@spencerkclark spencerkclark commented Jan 15, 2023

As suggested by @keewis, to preserve existing behavior in xarray, this PR forces any object passed to format_timestamp to be converted to a string using strftime with a constant format. This addresses the failing tests related to the units encoding in #7420.

  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Changes in pandas 2.0.0 interfere with the way we expect some times
to be formatted.
@keewis
Copy link
Collaborator

keewis commented Jan 19, 2023

I think this is a bug in pandas: pd.Timestamp("1-01-01 00:00:00") returns a date in 2001.

In any case, t.isoformat(sep=" ") should return the year with 4 digits so maybe we should use that instead? (once again, it doesn't, but that I think is also a bug, maybe the same one?)

@spencerkclark
Copy link
Member Author

spencerkclark commented Jan 19, 2023

Thanks @keewis.

I think this is a bug in pandas: pd.Timestamp("1-01-01 00:00:00") returns a date in 2001.

I suspect this may be related to the comment here.

In any case, t.isoformat(sep=" ") should return the year with 4 digits so maybe we should use that instead? (once again, it doesn't, but that I think is also a bug, maybe the same one?)

Though indeed the isoformat behavior is definitely a bug and it would be fair to at least expect this kind of roundtrip to hold, but it doesn't:

>>> import pandas as pd
>>> pd.Timestamp(str(pd.Timestamp("0001-01-01")))
Timestamp('2001-01-01 00:00:00')

I can report that to pandas. It's not necessarily surprising considering that it previously was not possible to write pd.Timestamp("0001-01-01").

@spencerkclark
Copy link
Member Author

Issue reported here: pandas-dev/pandas#50867. Thanks again for noting that @keewis. I was too narrowly focused on fixing this in xarray.

We may still want to be careful about non-nanosecond-precision Timestamp objects leaking into our code for the time being, but that's a different conversation.

xarray/core/formatting.py Outdated Show resolved Hide resolved
@spencerkclark spencerkclark marked this pull request as ready for review March 8, 2023 03:04
@spencerkclark
Copy link
Member Author

Thanks @keewis--great suggestion--I think this should be ready for review now too!

xarray/core/formatting.py Outdated Show resolved Hide resolved
@spencerkclark
Copy link
Member Author

Hmm I guess I should have run the tests before saying anything. We could probably work around the NaT issue fairly easily, but I forgot about timezones. The ability to include a colon in the UTC offset, which we expect in the failing test, with strftime was only recently added to Python (python/cpython#95983), which makes things a little messier. I'll think about this a little more.

@spencerkclark
Copy link
Member Author

Still mulling this over a bit, but one other thing that occurs to me is that if we go with a strftime solution we should be careful (if it exists) to preserve any sub-second information of the Timestamp as well.

dcherian added a commit that referenced this pull request Mar 20, 2023
Pandas is expecting to release v2 in two weeks (pandas-dev/pandas#46776 (comment)). But we are still incompatible with their main branch: 
- #7441 
- #7420

This PR pins pandas to `<2`
@dcherian dcherian mentioned this pull request Mar 20, 2023
dcherian added a commit that referenced this pull request Mar 22, 2023
* Pin pandas  < 2

Pandas is expecting to release v2 in two weeks (pandas-dev/pandas#46776 (comment)). But we are still incompatible with their main branch: 
- #7441 
- #7420

This PR pins pandas to `<2`

* modify requirements.txt

* Update requirements
xarray/core/formatting.py Outdated Show resolved Hide resolved
spencerkclark and others added 2 commits March 31, 2023 19:32
Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
@spencerkclark
Copy link
Member Author

Thanks @keewis for fixing this upstream (pandas-dev/pandas#52220)!

Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
doc/whats-new.rst Outdated Show resolved Hide resolved
@keewis
Copy link
Collaborator

keewis commented Apr 1, 2023

the commit with my fix seems to have passed all CI (although using isoformat(sep=" ") is much better), and the upstream-dev CI only failed with unrelated issues, so once CI passes I think we can finally merge this.

@spencerkclark
Copy link
Member Author

Thanks for cleaning up the merge error I must have introduced; I agree this should be ready to go.

@keewis keewis merged commit 84607c3 into pydata:main Apr 1, 2023
@spencerkclark spencerkclark deleted the fix-time-format branch April 1, 2023 12:41
@keewis keewis mentioned this pull request Apr 5, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants