-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Temporal matcher can't handle timezone information in DataFrame indices #150
Comments
Hm, looks like there's another point where timezone information is discarded: pytesmo.temporal_matching.df_match, line 115:
|
I've only taken a brief look but I think that both cases should work. Could it be that we are looking at pandas problems? |
It looks like problem comes from pandas index manipulation, so maybe. I'll see if I can do some tests with different pandas versions. |
I just wrote and ran a script that installs pandas versions 0.19.2, 0.20.1, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, and 0.23.4 (with conda) and then runs the unit test above. Edit: It does. I'm so glad I'm not seeing imaginary problems... |
This seems to be pandas-dev/pandas#15750 Also easily reproduced in your test by looking at the
The question for me is do we need timezone support? If yes then we should fix it in our temporal matching routine by either using UTC internally or using julian dates internally. |
And also pandas-dev/pandas#14052 |
As I understood Tracy, it's needed for TU's current work - I hope she can explain it here herself. Whatever the matcher uses internally, we need to make sure that the data returned has the same format as the data put into the matcher. |
@tracyscanlon Could you tell us about the need for timezone support? What about summer time and all the other ugly things that come with it? If we really need all of that then we should also check the library that pandas uses for all of this to see if everything we need is supported. |
OK, discussion in our review meeting revealed that we currently don't expect data to be in non-UTC format. What currently happens in my code using pytesmo is: if data has tz info, convert to UTC and then drop tz info. |
I think converting to UTC, raising a Warning stating the fact that this was done and then dropping the tz info is a good approach. |
Since I just ran into it with the AscatNc class: what do we do if read/read_ts don't return a DataFrame but an object that has-a DataFrame, like AscatTimeSeries? |
Then it does not fit our interface and we either write a simple adapter class or we change AscatNc. |
We can of course also just check the type and let our interface be a little smarter. But we should not let that get out of hand |
Maybe we can do it the other way around and move the metadata currently provided by AscatNc into the |
Not sure this is so good if pandas does not have support for it. See pandas-dev/pandas#2485 We can extend the interface to support objects that have a |
So validation_framework/data_manager.py:254 does this:
Does this mean that pytesmo doesn't use anything from the TS attributes but .data? And does it mean I can do the same reduction to .data in my adapter class(es) without breaking anything? |
Actually. It seem we already support it. So you should not need to do anything? What was your original error? Maybe we should also open a separate issue, since this issue is about the timezones |
Well, solving the timezone issue requires touching the dataframe somewhere in pytesmo. My question is: is it safe to either modify the TS or discard anything but the .data property. So it's more about how to implement the timezone solution. |
Since we handle A solution for the timezone issue could also be implemented in the data manager after we have extracted the |
After re-reading this thread, I don't think we need to do anything at the moment. Merging #151 broke our CI so I would revert this if nobody disagrees. I don't see the point of having a test for a feature that we do not support at the moment. |
I've reverted the merge d5ab76e |
added test for timezone, closes #150
The test that reproduces the issue:
The exception that occurs:
The root of the problem: pytesmo.temporal_matching.df_match, line 85:
If you set a breakpoint to line 86, you can observe this:
Any recommendations on how to fix this? Manually copying over the timezone information or taking care not to delete it in the first place?
See also: https://github.com/awst-austria/pytesmo/blob/timezone_issue/tests/test_temporal_matching.py
The text was updated successfully, but these errors were encountered: