-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify time handling in multi-model statistics preprocessor #685
Conversation
…grid time; raise for daily data
This function also checks the frequency of the cubes (previously in _datetime_to_int_days), this seems to be a much more logical place. _datetime_to_int_days now simplifies to a one-liner, and can probably be eliminated completely.
@valeriupredoi whenever you have time (I'm not in a rush) |
hey @Peter9192 great stuff mate! I am sorry I took a while to review, been on leave until today. A few questions:
Anyways two small nicks, it's a very nice optimization you did imho! 🍺 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice job, Peter 🍺
one more thing - can you pls add the info in the documentation that describes the time regridding within the module? |
Thanks @valeriupredoi !
Yeah, I wanted to use that as well. But then it got messy, since regrid_time doesn't change the calendar, so then we'd be modifying the calendar here and the actual time points there (and it's just 2 lines...). At first I copied all the stuff about aux-coords as well, but then I noticed this never makes it to |
cool! ready to go, @mattiarighi could you give it a test please mate 🍺 |
@Peter9192 could you pls fix the conflict from the new |
Doing this right now... ⌛ |
It looks good to me. 👍 |
Okay I had a first look at the monthly data tests only. These are my findings:
|
The new regression test for the daily statistics of a 365-day calendar failed because some days had time points at 00:00 and others at 12:00. Since we're planning to move time handling to regrid time altogether, this behaviour is consistent with regrid_time (see issue #744).
Differences in time coordinates in daily regression tests:
Then there are some differences in the actual data (only for |
This is due to a bug in the current implementation (on which the reference data was based). I made an issue #937. Reference data should be updated. |
The reference data contains some metadata that is redundant or inconsistent. After updating the reference data, this can be reverted.
I also found out where this difference comes from: in the current master branch, the data is passed over from the original cubes into a new array here:
The incoming data is I've created a new PR #941 to correct this change and to update the reference data. That way we can confirm that the current PR doesn't break the preprocessor. |
Okay, so I think I've found all the quirks that surfaced after the introduction of the new regression tests. In summary:
To do:
@stefsmeets can you check my diagnosis and approve the proposed solution(s)? |
Okay, so now you can see that the only remaining failures are the daily sample data tests: FAILED tests/sample_data/multimodel_statistics/test_multimodel.py::test_multimodel_regression_day[full-365_day] This is because of #937. If you look at the test output, you can see that the shapes are inconsistent (2 (old) vs 62 (new) days in 2 months). I will now revert the coordinate check. This will introduce more failures due to different time attributes etc. as discussed above. |
This reverts commit 2bd2d62.
Indeed, now there are 8 failures. E.g. Next I've update the reference data so that:
|
All tests pass now. @stefsmeets and/or @nielsdrost can you do a final check? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, nice work!
Good work @Peter9192 and @stefsmeets and @bouweandela and @valeriupredoi ! |
This PR simplifies the way time is handled in the multimodel statistics preprocessor. This should make it easier to understand what the code does, and hence to debug it.
These changes are inspired by some recent unexpected behavior when cubes with different calendars were passed to the function (#665). The first important change, therefore, is to fix the calendar for the output cubes to a standard format. Simultaneously, I added a function that makes all input cubes also use this same standard calendar.
Having consistent time arrays make time indexing much easier. Previously, complex mappings were needed to align the indexes representing the same time points in different cubes. Now, we can just get any time from any cube, without worrying too much about the index. Furthermore, I aligned the code between 'assemble overlap data' and 'assemble full data' to the extent that they could be merged into a single function.
Tasks
If writing a new/modified preprocessor function, please update the documentationNot necessary