Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add temporal utilities and averaging functionalities #107

Merged
merged 6 commits into from
Mar 23, 2022

Conversation

tomvothecoder
Copy link
Collaborator

@tomvothecoder tomvothecoder commented Sep 21, 2021

Description

This PR adds temporal averaging functionality, including calculating time series averaging, climatology, and departures.

Summary of Changes

  • Operates on a data variable within a dataset, and returns a dataset
  • Stores metadata regarding temporal averaging in the data variable's attributes
  • Supported grouping frequencies:
    • Time series average:
      • "year", "season", "month", "day", "hour"
        • "year" groups by year
        • "season" groups by (year, season)
        • "month" groups by (year, month)
        • "day" groups by (year, month, day)
        • "hour" groups by (year, month, day, hour)
    • Climatology and departures:
      • "season", "month", "day"
        • "season" groups by season
        • "month" groups by month
        • "day" groups by (month, day)
  • Support for weighted and unweighted averaging (defaults to True)
  • Support for centering time coordinates using time bounds (defaults to true)
  • Configuration specific to "season" frequency
    • Predefined seasons:
      • "dec_mode" - mode for the seasons that includes December ("DJF" or "JFD") (defaults to "DJF")
      • "drop_incomplete_djf" - if "dec_mode" is "DJF", drop incomplete "DJF" seasons (defaults to False)
    • Custom seasons:
      • "custom_seasons" - list of lists containing the 3-letter string representation of months (e.g., "Jan")

Validation

If you are interested in the validation of this feature against CDAT, please checkout the notebooks found here: xCDAT/xcdat-validation#28

Future Enhancements

Feature Specification Documentation: https://docs.google.com/document/d/1klHh5LLYcmNSopvptSVYmcgtUVLFnLMueRhfCF6tbaU/edit#

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder changed the title Feature/47 climatology Feature/47 climatology (WIP) Sep 21, 2021
@tomvothecoder tomvothecoder changed the title Feature/47 climatology (WIP) Add feature for time series averaging and calculating climatology and departure. Sep 21, 2021
@tomvothecoder tomvothecoder marked this pull request as draft September 21, 2021 22:45
@codecov-commenter
Copy link

codecov-commenter commented Sep 30, 2021

Codecov Report

Merging #107 (ff1341f) into main (201a575) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              main      #107    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            8         9     +1     
  Lines          381       741   +360     
==========================================
+ Hits           381       741   +360     
Impacted Files Coverage Δ
xcdat/__init__.py 100.00% <100.00%> (ø)
xcdat/axis.py 100.00% <100.00%> (ø)
xcdat/bounds.py 100.00% <100.00%> (ø)
xcdat/dataset.py 100.00% <100.00%> (ø)
xcdat/spatial_avg.py 100.00% <100.00%> (ø)
xcdat/temporal.py 100.00% <100.00%> (ø)
xcdat/utils.py 100.00% <100.00%> (ø)
xcdat/xcdat.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93a923e...ff1341f. Read the comment docs.

@tomvothecoder tomvothecoder self-assigned this Sep 30, 2021
@tomvothecoder tomvothecoder added this to the v1.0.0 milestone Sep 30, 2021
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch 2 times, most recently from 4d26c8b to b3f28c1 Compare October 4, 2021 21:39
@tomvothecoder tomvothecoder changed the title Add feature for time series averaging and calculating climatology and departure. Add support for calculating climatology and departure and time series averaging Oct 4, 2021
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from 83e1bd6 to 09ca9e5 Compare October 11, 2021 16:30
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from 5f1e0c7 to daaeb7a Compare October 29, 2021 21:42
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from b13fadc to 0a2699b Compare November 9, 2021 16:37
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch 3 times, most recently from 50ba587 to c72ead8 Compare November 11, 2021 19:12
@tomvothecoder tomvothecoder changed the title Add support for calculating climatology and departure and time series averaging Add temporal averaging functionality Nov 12, 2021
@tomvothecoder tomvothecoder changed the title Add temporal averaging functionality Add temporal averaging functionality (timeseries, climatology and departure) Nov 12, 2021
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch 4 times, most recently from 47360f5 to 2f1712d Compare November 17, 2021 23:47
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from 2f1712d to 669f6fa Compare December 2, 2021 23:15
@chengzhuzhang
Copy link
Collaborator

Hey @chengzhuzhang, ds.xcdat.center_time() returns a dataset, so you need to update the variable ds.

Thank you, Tom. center_time is very useful. I'm wondering is it straightforward to make it as an argument in open_dataset().

@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from ef28763 to 3e464af Compare January 27, 2022 00:10
@tomvothecoder
Copy link
Collaborator Author

Hey @chengzhuzhang, ds.xcdat.center_time() returns a dataset, so you need to update the variable ds.

Thank you, Tom. center_time is very useful. I'm wondering is it straightforward to make it as an argument in open_dataset().

No problem, glad to hear! Yes, adding center_time as a kwarg to open_dataset() is straightforward so I'll do that.

@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from 5af1cef to eb18948 Compare February 2, 2022 22:14
Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor review updates

@durack1
Copy link
Collaborator

durack1 commented Mar 1, 2022

@tomvothecoder this looks great, I wonder are all the various flavours of calendar managed by XCDAT?

@tomvothecoder
Copy link
Collaborator Author

Thanks @durack1! Xarray uses cftime under the hood, so XCDAT should be capable of handling the various calendar types if we implement features using primarily the xarray APIs.

https://xarray.pydata.org/en/stable/user-guide/weather-climate.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range

https://xarray.pydata.org/en/stable/user-guide/time-series.html#creating-datetime64-data

When decoding/encoding datetimes for non-standard calendars or for dates before year 1678 or after year 2262, xarray uses the cftime library. It was previously packaged with the netcdf4-python package under the name netcdftime but is now distributed separately. cftime is an optional dependency of xarray.

@durack1
Copy link
Collaborator

durack1 commented Mar 2, 2022

Ok this is great, it seems all the cdtime calendars are included in cftime, so we're good, great!

So the cftime library includes (https://unidata.github.io/cftime/api.html#cftime.num2date)

calendar: describes the calendar used in the time calculations. All the values currently
defined in the [CF metadata convention](http://cfconventions.org/cf-conventions/cf-conventions#calendar)
are supported. Valid calendarsstandard’, ‘gregorian’, ‘proleptic_gregorian’ ‘noleap’,
‘365_day’, ‘360_day’, ‘julian’, ‘all_leap’, ‘366_day’. Default isstandard’, which is a
mixed Julian/Gregorian calendar.

Just as an FYI, dropping some cdtime links here for documentation:
https://github.com/CDAT/cdtime/blob/master/Src/cdtimemodule.c#L1422-L1430 <- cdtime calendar types,

    PyDict_SetItemString(d, "error", PyCdtime_ErrorObject); /* export exception */
    DefineLongConstant(d, "StandardCalendar", (long)cdStandard);
    DefineLongConstant(d, "GregorianCalendar", (long)cdStandard);
    DefineLongConstant(d, "JulianCalendar", (long)cdJulian);
    DefineLongConstant(d, "MixedCalendar", (long)cdMixed);
    DefineLongConstant(d, "NoLeapCalendar", (long)cdNoLeap);
    DefineLongConstant(d, "Calendar360", (long)cd360);
    DefineLongConstant(d, "ClimCalendar", (long)cdClim);
    DefineLongConstant(d, "ClimLeapCalendar", (long)cdClimLeap);
    DefineLongConstant(d, "DefaultCalendar", (long)cdMixed); /* initialize the default calendar */

Which calls
https://github.com/CDAT/libcdms/blob/master/include/cdms.h#L149-L167

#define cdStandardCal   0x11
#define cdClimCal        0x0
#define cdHasLeap      0x100
#define cdHasNoLeap    0x000
#define cd365Days     0x1000
#define cd360Days     0x0000
#define cdJulianCal  0x10000
#define cdMixedCal   0x20000

typedef enum cdCalenType {
	cdStandard    = ( cdStandardCal | cdHasLeap   | cd365Days),
	cdJulian      = ( cdStandardCal | cdHasLeap   | cd365Days | cdJulianCal),
	cdNoLeap      = ( cdStandardCal | cdHasNoLeap | cd365Days),
	cd360         = ( cdStandardCal | cdHasNoLeap | cd360Days),
	cdClim        = ( cdClimCal     | cdHasNoLeap | cd365Days),
	cdClimLeap    = ( cdClimCal     | cdHasLeap   | cd365Days),
	cdClim360     = ( cdClimCal     | cdHasNoLeap | cd360Days),
	cdMixed       = ( cdStandardCal | cdHasLeap   | cd365Days | cdMixedCal)
}  cdCalenType;

@tomvothecoder
Copy link
Collaborator Author

Hey everyone, this PR is ready for a final round of review. Here are the associated validation notebooks that I mentioned on our email chain: xCDAT/xcdat-validation#28

@lee1043, @chengzhuzhang, and @pochedls -- feel free to try out the API and look through the notebooks
@jasonb5 -- it will be nice if you can review the code for any optimizations, improvements, and fixes

I'm hoping to have this PR ready to be merged by our meeting next Wednesday. Thanks!

@chengzhuzhang
Copy link
Collaborator

chengzhuzhang commented Mar 16, 2022

Hi @tomvothecoder ,

My testing goes smoothly so far. Only have two high level comments:

  • It would be really handy, if this feature can include the capability to calculate weighted temporal average. Right now in xarray, one still has to generates the weights and then do something like (d * weights).groupby("time.month").sum(dim="time") I think temporal_average can cover this case straightforwardly? perhaps we can add a mean mode in addition or enable this if neither "climatology" nor "timeseries" is supplied?
  • And have a time coordinate, as well as a multi-index coordinate is a bit confusing. The additional step to convert the multi-index coordinates back to a new time coordinate seems a good way to clear this confusing. But it will require some additional work...

@durack1
Copy link
Collaborator

durack1 commented Mar 18, 2022

Folks I just came across this, might be worth a quick look to see alternatives https://ncar.github.io/esds/posts/2021/yearly-averages-xarray/

@chengzhuzhang @tomvothecoder @pochedls

Copy link
Collaborator

@jasonb5 jasonb5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest of the code LGTM.

- Includes calculating time series averaging, climatology, and departures
- Add `bottleneck` to `setup.py` and `conda-env/dev.yml`
- Update fixtures for robust testing of edge cases
@tomvothecoder tomvothecoder force-pushed the feature/47-climatology branch from ac6ebab to d9492fa Compare March 22, 2022 22:59
- Rename class `TemporalAverageAccessor` to `TemporalAccessor`
- Fix tests for centering time with `open_dataset()` and `open_mfdataset()`
@tomvothecoder tomvothecoder dismissed jasonb5’s stale review March 23, 2022 21:45

The requested changes have been made and doesn't require a re-review.

@tomvothecoder tomvothecoder merged commit d75704e into main Mar 23, 2022
@tomvothecoder tomvothecoder deleted the feature/47-climatology branch March 23, 2022 21:50
@tomvothecoder tomvothecoder changed the title Add temporal averaging functionality (time series avg, climatology and departure) Add temporal utilities and averaging functionalities Mar 23, 2022
@tomvothecoder
Copy link
Collaborator Author

Hi @tomvothecoder ,

My testing goes smoothly so far. Only have two high level comments:

* It would be really handy, if this feature can include the capability to calculate weighted temporal average. Right now in `xarray`, one still has to generates the `weights` and then do something like `(d * weights).groupby("time.month").sum(dim="time")` I think `temporal_average` can cover this case straightforwardly? perhaps we can add a `mean` mode in addition or enable this if neither "climatology" nor "timeseries" is supplied?

* And have a time coordinate, as well as a multi-index coordinate is a bit confusing. The additional step to convert the multi-index coordinates back to a new time coordinate seems a good way to clear this confusing. But it will require some additional work...

I decided to carry these issues over to #201 and #202.

@chengzhuzhang
Copy link
Collaborator

Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New enhancement request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add temporal averaging functionality (time series avg, climatology and departure)
7 participants