-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API extending xgcm vertical regridding #388
Conversation
Tasks left:
The current plan for handling
If auto derive is selected the vertical coordinate attributes will be checked for |
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #388 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 15 +1
Lines 1428 1517 +89
=========================================
+ Hits 1428 1517 +89
☔ View full report in Codecov by Sentry. |
Hey @jasonb5, I threw the notes from the past few meetings here since they are action items. Feel free to check-box or update as needed. Action Items from Meetings (11/2022 - 1/2023)
|
def create_grid(**kwargs: CoordOptionalBnds) -> xr.Dataset: | ||
"""Creates a grid from coordinate mapping. | ||
|
||
Parameters | ||
---------- | ||
lat : Union[np.ndarray, xr.DataArray] | ||
Array of latitude values. | ||
lon : Union[np.ndarray, xr.DataArray] | ||
Array of longitude values. | ||
lat_bnds : Optional[Union[np.ndarray, xr.DataArray]] | ||
Array of bounds for latitude values. | ||
lon_bnds : Optional[Union[np.ndarray, xr.DataArray]] | ||
Array of bounds for longitude values. | ||
**kwargs : CoordOptionalBnds | ||
Mapping of coordinate name and data with optional bounds. See | ||
:py:data:`xcdat.axis.VAR_NAME_MAP` for valid coordinate names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Jason, my suggestion here was to make the parameters for create_grid
more explicit.
In my opinion, using **kwargs
over an explicit args/kwargs can create confusion and reduce readability since there isn't set number of clearly defined parameters for the function. Here is a post a read about **kwargs
: https://softwareengineering.stackexchange.com/a/294486
I would suggest keeping the args and kwargs, then extending as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pochedls @chengzhuzhang @lee1043 @tomvothecoder
How do we feel about the following function to replace the current create_grid
.
def create_grid(x: GridCoord, y: GridCoord, z: GridCoord) -> xr.Dataset:
Where GridCoord
could be any of the following forms.
(name, data)
(name, data, attrs)
(name, data, bnds)
(name, data, bnds, attrs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, this is a tuple where:
- name = string
- data = array of floats
- attrs = dictionary mapping str to str
- bnds = 2d array of floats
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder Yes that's correct. The data
and bnds
values can be a list, ndarray or DataArray (1d/2d respectively).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for confirming. I am curious about the benefits you see in using a tuple of sub-arguments over an xr.DataArray
? I'm pretty sure this was discussed with the team during our meetings, but I didn't really weigh in because there's a lot to track!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah – I have instances where I have a numpy array and want to create a DataArray (with correct axis information). I think this is kind of common. @mzelinka – were you saying that you have cases where it would be helpful to generate a new axis (with correct metadata/attributes)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we include a create_axis() function (similar to the one at bottom)?
I can see a function like create_axis
making the process of creating xr.DataArray axes faster and less prone to error. Maybe this can be scoped out in another GitHub issue since I don't think this PR is dependent on this possible feature?
I had a few questions (for myself too):
- When and how often might users want/need to create axes instead of using the ones in the file?
- How do we decide what is considered the most sensible default CF attributes for each axis? Xarray sets some like
calendar="standard"
for the time axis, and we set a few others like units for latitude/longitude. - If we decide this makes sense to implement, we should consider calling it
create_dim
since we haveget_dim_keys()
andget_dim_coords()
.
I also like the idea of having a static dictionary to store default CF attributes. I think cf-xarray and/or xarray might already have this implemented. We have default attrs set in various parts of the code-base (e.g., calendar=standard
for the time axis). It'll be cleaner to have a single reference point for default attrs.
# axis.py
CF_DEFAULT_ATTRS = {
"lat": '{'units': 'degrees_north',
'axis': 'Y',
'long_name': 'latitude',
'standard_name': 'latitude'
},
....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pochedls @tomvothecoder Lets move this discussion to the new issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update (#496) will go in a new PR after this PR is merged right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, this will be resolved in a separate PR.
I see what's going on here, when horizontal regridding was first implemented the method would call Should we even be adding bounds outside of what was generated during the regridding process? If we do want the methods to fill out the bounds should it be for all dimensions or only the dimensions involved in the process e.g. horizontal generates only "X", "Y"?
I haven't been able to reproduce this yet, I'm not sure what is using the yaksa package. |
The original dataset includes So – we shouldn't generate the bounds, but I do think we should carry over the original bounds (that are unaffected by horizontal regridding). I'm not sure how much we need to worry about yaksa, I get the warning on main branch, too (even after re-creating the dev environment). |
@pochedls This logic sounds good, I'll update accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jason, here is my final review. The suggestions are mostly renaming things.
mock_regridder = mock.MagicMock() | ||
mock_regridder.return_value.vertical.return_value = "output data" | ||
|
||
mock_data = mock.MagicMock() | ||
|
||
with mock.patch.dict(accessor.VERTICAL_REGRID_TOOLS, {"xgcm": mock_regridder}): | ||
output = self.ac.vertical("ts", mock_data, tool="xgcm", target_data=None) | ||
|
||
assert output == "output data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of MagicMock is neat here. For those interested (probably just me), the xgcm vertical regridder object is being mocked (mock_regridder
) and set as the xgcm
dictionary value in accessor.VERTICAL_REGRID_TOOLS
. The return value of mock_regridder
is set to "output_data"
, which is being tested in the assertion on line 1088. This test is to check if the correct vertical regridding tool is being referenced.
The actual result of the regridder object is not being tested (hence "output_data"
) because it is already tested through class TestXGCMRegridder
.
) | ||
|
||
regridder = regrid_tool(self._ds, output_grid, **options) | ||
output_ds = regridder.horizontal(data_var, self._ds) | ||
|
||
return output_ds | ||
|
||
def vertical( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on #501, are we also considering calling this vertical_xgcm
and not having a general wrapper with tool="xgcm"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, I'll address it in the next PR.
def create_grid(**kwargs: CoordOptionalBnds) -> xr.Dataset: | ||
"""Creates a grid from coordinate mapping. | ||
|
||
Parameters | ||
---------- | ||
lat : Union[np.ndarray, xr.DataArray] | ||
Array of latitude values. | ||
lon : Union[np.ndarray, xr.DataArray] | ||
Array of longitude values. | ||
lat_bnds : Optional[Union[np.ndarray, xr.DataArray]] | ||
Array of bounds for latitude values. | ||
lon_bnds : Optional[Union[np.ndarray, xr.DataArray]] | ||
Array of bounds for longitude values. | ||
**kwargs : CoordOptionalBnds | ||
Mapping of coordinate name and data with optional bounds. See | ||
:py:data:`xcdat.axis.VAR_NAME_MAP` for valid coordinate names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update (#496) will go in a new PR after this PR is merged right?
xcdat/regridder/base.py
Outdated
Coord = Union[np.ndarray, xr.DataArray] | ||
|
||
CoordOptionalBnds = Union[Coord, Tuple[Coord, Coord]] | ||
|
||
|
||
def preserve_bounds( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this function will be used publicly, or should it be private (_preserve_bounds()
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing it won't be used publicly, I'll rename it .
xcdat/regridder/base.py
Outdated
output_ds: xr.Dataset, | ||
output_grid: xr.Dataset, | ||
input_ds: xr.Dataset, | ||
ignore_dims: List[CFAxisKey], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_ds: xr.Dataset, | |
output_grid: xr.Dataset, | |
input_ds: xr.Dataset, | |
ignore_dims: List[CFAxisKey], | |
input_ds: xr.Dataset, | |
output_grid: xr.Dataset, | |
output_ds: xr.Dataset, | |
ignore_dims: List[CFAxisKey], |
It's more intuitive to me to have the input params input_ds
and output_grid
first since they contain the bounds info, which are copied/preserved to the output_ds
.
I'm thinking input -> output basically. Not a big deal though, especially if this should be a private function (my previous comment).
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
🎉🎊 |
Description
Adds vertical regridding via xgcm.
Checklist
If applicable: