Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some CF support for lon and lat #73

Closed
wants to merge 3 commits into from
Closed

Add some CF support for lon and lat #73

wants to merge 3 commits into from

Conversation

stefraynaud
Copy link

@stefraynaud stefraynaud commented Oct 28, 2019

This PR adds some support for CF conventions to search for longitude and latitude in Datasets, and rename them to 'lon' and 'lat'. Bounds data arrays are also renamed if found.

A module named 'cf' was created for that and unitests were added.

@codecov-io
Copy link

codecov-io commented Oct 28, 2019

Codecov Report

Merging #73 into master will increase coverage by 0.44%.
The diff coverage is 97.61%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #73      +/-   ##
=========================================
+ Coverage   95.76%   96.2%   +0.44%     
=========================================
  Files           6       7       +1     
  Lines         260     343      +83     
=========================================
+ Hits          249     330      +81     
- Misses         11      13       +2
Impacted Files Coverage Δ
xesmf/frontend.py 93.75% <91.66%> (-0.24%) ⬇️
xesmf/cf.py 98.61% <98.61%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7479110...9094213. Read the comment docs.

@stefraynaud
Copy link
Author

With the first commit, coordinates and bounds are automatically identified and renamed, if they respect CF conventions which are based on name, units and standard_name.

With the second commit, output coordinate names are automatically restored.

@JiaweiZhuang
Copy link
Owner

@stefraynaud Thanks for the PR! I had a long backlog and didn't get time to look at this -- sorry for the delay.

I agree that some degree of decoding feature is useful. On the other hand I am also worried about putting complex logic into xesmf, especially tricky edges cases like: which variable actually got used if the dataset contains all three variables 'lat_b', 'lat_bnds' 'latitude_bnds'.

To that end I summarized a meta-issue at #74. Would you be able to build the CF-convention support on top of the proposed xesmf.config.set(grid_name_dict=...)? That would be much easier to maintain and extend.

Also, will xarray.decode_cf be useful here?

CF_SPECS = {
'lon': {
'name': ['lon', 'longitude'],
'units': ['degrees_east', 'degree_east', 'degree_E', 'degrees_E',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the units and other attributes seems a bit too heavy for xesmf? A "CF-checker" is better for a standalone package or pushed to upstream like xarray.decode_cf

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that a CF checker for xarray is globally missing.
However, xarray is intended to remain of general purpose, and not to introduce notions of geographic and physical spatialisation.

As for the fact that these attributes are checked is the strict application of the CF conventions for longitude and latitude.
This prevent focusing on names.

xesmf/cf.py Outdated
ds: xarray.DataArray or xarray.Dataset
'''
# Suffixes for bounds
bsuffixes = ('_b', '_bounds', '_bnds')
Copy link
Owner

@JiaweiZhuang JiaweiZhuang Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here assumes a priority of '_b' > '_bounds' > '_bnds'. If more than one of them exist in the dataset, could it lead to a name conflict error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main caveat of this PR because it rely on names and not on attributes.

Don't blame me : a solution is to push forward the application of CF conventions and to use the bounds attribute as explained here. This makes unique link between coordinates and bounds.

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Nov 5, 2019

Also note that renaming the coordinate name alone does not solve CF-convention -- CF uses a boundary shape of (n_lat, n_lon, 4) while ESMPy expects (n_lat+1, n_lon+1) #14 (comment).

@stefraynaud
Copy link
Author

Also note that renaming the coordinate name alone does not solve CF-convention -- CF uses a boundary shape of (n_lat, n_lon, 4) while ESMPy expects (n_lat+1, n_lon+1) #14 (comment).

You are right, but this is beyond the scope of this PR which is focused on finding variables in datasets.

By the way, a conversion from the (n_lat, n_lon, 4) form to the (n_lat+1, n_lon+1) it straigthforward, but must raise a warning (my two cents).

It first look ate the "bounds" attribute, then search
for names using usual prefixes.
Copy link

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to avoid renaming coordinates.
I believe the search for coordinates might be too complex to be clearly communicated to users. I can imagine questions such as "why did it work on this file and not this one ?"
We should strive to reduce the potential confusion between CF bounds and grid corners.

}


def get_coord_name_from_specs(ds, specs):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly a more extensive search than what #74 provides. The question is do we want this ? Also, from a user perspective, all the different ways coordinates can be identified might create some confusion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it is the strict application of the CF conventions, which must be considered to be the reference for formatting dataset, and which are increasingly being followed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Checking with units first since they are mandatory. And then standard_name, just in case.



def get_bounds_name_from_coord(ds, coord_name,
suffixes=['_b', '_bnds', '_bounds']):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could get really confusing, because this function could either return netCDF bounds or the grid corners (_b). I suggest the code clearly distinguishes bounds from corners.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep probably. I made no difference in this PR regarding the bounds/corners problem.

return bounds_name
warnings.warn('invalid bounds name: ' + bounds_name)

# From suffixed names
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the identification by suffix.

lon_name = get_lon_name(ds)
if lon_name is not None:

# Search for bounds and rename
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have to rename coordinates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, renaming is for internal use only, just to make the code clearer. It is possible to not rename :)

aulemahal pushed a commit to Ouranosinc/xESMF that referenced this pull request Feb 18, 2021
aulemahal added a commit to Ouranosinc/xESMF that referenced this pull request Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants