
Compliance checker is not handling CF 1.9 files correctly, and is complaining about CF 1.8 issues that are not correct #1093

Closed
mjbrodzik opened this issue Jun 18, 2024 · 21 comments
@mjbrodzik

Version of compliance checker running:
5.0.0

Describe the checker this affects:
CF

Attach a minimal CDL or NetCDF file which is able to reproduce the issue

To Reproduce:
Uploaded the attached file and ran the checker on
NSIDC0630_GRD_EASE2_N25km_GCOMW1_AMSR2_E_10.7H_20230627_v2.0.nc.gz
using CF-1.8, since it is the most recent version available to select.

Describe the issue below:
A number of the reported issues flag rules that have since been updated in CF 1.9, specifically:

  1. "§2.2 Data Types
  • The variable TB failed because the datatype is uint16
  • The variable TB_num_samples failed because the datatype is uint8
  • The variable TB_std_dev failed because the datatype is uint16"

CF 1.9 now allows unsigned types.
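A minimal sketch of how a §2.2 data-type check might differ between the two CF versions. The function name, type-code tables, and version handling here are illustrative assumptions, not the checker's actual code:

```python
import numpy as np

# CF 1.8 permits only the classic NetCDF data types; CF 1.9 adds the
# unsigned and 64-bit integer types from the enhanced data model.
# These type-code sets are illustrative, not the checker's tables.
CF18_TYPES = {"S1", "i1", "i2", "i4", "f4", "f8"}
CF19_EXTRA = {"u1", "u2", "u4", "u8", "i8"}

def valid_cf_dtype(dtype, cf_version=(1, 9)):
    """Return True if dtype is permitted for the given CF version."""
    code = np.dtype(dtype).str.lstrip("<>=|")  # e.g. "u2" for uint16
    allowed = CF18_TYPES | (CF19_EXTRA if cf_version >= (1, 9) else set())
    return code in allowed

print(valid_cf_dtype("uint16", (1, 8)))  # False: flagged under CF 1.8
print(valid_cf_dtype("uint16", (1, 9)))  # True: allowed under CF 1.9
```

Versions are compared as tuples rather than strings so that 1.10 correctly sorts after 1.9.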

  2. "§3.5 Flags
  • TB_num_samples's flag_values must be an array of values not <class 'numpy.uint8'>"

We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value. (This doesn't appear to be a difference between CF 1.8 and CF 1.9)

  3. "§5.6 Horizontal Coordinate Reference Systems, Grid Mappings, Projections
  • Incidence_angle's coordinate variable "y" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of Incidence_angle's dimensions
  • Incidence_angle's coordinate variable "x" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of Incidence_angle's dimensions
  • TB's coordinate variable "y" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of TB's dimensions
  • TB's coordinate variable "x" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of TB's dimensions
  • TB_std_dev's coordinate variable "y" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of TB_std_dev's dimensions
  • TB_std_dev's coordinate variable "x" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of TB_std_dev's dimensions"

We note that CF 1.8 and CF 1.9 both say: "To geo-reference data horizontally with respect to
the Earth, a grid mapping variable may be provided by the data variable, using
the grid_mapping attribute. If the coordinate variables for a horizontal grid
are not longitude and latitude, then a grid_mapping variable provides the
information required to derive longitude and latitude values for each grid
location. If no grid mapping variable is referenced by a data variable, then
longitude and latitude coordinate values shall be supplied in addition to the
required coordinates. For example, the Cartesian coordinates of a map projection
may be supplied as coordinate variables and, in addition, two-dimensional
latitude and longitude variables may be supplied via the coordinates attribute
on a data variable."

Since our data are projected, we interpreted this to mean that we could include x
and y as the coordinate variables (in projected meters) and that, together with the
"grid_mapping" attribute on each variable, this removed the requirement to
also include 2D latitude/longitude coordinate variables. We think the checker is
not correctly interpreting the convention in this case.
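Our reading of the quoted convention text can be paraphrased as a small decision rule. The sketch below is that reading only, with hypothetical attribute-dict inputs, not the checker's logic:

```python
def requires_explicit_latlon(var_attrs, coord_units):
    """Our reading of the quoted CF text: supplied latitude/longitude
    coordinates are mandatory only when the horizontal coordinates are
    not true lat/lon AND no grid_mapping variable is referenced."""
    latlon_units = {"degrees_north", "degrees_east"}
    coords_are_latlon = all(u in latlon_units for u in coord_units)
    return not coords_are_latlon and "grid_mapping" not in var_attrs

# Projected x/y in meters, grid_mapping present: lat/lon are optional.
print(requires_explicit_latlon({"grid_mapping": "crs"}, ["m", "m"]))  # False
# Projected x/y with no grid_mapping: lat/lon must be supplied.
print(requires_explicit_latlon({}, ["m", "m"]))  # True
```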

  4. "§2.6 Attributes
  • §2.6.2 references global attribute should be a non-empty string"

Our references global attribute contains a non-empty string, so we think this
might be a bug in the checker.

  5. "§4.1 Latitude Coordinate
  • latitude variable 'y' should define valid units for latitude
    §4.2 Longitude Coordinate
  • longitude variable 'x' should define valid units for longitude"

We think this stems from the same misreading by the checker of the convention's rules on horizontal
geo-referencing. Since the data are projected, the y (x) variable units have nothing to do with latitude (longitude) units.

  6. "§8.1 Packed Data
  • Variable TB and add_offset must be equivalent data types or TB must be of type byte, short, or int and add_offset must be float or double
  • Variable TB_std_dev and add_offset must be equivalent data types or TB_std_dev must be of type byte, short, or int and add_offset must be float or double
  • Variable Incidence_angle and add_offset must be equivalent data types or Incidence_angle must be of type byte, short, or int and add_offset must be float or double
  • Variable TB_time and add_offset must be equivalent data types or TB_time must be of type byte, short, or int and add_offset must be float or double
  • Variable TB and scale_factor must be equivalent data types or TB must be of type byte, short, or int and scale_factor must be float or double
  • Variable TB_std_dev and scale_factor must be equivalent data types or TB_std_dev must be of type byte, short, or int and scale_factor must be float or double
  • Variable Incidence_angle and scale_factor must be equivalent data types or Incidence_angle must be of type byte, short, or int and scale_factor must be float or double
  • Variable TB_time and scale_factor must be equivalent data types or TB_time
    must be of type byte, short, or int and scale_factor must be float or double "

We think this is a problem with the checker expecting not to find unsigned
types, which is legitimate for CF 1.8. See above: CF 1.9 allows unsigned types.
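A sketch of the §8.1 packing-type rule under the CF 1.9 reading that unsigned integers are acceptable packed types. The function name and version handling are our assumptions, not the checker's implementation:

```python
import numpy as np

def packed_types_ok(var_dtype, attr_dtype, cf_version=(1, 9)):
    """CF §8.1: the packed variable and scale_factor/add_offset must
    share a type, or the variable must be an integer type and the
    attribute float or double. Under a CF 1.9 reading, 'integer'
    also includes the unsigned kinds."""
    v, a = np.dtype(var_dtype), np.dtype(attr_dtype)
    if v == a:
        return True
    int_kinds = ("i", "u") if cf_version >= (1, 9) else ("i",)
    return v.kind in int_kinds and a in (np.dtype("f4"), np.dtype("f8"))

print(packed_types_ok("uint16", "float64", (1, 8)))  # False: CF 1.8 flags it
print(packed_types_ok("uint16", "float64", (1, 9)))  # True under CF 1.9
```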

@hot007

hot007 commented Jun 20, 2024

Related to this, I found #972, which indicated that support for CF 1.9 and 1.10 was on the horizon back in March 2023, but the current checker version (v5.1.1) still stops at CF 1.8. Can any of the devs give us a hint when support for the newer CF versions might be available, please?
We have some files being written now to the CF 1.10 and even 1.11 standard, and we can't check them.
Thanks!

@benjwadams benjwadams self-assigned this Jul 9, 2024
@benjwadams
Contributor

benjwadams commented Jul 9, 2024

> Related to this, I found #972, which indicated that support for CF 1.9 and 1.10 was on the horizon back in March 2023, but the current checker version (v5.1.1) still stops at CF 1.8. Can any of the devs give us a hint when support for the newer CF versions might be available, please?
> We have some files being written now to the CF 1.10 and even 1.11 standard, and we can't check them.
> Thanks!

Per recent interest in continuing development, and a realignment of project funds, the answer is: soon. CF 1.9 and 1.10 don't add a tremendous number of features compared with some earlier releases. If you are interested in assisting the process, please create a separate issue with a file and detail what you'd expect to have checked in 1.9. Certain sections, such as https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling,
currently have a dearth of existing files and examples we can check against, complicating interpretation of that section.

@benjwadams
Contributor

Regarding OP's post here, this mostly looks to be a data type issue which should be fairly straightforward to fix.

@benjwadams
Contributor

We do actually check for unsigned int support here:

https://github.com/ioos/compliance-checker/blob/develop/compliance_checker/tests/test_cf.py#L3141-L3148

I'll dig to figure out what's going on.

@mwengren
Member

> > Related to this, I found #972, which indicated that support for CF 1.9 and 1.10 was on the horizon back in March 2023, but the current checker version (v5.1.1) still stops at CF 1.8. Can any of the devs give us a hint when support for the newer CF versions might be available, please?
> > We have some files being written now to the CF 1.10 and even 1.11 standard, and we can't check them.
> > Thanks!
>
> Per recent interest in continuing development, and a realignment of project funds, the answer is: soon. CF 1.9 and 1.10 don't add a tremendous number of features compared with some earlier releases. If you are interested in assisting the process, please create a separate issue with a file and detail what you'd expect to have checked in 1.9. Certain sections, such as https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling, currently have a dearth of existing files and examples we can check against, complicating interpretation of that section.

@benjwadams in #972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.

We should make sure those projects are in alignment, and if we don't already have CF 1.9 supported in the latest CC release, issue a new release in the near future that does support it, even if the compression section isn't implemented (assuming most everything else in 1.9 already is).

@ocefpaf
Member

ocefpaf commented Jul 11, 2024

> @benjwadams in #972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.

Micah, this page is built using the README and that will be updated when #1098 is merged.

@benjwadams
Contributor

Compliance checker web is also not linked to the latest release of Compliance Checker via GitHub Actions, so it needs to be built against Compliance Checker manually.

@benjwadams
Contributor

> §2.6.2 references global attribute should be a non-empty string

Hmm, you have an array of strings:

>>> ds.references
['Long, D. G. and M. J. Brodzik. 2016. Optimum Image Formation for Spaceborne Microwave Radiometer Products. IEEE Trans. Geosci. Remote Sensing, 54(5):2763-2779, doi:10.1109/TGRS.2015.2505677.\n', 'Algorithm Theoretical Basis Document:  https://nsidc.org/sites/nsidc.org/files/technical-references/MEaSUREs_CETB_ATBD.pdf\n']

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#description-of-file-contents mentions:

> The attribute values are all character strings.

I'll have to clarify this with the CF group. We don't usually get descriptive values in global attributes with two or more array elements.

@mwengren
Member

> > @benjwadams in #972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.
>
> Micah, this page is built using the README and that will be updated when #1098 is merged.

Thanks @ocefpaf. In my reply above, I had been looking at the Compliance Checker Web website (https://compliance.ioos.us), but as you point out our README needs an update as well. Looks good!

@ocefpaf
Member

ocefpaf commented Jul 12, 2024

> Thanks @ocefpaf. In my reply above, I had been looking at the Compliance Checker Web website (https://compliance.ioos.us), but as you point out our README needs an update as well. Looks good!

The web checker probably needs to be updated and redeployed before a new release is issued.

Back when we still used setup.py, we did not have an entry point for CF 1.9. See https://github.com/ioos/compliance-checker/pull/1024/files#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7L58

Since then we moved to pyproject.toml, and #1097 added the CF 1.9 entry point. That is why we need a new release first.

@benjwadams is that something you can do?

@benjwadams
Contributor

Yes, I will tag a new release.

@mjbrodzik
Author

> §2.6.2 references global attribute should be a non-empty string
>
> Hmm, you have an array of strings:
>
> >>> ds.references
> ['Long, D. G. and M. J. Brodzik. 2016. Optimum Image Formation for Spaceborne Microwave Radiometer Products. IEEE Trans. Geosci. Remote Sensing, 54(5):2763-2779, doi:10.1109/TGRS.2015.2505677.\n', 'Algorithm Theoretical Basis Document:  https://nsidc.org/sites/nsidc.org/files/technical-references/MEaSUREs_CETB_ATBD.pdf\n']
>
> https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#description-of-file-contents mentions:
>
> The attribute values are all character strings.
>
> I'll have to clarify this with the CF group. We don't usually get descriptive values in global attributes with two or more array elements.

Hi @benjwadams, it's great to see action on these issues, thank you very much. Have you received any clarification on this question? I've been using this approach (making the attribute a list of strings when there are multiple references to include) for many years across several ongoing data sets, and I'd like to know whether I need to change it. Just last week, a reviewer of a new data set asked me whether it is a legitimate approach, so I'd be interested in the answer. I may only be able to correct new data sets going forward, but I can also notify colleagues at the NSIDC DAAC to stay alert for this potential problem.

@benjwadams
Contributor

> We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value.

This has come up a few times, and in fact I thought it had been addressed for flag_values. NetCDF seems to represent underlying attributes as arrays, but netCDF4-Python will store a single-valued array as a scalar value, if I recall correctly.

@benjwadams
Contributor

> Version of compliance checker running:
> 5.0.0

You are running an older version of compliance checker.

> §3.5 Flags
> TB_num_samples's flag_values must be an array of values not <class 'numpy.uint8'>
>
> We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value. (This doesn't appear to be a difference between CF 1.8 and CF 1.9)

This behavior was fixed in 6ee802e . It has to do with how netCDF4-Python treats single element arrays due to the netCDF data model as far as I can tell -- assigning a single element array or scalar to an attribute returns a scalar when the attribute is queried.
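One way to make a check tolerant of this behavior, assuming netCDF4-Python has already handed back a scalar, is to normalize the attribute before validating it. This is a sketch of the general approach, not the actual change in 6ee802e:

```python
import numpy as np

def normalize_flag_values(flag_values):
    """netCDF4-Python returns a scalar when an attribute was written as
    a single value or single-element array; np.atleast_1d restores the
    array form the CF flags check expects, so a lone flag value no
    longer trips the 'must be an array' error."""
    return np.atleast_1d(flag_values)

print(normalize_flag_values(np.uint8(1)))                 # [1]
print(normalize_flag_values(np.array([1, 2], np.uint8)))  # [1 2]
```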

@benjwadams
Contributor

Please abridge and/or strike out superfluous issues after running against a newer version of compliance checker.

@benjwadams
Contributor

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#longitude-coordinate mentions:

> Optionally, the longitude type may be indicated additionally by providing the standard_name attribute with the value longitude, and/or the axis attribute with the value X.

A similar case applies for axis "Y", which is treated as identifying latitude variables.

This is why these are being detected as lon/lat variables:

https://github.com/ioos/compliance-checker/blob/develop/compliance_checker/cfutil.py#L549-L563
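The linked cfutil logic identifies latitude variables by several independent hints. A simplified sketch of that style of heuristic (illustrative only, not the cfutil implementation):

```python
def looks_like_latitude(attrs):
    """Any one of these hints marks a variable as latitude: a latitude
    standard_name, latitude units, or axis == 'Y'. Attribute names
    follow CF conventions; the function itself is illustrative."""
    return (
        attrs.get("standard_name") == "latitude"
        or attrs.get("units") in {"degrees_north", "degree_north"}
        or attrs.get("axis") == "Y"
    )

# A projected y coordinate in meters still matches on the axis hint
# alone, which is consistent with the errors the OP reported.
print(looks_like_latitude({"axis": "Y", "units": "meters"}))  # True
```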

@benjwadams
Contributor

I have to trace a little more, but "true" latitude/longitude variables are detected using the method above, which is likely related to the errors being raised of the form:

> Incidence_angle's coordinate variable "y" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of Incidence_angle's dimensions

@benjwadams
Contributor

cf-convention/discuss#340
The current discussion seems to have arrived at the consensus that global attributes taking strings should contain one string only. However, I could add a warning that an array of strings written via netCDF4-Python should instead be a single string.
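Given that consensus, producers with multiple references can join them into one string before writing the attribute. A minimal sketch of that workaround (the helper name is ours):

```python
def join_references(refs):
    """Collapse a list of reference strings into the single
    newline-separated string the consensus calls for; leaves an
    already-single string untouched."""
    if isinstance(refs, str):
        return refs
    return "\n".join(r.strip() for r in refs)

refs = ["First reference citation.", "Second reference citation."]
# One attribute value with two lines, suitable for the references
# global attribute.
print(join_references(refs))
```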

@mjbrodzik
Author

Thanks for this discussion, I'll pass it along to NSIDC DAAC and hopefully they can get the recommendation out further afield.

@benjwadams
Contributor

Closing issue. Please abridge remaining points into separate issues.

@mjbrodzik
Author

I'm a little confused, @benjwadams. I was assuming that I'd be able to go online and get a pull-down for CF 1.9, then try my file again and see what may be left. But when I go online, I'm still only seeing 1.8 and prior versions, so I'm not sure how to verify what changes have been made. Please let me know what to expect for next steps to verify the changes.
