Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save crs_wkt on netCDF save #4719

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wjbenfold
Copy link
Contributor

@wjbenfold wjbenfold commented Apr 27, 2022

🚀 Pull Request

Description

One small code change, many test changes


Consult Iris pull request check list

@jamesp
Copy link
Member

jamesp commented May 9, 2022

This seems sensible, what does the cf conventions say about wkt? this will change everyones output netcdf files by default right?

@wjbenfold
Copy link
Contributor Author

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#use-of-the-crs-well-known-text-format

It will change their output results, but not what their file means, so I think that's fine?

The only bit of the CF conventions wrt. crs_wkt that slightly concerns me is that axes should be in the correct order when it's given, which I sort of assume we do but I don't have evidence on that.

@jamesp
Copy link
Member

jamesp commented May 9, 2022

From the link you put above:

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property grid mapping attributes as well as by crs_wkt.

As long as we are doing that seems reasonable.
Do we have any way of testing that the WKT is representing the same thing as the cs object parameters or can we rely on cartopy for that?

@wjbenfold
Copy link
Contributor Author

wjbenfold commented May 9, 2022

Do we have any way of testing that the WKT is representing the same thing as the cs object parameters or can we rely on cartopy for that?

We're trusting our as_cartopy_crs and Cartopy to correctly turn our parameters into a Cartopy CRS (and by extension a proj CRS, which the Cartopy CRS extends and which actually carries the to_wkt method) but both of those are trustworthy / should be tested elsewhere.

@jamesp
Copy link
Member

jamesp commented May 9, 2022

This definitely needs a what's new / adding to the release notes for which ever release it goes out in.

I'm wondering if this should be the default behaviour or whether it should be behind a flag, given that it is changing the output iris produces? The thing that worries me is how little control users have over what goes into their netcdf file, and that it is redundant info in the sense that the wkt is already encoded in other params. if there was a problem that made the file incompatible with the next piece of software in their workflow, how could the user work around it?

@wjbenfold
Copy link
Contributor Author

I can future flag it? There aren't users clamouring for this to be the default behaviour. It would mean a warning on pretty much every netCDF save though.

We could add to the arguments that can be supplied to the netCDF writer, but having made that addition to the API we'd have to support that bloat of the API (and leave it in there or future flag its removal) if we want to get rid of it at a later point. Maybe we should future flag it so that people can try it out then when it becomes on by default they can still turn it off if it turns out to cause issues, then once it becomes fixed (at least within Iris) people will have found any bugs and/or adapted?

@larsbarring
Copy link
Contributor

Does this PR also take care of what is discussed in CF/cf-conventions issue #223, i.e. to explicitly state the coordinate axis ordering in the content of the grid_mapping attribute of the data variable? This was introduced in CF-1.8.

@wjbenfold
Copy link
Contributor Author

Does this PR also take care of what is discussed in CF/cf-conventions issue #223, i.e. to explicitly state the coordinate axis ordering in the content of the grid_mapping attribute of the data variable? This was introduced in CF-1.8.

It doesn't, but you're right that looking at that issue it appears we'd need to include that at the same time. It shouldn't be too hard as we already have the information when we write the variable, so it looks like just an addition to what we include in the grid_mapping field of a variable.

@pp-mo do you have opinions on this?

@jamesp jamesp marked this pull request as draft May 25, 2022 08:30
@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2023

In order to maintain a backlog of relevant PRs, we automatically label them as stale after 500 days of inactivity.

If this PR is still important to you, then please comment on this PR and the stale label will be removed.

Otherwise this PR will be automatically closed in 28 days time.

@github-actions github-actions bot added the Stale A stale issue/pull-request label Oct 8, 2023
@larsbarring
Copy link
Contributor

Yes, I think that this is still relevant.

@github-actions github-actions bot removed the Stale A stale issue/pull-request label Oct 10, 2023
@trexfeathers trexfeathers added the Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info label Feb 23, 2024
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Will Benfold seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@trexfeathers
Copy link
Contributor

trexfeathers commented Apr 26, 2024

Discussed yesterday in a Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info taming meeting (link). Following the discussion, @trexfeathers and @ESadek-MO have drafted an updated plan, to be reviewed again at the next meeting:

One important note from the meeting is that we currently store no arbitrary attributes on the CoordSystem class, which prevents us from storing crs_wkt as it stands. We see this as definitely a good thing, because crs_wkt has no impact on Iris behaviour, but could definitely cause inconsistencies if the CoordSystem itself was modified, but the crs_wkt attribute was not. It therefore seems safest to write out crs_wkt fresh every time a NetCDF file is saved.

Step 1 - finish #4719

We are comfortable ALWAYS saving crs_wkt to a file. This improves Iris' compliance with the CF guidelines, and makes the files more helpful to downstream software. If there is anyone who finds the inclusion of crs_wkt problematic, we expect them to be in the minority, and ncdata provides the ideal tools for removing the attribute during Iris saving.

In the interest of breaking the task down, we plan to leave the changes to grid_mapping for Step 2.

Step 2 - enable Iris to encode the OGC axis order into the grid_mapping attribute: #3388

Encoding axis ordering in grid_mapping has a good chance of breaking downstream code. E.g. Iris would currently error (?) if it loaded a CRS with grid_mapping in this format. So it is important that this is a opt-in switch.

We propose a new @property on the CoordSystem class: CoordSystem.encode_axis_order. This defaults to False, preserving the existing behaviour. When switched to True, grid_mapping will be written using the new format e.g. data:grid_mapping = "crs: latitude, longitude".

We could potentially detect situations where this is NEEDED to make sense of the crs_wkt attribute, and issue a warning if CoordSystem.encode_axis_order is False.

Iris also needs to be able to understand this on loading.

Name of property is definitely up for debate

Step 3 - make it easier to modify coord_systems for whole Cubes

The proposal in Step 2 will often give rise to code like this:

for cube in my_cube_list:
    for coord in cube.coords():
    	coord.coord_system.encode_axis_order = True
my_cube_list.save("tmp.nc")

Which is quite clunky. We also regularly find user confusion about the difference between Coord.coord_system and Cube.coord_system(), and adding the need to do the above will only increase the frequency of confusion.

So: it would be good to provide more convenience for modifying coord_system at the Cube level - i.e. automatically modify the coord_system of all Coords attached to the Cube, assuming they are identical.

If sensible, we think it should take the form of Cube.coord_system - a @property that can be set - rather than the current method (coord_system()), which just invites confusion. The method would have to stay until the next major release of course.

@ESadek-MO
Copy link
Contributor

Step 3 suggestion is going to be clunky; coordinate_system objects aren't shared across coordinates, so would either need a rehaul of how it works, or more effort on the user's side.
Instead, we would offer a docstring tutorial in the new code. This seems to be the more sensible option.

@trexfeathers trexfeathers added Dragon Sub-Task 🦎 https://github.com/orgs/SciTools/projects/19?pane=info and removed Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info labels Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dragon Sub-Task 🦎 https://github.com/orgs/SciTools/projects/19?pane=info
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

7 participants