Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of salem-style x, y, and z coordinates #14

Merged
merged 2 commits into from
Feb 16, 2022

Conversation

jthielen
Copy link
Collaborator

@jthielen jthielen commented Nov 20, 2021

Change Summary

As alluded to in #2, including dimension coordinates in the grid mapping/projection space is a key feature for integrating with other tools in the ecosystem like metpy and xgcm. In this (draft) PR, I've combined code ported from salem with some of my own one-off scripts and what already exists in xwrf to meet this goal. In particular, this introduces a pyproj dependency (for CRS handling and transforming the domain center point from lon/lat to easting/northing). Matching the assumptions already present in xwrf and salem, this implementation assumes we do not have a moving domain (which simplifies things greatly). Also, this implements the c_grid_axis_shift attr as-needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.

Also, because it existed in salem and my scripts alongside the dimension coordinate handling, I also included my envisioned diagnostic field calculations. These are deliberately limited to only those four fields that require WRF-specific handling:

  • ~~ 'T' going to potential temperature has a magic number offset of 300 K~~
  • ~~ 'P' and 'PB' combine to form pressure, and are not otherwise used~~
  • ~~ 'PH' and 'PHB' combine to form geopotential, and are not otherwise used~~
  • ~~ Geopotential to geopotential height conversion depends on a particular value of g (9.81 m s**2) that may not match the value used elsewhere~~

Unless I'm missing something, any other diagnostics should be derivable using these or other existing fields in a non-WRF-specific way (and so, fit outside of xwrf). If the netcdf4 backend already handles Dask chunks, then this should "just work" as it is currently written. However, I'm not sure how this should behave with respect to lazy-loading when chunks are not specified, so that is definitely a discussion to have in relation to #10.

Right now, no tests are included, as this is just a draft implementation to get the conversation started on how we want to approach these features. So, please do share your thoughts and ask questions!

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

@jthielen jthielen added the enhancement New feature or request label Nov 20, 2021
@lpilz
Copy link
Collaborator

lpilz commented Nov 23, 2021

Good work on the PR, I think this is a nice starting point. Did you get the dask integration working yet? I tried it and somehow the calculation, especially of the air_pressure diagnostic doesn't seem to be delayed. As a first try, I implemented _check_if_dask as hasattr(arr, 'chunks'), so these should be dask arrays which are getting added.

Copy link
Collaborator

@lpilz lpilz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments :)

xwrf/grid.py Outdated Show resolved Hide resolved
xwrf/grid.py Outdated Show resolved Hide resolved
This operation should be called before destaggering or any other cleaning operation.
"""
# Potential temperature
if _check_if_dask(dataset['T']):
Copy link
Collaborator

@lpilz lpilz Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this only work if dask is present?

I guess we should either

  • only allow xwrf usage with dask -> check somewhere earlier for dask (like in __init__ or open_dataset)
  • also allow xwrf usage without dask -> remove these checks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really just a placeholder to capture the intent of needing a Dask check of some kind before diagnostics are computed. It would be a good conversation to have as to what level of Dask-based operations are required vs. optional, and if optional, how those choices are specified.

xwrf/diagnostics.py Outdated Show resolved Hide resolved
xwrf/diagnostics.py Outdated Show resolved Hide resolved
@jthielen
Copy link
Collaborator Author

Good work on the PR, I think this is a nice starting point. Did you get the dask integration working yet? I tried it and somehow the calculation, especially of the air_pressure diagnostic doesn't seem to be delayed. As a first try, I implemented _check_if_dask as hasattr(arr, 'chunks'), so these should be dask arrays which are getting added.

Thanks! And no, Dask isn't working with this yet since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend yet. Once I (or someone else) can figure out how best to route the chunks argument in open_dataset through the backend API properly, then this should be closer to working.

@lpilz
Copy link
Collaborator

lpilz commented Nov 23, 2021

And no, Dask isn't working with this yet since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend yet.

I thought so. I'm also not really sure on how to do this and not getting a huge amount of info from the dask and xarray documentations. Maybe @andersy005 can help?

@andersy005
Copy link
Member

andersy005 commented Nov 23, 2021

Dask isn't working with this yet since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend yet.

chunks are handled by xarray. So, I'd expect passing chunks in open_dataset() to suffice:

In [5]: ds = xr.open_dataset(path, engine="xwrf", chunks={"south_north": 50})

In [6]: ds
Out[6]: 
<xarray.Dataset>
Dimensions:  (Time: 1, south_north: 546, west_east: 480)
Coordinates:
    XLONG    (south_north, west_east) float32 dask.array<chunksize=(50, 480), meta=np.ndarray>
    XLAT     (south_north, west_east) float32 dask.array<chunksize=(50, 480), meta=np.ndarray>
Dimensions without coordinates: Time, south_north, west_east
Data variables:
    Q2       (Time, south_north, west_east) float32 dask.array<chunksize=(1, 50, 480), meta=np.ndarray>
    PSFC     (Time, south_north, west_east) float32 dask.array<chunksize=(1, 50, 480), meta=np.ndarray>

P.S.: The existing backend plugin isn't robust enough(things such as time decoding aren't handled properly when using dask)... I plan to look into this at some point. Also, notice how XLONG and XLAT are chunked coordinates... They should have been loaded in memory instead

@jthielen
Copy link
Collaborator Author

chunks are handled by xarray. So, I'd expect passing chunks in open_dataset() to suffice:
...
P.S.: The existing backend plugin isn't robust enough(things such as time decoding aren't handled properly when using dask)... I plan to look into this at some point. Also, notice how XLONG and XLAT are chunked coordinates... They should have been loaded in memory instead

Alright! So if I'm interpreting what you're saying correctly along with https://github.com/pydata/xarray/blob/main/xarray/backends/api.py#L335-L512, chunks is actually handled outside/after the backend API portion? If so, I think this means we can't have direct Dask operations within the backend, but would rather need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they are outside the backend. Perhaps the intake-esm approach could help in that regard at least?

@andersy005
Copy link
Member

If so, I think this means we can't have direct Dask operations within the backend, but would rather need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they are outside the backend. Perhaps the intake-esm approach could help in that regard at least?

Wouldn't creating custom backend arrays be overkill? Assuming we want to support reading files via the Python-netCDF4 library, we might be able to write a custom data store that borrows from xarray's NetCDF4DataStore: https://github.com/pydata/xarray/blob/5db40465955a30acd601d0c3d7ceaebe34d28d11/xarray/backends/netCDF4_.py#L291. With this custom datastore, we would have more control over what to do with variables, dimensions, attrs before passing them to xarray. Wouldn't this suffice for the data loading (without the derived quantities)?

I think there's value in keeping the backend plugin simple (e.g. performing simple tasks such as decoding coordinates, fixing attributes/metadata, etc) and everything else outside the backend. Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

Some of the benefits of deriving quantities outside the backend are that this approach:

(1) doesn't obfuscate what's going on,
(2) gives users the opportunity to fix aspects of the dataset that might be missed by xwrf during data loading before passing this cleaned dataset to the functionality for deriving quantities.
(3) removes the requirement for deriving quantities to be a lazy operation i.e. if your dataset is in memory, deriving the quantity is done eagerly...

@andersy005
Copy link
Member

andersy005 commented Nov 23, 2021

Some of the benefits of deriving quantities outside the backend are that this approach:

Also, Wouldn't it be beneficial for deriving quantities to be backend agnostic? I'm imagining cases in which the data have been post-processed and saved in a different format (e.g. Zarr) and you still want to be able to use the same code for deriving quantities on the fly.

@lpilz
Copy link
Collaborator

lpilz commented Nov 23, 2021

Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

This sounds like it factors directly into the "keep the solutions as general as possible (so that maybe also MPAS can profit from it)" discussion. However, I feel that we have to think about the user-perspective too. I don't have any set opinions on this and we should definitely discuss this maybe in a larger group too. Here some thoughts on this so far:

I think the reason users like wrf-python is because it is an easy one-stop-shop for getting wrf output to work with python - this is especially true because lots of users are scientists and not software engineers or programmers. I personally take from this point that it would be prudent to keep the UX as easy as possible. I think this is what the Backend-approach does really well. Basically users just have to add the engine='xwrf' kwarg and then it just works (TM). Meaning that it provides the users with CF-compliant de-WRFified meteo data. Also, given that the de-WRFification of the variable data is not too difficult (it's basically just adding fields for three variables), I think the overhead in complexity wouldn't be too great. However, while I do see that it breaks the conceptual barrier between data loading (and decoding etc.) and computation, this breakage would be required in order to provide the user with meteo data rather than raw wrf fields.

@andersy005 do you already have some other ideas on how one could handle this elegantly?

Also, should we move this discussion to a separate issue maybe?

@jthielen
Copy link
Collaborator Author

@andersy005 @lpilz I think discussing this API question (what xwrf features go in the backend and what (if any) go elsewhere) is good to discuss on its own issue. I'll refine the scope of this PR to just the coordinate operations (both because of that conversation, and that the approach used here won't really work anyway given how the backends and Dask chunking interact).

@jthielen jthielen changed the title Draft implementation of salem-style coordinates and basic diagnostics Draft implementation of salem-style coordinates Nov 23, 2021
xwrf/grid.py Outdated Show resolved Hide resolved
xwrf/grid.py Outdated Show resolved Hide resolved
@lpilz
Copy link
Collaborator

lpilz commented Jan 17, 2022

Something is still fishy with the CRS for me. If I transform like this:

trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_string(str(ds['wrf_projection'].values)))
x, y = trf.transform(8.553972274305062, 49.509090717658204)

and then use ds.interp(west_east=x, south_north=y), I get N 49.38863641, E 8.79648642 which is quite a ways off from the starting point. Can anyone point me to what I'm doing wrong?

Copy link
Collaborator

@lpilz lpilz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion to add XGCM-attributes to vertical coordinates

xwrf/io_plugin.py Outdated Show resolved Hide resolved
@jthielen
Copy link
Collaborator Author

I've pushed some updates to remove the Berlin-specific conditions, fix the TRUELAT2 missing case, and to add CF/COMODO attrs to the vertical coords!


Something is still fishy with the CRS for me. If I transform like this:

trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_string(str(ds['wrf_projection'].values)))
x, y = trf.transform(8.553972274305062, 49.509090717658204)

and then use ds.interp(west_east=x, south_north=y), I get N 49.38863641, E 8.79648642 which is quite a ways off from the starting point. Can anyone point me to what I'm doing wrong?

I unfortunately don't quite have the right intuition to tell if that's a datum shift error as described in https://fabienmaussion.info/2018/01/06/wrf-projection/ or if something else is going on. What dataset are you using? Also, @fmaussion do you have any insight on this?

xwrf/grid.py Outdated Show resolved Hide resolved
@jthielen jthielen mentioned this pull request Jan 19, 2022
@lpilz
Copy link
Collaborator

lpilz commented Jan 20, 2022

I attached the dataset in question (had to rename it to zip but it's still a netCDF file).
wrfout_d03_2018-10-18_00:00:00_only_1time.zip

I don't think this is the problem described in the blog post, because we do explicitly add the spherical datum (a and b) to the Lambert projection, but maybe Fabien has some more insights.


Also: here the code

ds = xr.open_dataset("wrfout_d03_2018-10-18_00:00:00_only_1time.nc", chunks='auto', engine=xwrf.WRFBackendEntrypoint)
from pyproj import Transformer, CRS
trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_cf(ds['wrf_projection'].attrs))
x, y = trf.transform(8.553972274305062, 49.509090717658204)
var_in_question = ds.T.isel(bottom_top=0).interp(west_east=x, south_north=y)
print(var_in_question)

@fmaussion
Copy link

I attached the dataset in question (had to rename it to zip but it's still a netCDF file).

Can Salem read and plot this properly?

@fmaussion
Copy link

This looks OK in salem to me: https://gist.github.com/fmaussion/97beef70976231a58c914819966643f5

With a great plot haha:

image

@lpilz
Copy link
Collaborator

lpilz commented Feb 2, 2022

@jthielen Would you mind merging main into your branch? I'm currently already using your code so having it updated ​would be great :)

@lpilz
Copy link
Collaborator

lpilz commented Feb 2, 2022

Maybe we should add some tests of xwrf against salem.... I still could not get it to work (have recently had trouble with metpy.plots) so asserting it works would be useful

@jthielen jthielen changed the title Draft implementation of salem-style coordinates Implementation of salem-style x, y, and z coordinates Feb 11, 2022
@jthielen jthielen marked this pull request as ready for review February 11, 2022 01:54
@jthielen jthielen force-pushed the salem-style-coords branch 2 times, most recently from 36ce734 to 3223827 Compare February 11, 2022 02:00
@jthielen
Copy link
Collaborator Author

jthielen commented Feb 11, 2022

This PR has been reworked to be based on #44 (so for now should not be merged until I can rebase on main after #44 is merged), but is otherwise ready for review! Given the discussion in #36, I've included renaming south_north -> y and others likewise, but let me know whether or not I should keep that here.

@jthielen
Copy link
Collaborator Author

Now that #44 has been merged, this has been updated to just include the relevant commit!

Copy link
Member

@andersy005 andersy005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍🏽

The code coverage for postprocess.py is relatively low but we can address this in a separate PR

@andersy005
Copy link
Member

Also, this implements the c_grid_axis_shift attr as-needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.

Also, should we add a test for xgcm interoperability to ensure xgcm is able to construct the grid object?

@lpilz
Copy link
Collaborator

lpilz commented Feb 15, 2022

Unfortunately, with the new changes metpy.parse_cf() breaks for me as it seems to try to cast XTIME to meters. I am getting a DimensionalityError

var = var.metpy.convert_coordinate_units(coord_name, 'meters')
DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'meter' ([length])

I tried adding some units attributes and setting the coordinates kwarg of metpy.parse_cf(), but nothing helped. Does anybody have a hint as to what is happening here?

@kmpaul
Copy link
Contributor

kmpaul commented Feb 15, 2022

Unfortunately, with the new changes metpy.parse_cf() breaks for me as it seems to try to cast XTIME to meters. I am getting a DimensionalityError

var = var.metpy.convert_coordinate_units(coord_name, 'meters')
DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'meter' ([length])
I tried adding some units attributes and setting the coordinates kwarg of metpy.parse_cf(), but nothing helped. Does anybody have a hint as to what is happening here?

@lpilz: This sound like a good test to add.

@jthielen
Copy link
Collaborator Author

@lpilz Thanks for the heads up on that! Unidata/MetPy#2347 has been created upstream in MetPy to resolve it.

I've added a test against xgcm, and I can definitely do likewise against MetPy (and cf_xarray?) next for coordinate identification (it'll probably have to be version bounded given Unidata/MetPy#2347, but oh well).

@lpilz
Copy link
Collaborator

lpilz commented Feb 16, 2022

Thanks for always fixing these so quickly. Awesome work @jthielen!

While I in principle agree that we should test this against as many utilities which might be used in conjunction with xwrf as possible, maybe this can be done in future PRs. We could e.g. do this in the documentation phase when writing tutorials and stuff... I think this PR is quite nice already and doesn't have to be the MOAPRs ;)

@kmpaul
Copy link
Contributor

kmpaul commented Feb 16, 2022

@lpilz @jthielen: I totally agree that this can be done in future PRs. I'd just make a note of it in a new issue and then move forward. The fewer PRs we have blocking other work, the better.

Awesome work on this!

Copy link
Contributor

@kmpaul kmpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @jthielen! Thanks for all the great work!

@kmpaul
Copy link
Contributor

kmpaul commented Feb 16, 2022

Merge whenever ready.

@jthielen
Copy link
Collaborator Author

With 3 approvals, I went ahead and merged it after resolving the merge conflicts with main. #49 has been created to keep track of the testing-related concerns brought up here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Coordinate UX Add to_xgcm_grid_dataset function to help read into XGCM
5 participants