-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
convert grids to/ from unstructured #217
Conversation
Codecov ReportBase: 84.71% // Head: 85.03% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #217 +/- ##
==========================================
+ Coverage 84.71% 85.03% +0.31%
==========================================
Files 33 35 +2
Lines 1452 1483 +31
==========================================
+ Hits 1230 1261 +31
Misses 222 222
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
If you like it, let's go with that
None from me
I would do 2 (users can always do
Hmm yes tricky. My instinct would be 'flat' or 'long' as they feel more like the usual data science terms. You could use the names
Do we have to do this? Maybe it's naive, but I feel like we should be able to preserve nans unless the user doesn't want to? If we have to keep the nans, then sounds fine (I'll make suggestions during review) Thanks for a great PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, couple of questions about naming and exactly how things are setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the helpful feedback! I agree stack
and unstack
are more intuitive names (flatten
might work as well but I would not know how to name the inverse operation). Let's see how this looks like:
tas_masked = xru.mask.mask_ocean(tas)
tas_stacked = xru.grid.stack_lat_lon(tas_masked)
tas_unstacked = xr.grid.unstack_lat_lon(tas_stacked, coords_orig)
Maybe cell_dim="cell"
should then be stack_dim="cell"
?
We remove the <NA>
because our statistical functions don't handle them, and, because we don't emulate the ocean (currently), it allows to remove ~70% of the data. That this cannot be round-tripped is an annoying little thing that bugs me without end. A minimal example:
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
%matplotlib
air = xr.tutorial.open_dataset("air_temperature")
coords_orig = air[["lat", "lon"]]
air = air.air
# add a row of Na -> e.g. by masking the ocean
air[:, 12, :] = np.NaN
air_stacked = air.stack(cell=("lat", "lon"))
air_stacked = air_stacked.dropna("cell")
air_unstacked = air_stacked.unstack("cell")
air_unstacked_aligned = xr.align(air_unstacked, coords_orig, join="right")[0]
print(f"{air.lat.shape=}")
print(f"{air_unstacked.lat.shape=}")
print(f"{air_unstacked_aligned.lat.shape=}")
# ---
f, axs = plt.subplots(2, 1)
# difficult to see but one lat-band is too big
air_unstacked.isel(time=0).plot(ax=axs[0])
air_unstacked_aligned.isel(time=0).plot(ax=axs[1])
Yep sounds good to me! |
Co-authored-by: znichollscr <114576287+znichollscr@users.noreply.github.com> Co-authored-by: Zeb Nicholls <zebedee.nicholls@climate-energy-college.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I implemented the naming updates. I also changed cell_dim="cell"
to stack_dim="gridcell"
.
I'll merge in a few days unless I get more feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome, nice work!
Thanks! My plan is to create a tutorial with a potential workflow once all these PRs are done. |
isort . && black . && flake8
CHANGELOG.rst
This is the first of several of PRs which will add helper functions to deal with
Dataset
andDataArray
objects. This first one deals with converting arrays from regular grids to unstructured grids (i.e. flattening thelon
andlat
dimension). What will follow are PRs on masking (land & antarctica), calculating the global mean, and potentially, stacking ensembles, and calculating anomalies.I do have a number of questions - most of them concerning naming.
General question: The functions are currently in
mesmer.xarray_utils
. I imagined this as follows:Does this sound good? I had a number of other ideas but start to prefer this name - as the functionality is xarray-specific. However, I am open to suggestions.
Other ideas I had:
I now went with
cell_dim="cell"
- objections? See discussion in unstructured grid #105.Should the function be (quasi-)top-level (1) or not (2)?
What about the name
unstructured
? I think it comes from models that have a truly unstructured grid (e.g. ICON), and their spatial data has to be 1D. Thus, the name may not be entirely correct in this context as in most cases we will be flattening a regular grid. So what is a better name? (I mean I am happy to keep the current name if people understand what it means.)There is a difficulty with round-tripping because we remove
cells
that areNaN
. We need the original coordinates to restore the original coordinates. I decided that the user will be responsible to supplying them. Other solutions did not convince me (round-trip land-only unstructured grid #117). This may look like:This is a draft - I plan to tag some people later.