convert grids to/ from unstructured #217

mathause · 2022-11-02T18:48:56Z

Closes round-trip land-only unstructured grid #117, closes unstructured grid #105
Tests added
Passes isort . && black . && flake8
Fully documented, including CHANGELOG.rst

This is the first of several of PRs which will add helper functions to deal with Dataset and DataArray objects. This first one deals with converting arrays from regular grids to unstructured grids (i.e. flattening the lon and lat dimension). What will follow are PRs on masking (land & antarctica), calculating the global mean, and potentially, stacking ensembles, and calculating anomalies.

I do have a number of questions - most of them concerning naming.

General question: The functions are currently in mesmer.xarray_utils. I imagined this as follows:
```
import mesmer.xarray_utils as mxu

ds = mxu.to_unstructured(ds)
```
Does this sound good? I had a number of other ideas but start to prefer this name - as the functionality is xarray-specific. However, I am open to suggestions.

Other ideas I had:
```
mesmer.data_utils
mesmer.preproc
mesmer.process
mesmer.tools
mesmer.utils
mesmer.xarray_utils
```
I now went with cell_dim="cell" - objections? See discussion in unstructured grid #105.

Should the function be (quasi-)top-level (1) or not (2)?

import mesmer.xarray_utils as mxu

ds = mxu.to_unstructured(ds) # (1)
ds = mxu.grid.to_unstructured(ds) # (2)

What about the name unstructured? I think it comes from models that have a truly unstructured grid (e.g. ICON), and their spatial data has to be 1D. Thus, the name may not be entirely correct in this context as in most cases we will be flattening a regular grid. So what is a better name? (I mean I am happy to keep the current name if people understand what it means.)
There is a difficulty with round-tripping because we remove cells that are NaN. We need the original coordinates to restore the original coordinates. I decided that the user will be responsible to supplying them. Other solutions did not convince me (round-trip land-only unstructured grid #117). This may look like:
```
import mesmer.xarray_utils as mxu

ds = mxu.grid.to_unstructured(ds) 
coords_orig = ds.coords.to_dataset()[["lon", "lat"]]

ds = mxu.grid.from_unstructured(ds, coords_orig) # (2)
```

This is a draft - I plan to tag some people later.

mesmer/xarray_utils/grid.py

codecov-commenter · 2022-11-03T13:52:18Z

Codecov Report

Base: 84.71% // Head: 85.03% // Increases project coverage by +0.31% 🎉

Coverage data is based on head (5a37c18) compared to base (1687f37).
Patch coverage: 96.00% of modified lines in pull request are covered.

❗ Current head 5a37c18 differs from pull request most recent head 3b38396. Consider uploading reports for the commit 3b38396 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #217      +/-   ##
==========================================
+ Coverage   84.71%   85.03%   +0.31%     
==========================================
  Files          33       35       +2     
  Lines        1452     1483      +31     
==========================================
+ Hits         1230     1261      +31     
  Misses        222      222

Flag	Coverage Δ
unittests	`85.03% <96.00%> (+0.31%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
mesmer/xarray_utils/grid.py	`95.83% <95.83%> (ø)`
mesmer/xarray_utils/__init__.py	`100.00% <100.00%> (ø)`
mesmer/utils/regionmaskcompat.py	`95.45% <0.00%> (+0.33%)`	⬆️
mesmer/stats/linear_regression.py	`91.02% <0.00%> (+0.35%)`	⬆️
mesmer/calibrate_mesmer/train_gv.py	`93.33% <0.00%> (+1.66%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

znichollscr · 2022-11-08T00:13:55Z

I had a number of other ideas but start to prefer this name - as the functionality is xarray-specific

If you like it, let's go with that

2. objections?

None from me

3. Should the function be (quasi-)top-level (1) or not (2)?

I would do 2 (users can always do from mesmer.xarray_utils.grid import to_unstructured if they want something else

4. What about the name unstructured?

Hmm yes tricky. My instinct would be 'flat' or 'long' as they feel more like the usual data science terms. You could use the names stack_lat_lon and unstack_lat_lon so it's a bit clearer what is actually going on?

5. There is a difficulty with round-tripping because we remove cells that are NaN

Do we have to do this? Maybe it's naive, but I feel like we should be able to preserve nans unless the user doesn't want to? If we have to keep the nans, then sounds fine (I'll make suggestions during review)

Thanks for a great PR!

znichollscr

lgtm, couple of questions about naming and exactly how things are setup

mesmer/xarray_utils/grid.py

mathause

Thanks for the helpful feedback! I agree stack and unstack are more intuitive names (flatten might work as well but I would not know how to name the inverse operation). Let's see how this looks like:

tas_masked = xru.mask.mask_ocean(tas)

tas_stacked = xru.grid.stack_lat_lon(tas_masked)

tas_unstacked = xr.grid.unstack_lat_lon(tas_stacked, coords_orig)

Maybe cell_dim="cell" should then be stack_dim="cell"?

We remove the <NA> because our statistical functions don't handle them, and, because we don't emulate the ocean (currently), it allows to remove ~70% of the data. That this cannot be round-tripped is an annoying little thing that bugs me without end. A minimal example:

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

%matplotlib

air = xr.tutorial.open_dataset("air_temperature")
coords_orig = air[["lat", "lon"]]

air = air.air
# add a row of Na -> e.g. by masking the ocean
air[:, 12, :] = np.NaN

air_stacked = air.stack(cell=("lat", "lon"))
air_stacked = air_stacked.dropna("cell")

air_unstacked = air_stacked.unstack("cell")
air_unstacked_aligned = xr.align(air_unstacked, coords_orig, join="right")[0]

print(f"{air.lat.shape=}")
print(f"{air_unstacked.lat.shape=}")
print(f"{air_unstacked_aligned.lat.shape=}")

# ---

f, axs = plt.subplots(2, 1)
# difficult to see but one lat-band is too big
air_unstacked.isel(time=0).plot(ax=axs[0])
air_unstacked_aligned.isel(time=0).plot(ax=axs[1])

mesmer/xarray_utils/grid.py

znicholls · 2022-11-09T02:35:40Z

Maybe cell_dim="cell" should then be stack_dim="cell"?

Yep sounds good to me!

Co-authored-by: znichollscr <114576287+znichollscr@users.noreply.github.com> Co-authored-by: Zeb Nicholls <zebedee.nicholls@climate-energy-college.org>

mathause

Ok I implemented the naming updates. I also changed cell_dim="cell" to stack_dim="gridcell".

I'll merge in a few days unless I get more feedback.

mesmer/xarray_utils/grid.py

docs/source/api.rst

znicholls

Looks awesome, nice work!

mathause · 2022-11-11T09:20:14Z

Thanks! My plan is to create a tutorial with a potential workflow once all these PRs are done.

mathause added 5 commits November 2, 2022 19:07

convert grids to/ from unstructured

4e82386

cleanup

137e2be

more tests

d36b6db

make keyword only

54693a9

docs

3e2da21

mathause commented Nov 3, 2022

View reviewed changes

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved

Apply suggestions from code review

4b6a4e8

mathause marked this pull request as ready for review November 3, 2022 13:36

mathause requested review from jschwaab and znicholls November 3, 2022 13:37

mathause mentioned this pull request Nov 7, 2022

xarray_utils: mask ocean and Antarctica #219

Merged

4 tasks

znichollscr reviewed Nov 8, 2022

View reviewed changes

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved

mesmer/xarray_utils/grid.py Show resolved Hide resolved

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved

mathause commented Nov 8, 2022

View reviewed changes

mesmer/xarray_utils/grid.py Show resolved Hide resolved

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved

mathause and others added 4 commits November 10, 2022 15:08

Apply suggestions from code review

5f20323

Co-authored-by: znichollscr <114576287+znichollscr@users.noreply.github.com> Co-authored-by: Zeb Nicholls <zebedee.nicholls@climate-energy-college.org>

rename and split functions

81af240

update changelog

6d65dcd

move to top level

5a37c18

mathause commented Nov 10, 2022

View reviewed changes

mesmer/xarray_utils/grid.py Show resolved Hide resolved

mathause commented Nov 10, 2022

View reviewed changes

docs/source/api.rst Outdated Show resolved Hide resolved

Update docs/source/api.rst

3b38396

znicholls approved these changes Nov 10, 2022

View reviewed changes

mathause merged commit e8430f4 into MESMER-group:main Nov 11, 2022

mathause deleted the to_from_unstructured branch November 11, 2022 09:21

mathause mentioned this pull request May 11, 2023

rename and move data maniplation functions #237

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

convert grids to/ from unstructured #217

convert grids to/ from unstructured #217

mathause commented Nov 2, 2022 •

edited

Loading

codecov-commenter commented Nov 3, 2022 •

edited

Loading

znichollscr commented Nov 8, 2022

znichollscr left a comment

mathause left a comment

znicholls commented Nov 9, 2022

mathause left a comment

znicholls left a comment

mathause commented Nov 11, 2022

convert grids to/ from unstructured #217

convert grids to/ from unstructured #217

Conversation

mathause commented Nov 2, 2022 • edited Loading

codecov-commenter commented Nov 3, 2022 • edited Loading

Codecov Report

znichollscr commented Nov 8, 2022

znichollscr left a comment

Choose a reason for hiding this comment

mathause left a comment

Choose a reason for hiding this comment

znicholls commented Nov 9, 2022

mathause left a comment

Choose a reason for hiding this comment

znicholls left a comment

Choose a reason for hiding this comment

mathause commented Nov 11, 2022

mathause commented Nov 2, 2022 •

edited

Loading

codecov-commenter commented Nov 3, 2022 •

edited

Loading