Default to CF standard axes for accessing dimension names #20

cisaacstern · 2021-11-13T01:41:26Z

A continuation of #19, which I accidentally closed.

This PR achieves the recommendation made in #19 (comment) (with the exception of the reference system part), in that

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json --reference-system=4326

now yields the same result as

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json \
    --x-dimension=lon --y-dimension=lat --reference-system=4326

because the correct values for the x and y dimensions are inferred from the CF namespace in

# xstac/_xstac.py L21-36

CF_STANDARD_AXES = dict(temporal_dimension="T", x_dimension="X", y_dimension="Y")


def maybe_use_cf_standard_axis(kw, kw_name, ds):
    if kw is None:
        try:
            kw = ds.cf[CF_STANDARD_AXES[kw_name]].name
        except KeyError:
            logger.warning(
                f"Kwarg `{kw_name}` is None and `{CF_STANDARD_AXES[kw_name]}` is not a valid key "
                "of the provided dataset's `cf` namespace. Therefore, STAC object is generating "
                f"without a `{kw_name}` in its Datacube Extension. To correct this, either pass a "
                f"string value to `{kw_name}`, or make `ds.cf['{CF_STANDARD_AXES[kw_name]}']` "
                "accessible using, e.g., `cf_xarray`'s `guess_coord_axis` method."
            )
    return kw

which is called on each dimension kwarg in xarray_to_stac.

The logger warning is my attempt to make it clear to the user what's happening if the key is missing; we don't want to raise an error because there may be situations where dimensions are intentionally omitted, as noted in #19 (comment).

I'm using the axes names instead of the standard names, because I understand them to be more generalizable than "latitude" and "longitude". The axes-based approach does work both for the existing tests as well as the Terraclimate example. If we do want to go this route, I could build in some additional testing for explicit dimension name passing as well as the KeyError edge case.

TomAugspurger

Thanks, looks good.

One comment on handling the failures.

Can you also add a test? I'd want to make sure that users can override things. Maybe something silly like x_dimension="time" and check the result.

TomAugspurger · 2021-11-13T19:16:48Z

xstac/_xstac.py

+    if kw is None:
+        try:
+            kw = ds.cf[CF_STANDARD_AXES[kw_name]].name
+        except KeyError:


I'm not sure how to handle this case. In general, I'd assume people don't look at logs and so just logging the warning isn't great, and might just annoy people who don't want that dimension.

How about this: the default of None will mean get it from the CF conventions by default, but we just raise this KeyError if it's not present for time, y, and x.

Users who have those dims, but don't / can't use CF conventions can specify time="my-time-dim".

Users who don't have those dims can drop them explicitly with time=False.

So I think the whole try / except can be removed, or you could use it to reraise with a better error message

try: ... except KeyError as e: raise KeyError("better message about setting {dim}=False to drop") from e

Added two descriptive errors in c3e1001

TomAugspurger · 2021-11-13T19:20:53Z

Just saw this:

we don't want to raise an error because there may be situations where dimensions are intentionally omitted, as noted in #19 (comment).

I think #19 (comment) might have been mistaken about the behavior of time_dimension=None. I think that time_dimension=False means no time dimension. So we can safely use time_dimension=None to mean infer from CF conventions.

cisaacstern · 2021-11-14T00:54:43Z

Added two error conditions to the cf handler func, and tests for both. Also added testing for explicit dimension name passing via the "explicit_dims" parametrization of test_xarray_to_stac.

In working on the tests, I found the single test_xstac.py a bit unwieldy to navigate, so I made moved it to a tests directory with a confest.py. Hope that's okay?

I think that time_dimension=False means no time dimension.

This remains the one untested (untestable?) aspect. This hypothetical test_no_time_dimension is commented-out because the resulting item fails

https://github.com/TomAugspurger/xstac/blob/2d794ea51472ea84edb21ee9a4dc877ddaae4277/xstac/_xstac.py#L413

My understanding is that STAC objects require a temporal dimension. If so, what would be the use case for time_dimension=False?

cisaacstern · 2021-11-14T01:04:11Z

This remains the one untested (untestable?) aspect.

Changed recommendation in 78dd461 to omit suggesting *_dimension=False, pending further insight on if this is possible.

TomAugspurger · 2021-11-14T19:50:26Z

xstac/_xstac.py

+    )
+    if not hasattr(ds, "cf"):
+        raise AttributeError(
+            f"Kwarg `{kw_name}` is None and the dataset does not have a `cf` namespace. Please "


Is this error message right? I would expect that importing cf_xarray here would ensure that the accessor is available, regardless of whether it was imported when the Dataset was created.

Is this error message right?

No. Removed in b1b4004

I would expect that importing cf_xarray here would ensure that the accessor is available

Correct. I hadn't realized this. And I think removing the cf_xarray import in conftest.py (where the test datasets are created) serves as a de-facto test/verification of this expectation: b1b4004#diff-e52e4ddd58b7ef887ab03c04116e676f6280b824ab7469d5d3080e5cba4f2128

TomAugspurger · 2021-11-15T19:32:35Z

Great, thanks @cisaacstern!

cisaacstern added 6 commits November 12, 2021 16:59

remove CLI defaults

89ceadd

default to cf standard dim names

3d5ee32

import cf_xarray in tests

1298a3e

remove (now redundant) kwargs from tests

418bbbb

use cf standard *axes*

fa2b721

add cf_xarray as install req

34c9aef

TomAugspurger reviewed Nov 13, 2021

View reviewed changes

cisaacstern added 3 commits November 13, 2021 15:40

make test dir, add conftest

f575b8d

two error conditions for cf handler func

c3e1001

test error conditions and explicit dim passing

363b5f9

comment out dim=False suggestion

78dd461

TomAugspurger reviewed Nov 14, 2021

View reviewed changes

cf_xarray import in xstac adds cf namespace

b1b4004

TomAugspurger merged commit e825c2d into stac-utils:main Nov 15, 2021

cisaacstern mentioned this pull request Nov 16, 2021

Define dataset schema pangeo-forge/pangeo-forge-orchestrator#5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default to CF standard axes for accessing dimension names #20

Default to CF standard axes for accessing dimension names #20

cisaacstern commented Nov 13, 2021

TomAugspurger left a comment

TomAugspurger Nov 13, 2021

cisaacstern Nov 14, 2021

TomAugspurger commented Nov 13, 2021

cisaacstern commented Nov 14, 2021 •

edited

Loading

cisaacstern commented Nov 14, 2021

TomAugspurger Nov 14, 2021

cisaacstern Nov 15, 2021

TomAugspurger commented Nov 15, 2021

Default to CF standard axes for accessing dimension names #20

Default to CF standard axes for accessing dimension names #20

Conversation

cisaacstern commented Nov 13, 2021

TomAugspurger left a comment

Choose a reason for hiding this comment

TomAugspurger Nov 13, 2021

Choose a reason for hiding this comment

cisaacstern Nov 14, 2021

Choose a reason for hiding this comment

TomAugspurger commented Nov 13, 2021

cisaacstern commented Nov 14, 2021 • edited Loading

cisaacstern commented Nov 14, 2021

TomAugspurger Nov 14, 2021

Choose a reason for hiding this comment

cisaacstern Nov 15, 2021

Choose a reason for hiding this comment

TomAugspurger commented Nov 15, 2021

cisaacstern commented Nov 14, 2021 •

edited

Loading