Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to CF standard axes for accessing dimension names #20

Merged
merged 11 commits into from
Nov 15, 2021

Conversation

cisaacstern
Copy link
Contributor

A continuation of #19, which I accidentally closed.

This PR achieves the recommendation made in #19 (comment) (with the exception of the reference system part), in that

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json --reference-system=4326

now yields the same result as

$ xstac examples/terraclimate/item-template.json \
    zarr-https examples/terraclimate/item.json \
    --x-dimension=lon --y-dimension=lat --reference-system=4326

because the correct values for the x and y dimensions are inferred from the CF namespace in

# xstac/_xstac.py L21-36

CF_STANDARD_AXES = dict(temporal_dimension="T", x_dimension="X", y_dimension="Y")


def maybe_use_cf_standard_axis(kw, kw_name, ds):
    if kw is None:
        try:
            kw = ds.cf[CF_STANDARD_AXES[kw_name]].name
        except KeyError:
            logger.warning(
                f"Kwarg `{kw_name}` is None and `{CF_STANDARD_AXES[kw_name]}` is not a valid key "
                "of the provided dataset's `cf` namespace. Therefore, STAC object is generating "
                f"without a `{kw_name}` in its Datacube Extension. To correct this, either pass a "
                f"string value to `{kw_name}`, or make `ds.cf['{CF_STANDARD_AXES[kw_name]}']` "
                "accessible using, e.g., `cf_xarray`'s `guess_coord_axis` method."
            )
    return kw

which is called on each dimension kwarg in xarray_to_stac.

The logger warning is my attempt to make it clear to the user what's happening if the key is missing; we don't want to raise an error because there may be situations where dimensions are intentionally omitted, as noted in #19 (comment).

I'm using the axes names instead of the standard names, because I understand them to be more generalizable than "latitude" and "longitude". The axes-based approach does work both for the existing tests as well as the Terraclimate example. If we do want to go this route, I could build in some additional testing for explicit dimension name passing as well as the KeyError edge case.

Copy link
Collaborator

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

One comment on handling the failures.

Can you also add a test? I'd want to make sure that users can override things. Maybe something silly like x_dimension="time" and check the result.

xstac/_xstac.py Outdated
if kw is None:
try:
kw = ds.cf[CF_STANDARD_AXES[kw_name]].name
except KeyError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to handle this case. In general, I'd assume people don't look at logs and so just logging the warning isn't great, and might just annoy people who don't want that dimension.

How about this: the default of None will mean get it from the CF conventions by default, but we just raise this KeyError if it's not present for time, y, and x.

Users who have those dims, but don't / can't use CF conventions can specify time="my-time-dim".

Users who don't have those dims can drop them explicitly with time=False.

So I think the whole try / except can be removed, or you could use it to reraise with a better error message

try:
    ...
except KeyError as e:
    raise KeyError("better message about setting {dim}=False to drop") from e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two descriptive errors in c3e1001

@TomAugspurger
Copy link
Collaborator

Just saw this:

we don't want to raise an error because there may be situations where dimensions are intentionally omitted, as noted in #19 (comment).

I think #19 (comment) might have been mistaken about the behavior of time_dimension=None. I think that time_dimension=False means no time dimension. So we can safely use time_dimension=None to mean infer from CF conventions.

@cisaacstern
Copy link
Contributor Author

cisaacstern commented Nov 14, 2021

Added two error conditions to the cf handler func, and tests for both. Also added testing for explicit dimension name passing via the "explicit_dims" parametrization of test_xarray_to_stac.

In working on the tests, I found the single test_xstac.py a bit unwieldy to navigate, so I made moved it to a tests directory with a confest.py. Hope that's okay?

I think that time_dimension=False means no time dimension.

This remains the one untested (untestable?) aspect. This hypothetical test_no_time_dimension is commented-out because the resulting item fails

https://github.com/TomAugspurger/xstac/blob/2d794ea51472ea84edb21ee9a4dc877ddaae4277/xstac/_xstac.py#L413

My understanding is that STAC objects require a temporal dimension. If so, what would be the use case for time_dimension=False?

@cisaacstern
Copy link
Contributor Author

This remains the one untested (untestable?) aspect.

Changed recommendation in 78dd461 to omit suggesting *_dimension=False, pending further insight on if this is possible.

xstac/_xstac.py Outdated
)
if not hasattr(ds, "cf"):
raise AttributeError(
f"Kwarg `{kw_name}` is None and the dataset does not have a `cf` namespace. Please "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this error message right? I would expect that importing cf_xarray here would ensure that the accessor is available, regardless of whether it was imported when the Dataset was created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this error message right?

No. Removed in b1b4004

I would expect that importing cf_xarray here would ensure that the accessor is available

Correct. I hadn't realized this. And I think removing the cf_xarray import in conftest.py (where the test datasets are created) serves as a de-facto test/verification of this expectation: b1b4004#diff-e52e4ddd58b7ef887ab03c04116e676f6280b824ab7469d5d3080e5cba4f2128

@TomAugspurger TomAugspurger merged commit e825c2d into stac-utils:main Nov 15, 2021
@TomAugspurger
Copy link
Collaborator

Great, thanks @cisaacstern!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants