Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep attributes for "bounds" variables #8924

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 0 additions & 20 deletions xarray/conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -794,24 +794,4 @@ def cf_encoder(variables: T_Variables, attributes: T_Attrs):

new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}

# Remove attrs from bounds variables (issue #2921)
Copy link
Contributor

@dcherian dcherian Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a bit more complex.

We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.

I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in _update_bounds_encoding and then modify this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a bit more complex.

We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.

Why do you need to set them if they are not set by the user?

I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in _update_bounds_encoding and then modify this loop.

They shouldn't be there if the user did not set them.
Maybe I am a bit confused here, could you elaborate?

Copy link
Contributor

@dcherian dcherian Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to set them if they are not set by the user?

To ensure that the two variables are encoded similarly as the standard suggests.

Our encode/decode logic treats all variables independently, so we need to do this to treat the time and time_bounds variables identically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to set them if they are not set by the user?

To ensure that the two variables are encoded similarly as the standard suggests.

Our encode/decode logic treats all variables independently, so we need to do this to treat the time and time_bounds variables identically.

I am only talking about writing the dataset to disk, so the encoding side of it before writing. I might see the point for time variables, but this could all be handled internally in .encoding (which is used for time variables anyway).

I still don't quite understand the logic for all the others. The standard suggets that if the attributes exist, they should be identical. If they don't exist, well then just don't bother and leave it to the reader to infer the encoding from the parent variable. I don't see the necessitiy to pollute .attrs for that.

for var in new_vars.values():
bounds = var.attrs["bounds"] if "bounds" in var.attrs else None
if bounds and bounds in new_vars:
# see http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries
for attr in [
"units",
"standard_name",
"axis",
"positive",
"calendar",
"long_name",
"leap_month",
"leap_year",
"month_lengths",
]:
if attr in new_vars[bounds].attrs and attr in var.attrs:
if new_vars[bounds].attrs[attr] == var.attrs[attr]:
new_vars[bounds].attrs.pop(attr)

return new_vars, attributes
14 changes: 8 additions & 6 deletions xarray/tests/test_coding_times.py
Original file line number Diff line number Diff line change
Expand Up @@ -732,23 +732,25 @@ def test_encode_time_bounds() -> None:

encoded, _ = cf_encoder(ds.variables, ds.attrs)
assert_equal(encoded["time_bounds"], expected["time_bounds"])
assert "calendar" not in encoded["time_bounds"].attrs
assert "units" not in encoded["time_bounds"].attrs

# if time_bounds attrs are same as time attrs, it doesn't matter
ds.time_bounds.encoding = {"calendar": "noleap", "units": "days since 2000-01-01"}
encoded, _ = cf_encoder({k: v for k, v in ds.variables.items()}, ds.attrs)
assert_equal(encoded["time_bounds"], expected["time_bounds"])
assert "calendar" not in encoded["time_bounds"].attrs
assert "units" not in encoded["time_bounds"].attrs

# for CF-noncompliant case of time_bounds attrs being different from
# time attrs; preserve them for faithful roundtrip
ds.time_bounds.encoding = {"calendar": "noleap", "units": "days since 1849-01-01"}
ds.time_bounds.encoding = {
"calendar": "gregorian",
"units": "days since 1849-01-01",
}
encoded, _ = cf_encoder({k: v for k, v in ds.variables.items()}, ds.attrs)
with pytest.raises(AssertionError):
assert_equal(encoded["time_bounds"], expected["time_bounds"])
assert "calendar" not in encoded["time_bounds"].attrs
assert encoded["time"].attrs["calendar"] == ds.time.encoding["calendar"]
assert (
encoded["time_bounds"].attrs["calendar"] == ds.time_bounds.encoding["calendar"]
)
assert encoded["time_bounds"].attrs["units"] == ds.time_bounds.encoding["units"]

ds.time.encoding = {}
Expand Down
Loading