Testing DataArray equality using built-in '==' operator leads to mutilated DataArray.attrs dictionary #6852

Closed
4 tasks done
l-johnston opened this issue Jul 31, 2022 · 3 comments · Fixed by #6857
Labels
bug regression topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

Comments

@l-johnston

What happened?

In previous versions of xarray, testing numerical equivalence of two DataArrays was possible using the built-in operator '==' without side effects. Now, in version 2022.6.0, when one DataArray lacks an attribute that the other DataArray has, the DataArray with the attribute is mutilated during the comparison, leaving it with an empty attrs dictionary.

What did you expect to happen?

DataArray_1 == DataArray_2 should not have side effects.

Minimal Complete Verifiable Example

import xarray as xr
da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
da_withunits.frequency.attrs["units"] = "GHz"
print(da_withunits.frequency.units)
da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
print(da_withunits == da_withoutunits)
print(da_withunits.frequency.units)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

GHz
<xarray.DataArray (frequency: 3)>
array([ True,  True,  True])
Coordinates:
  * frequency  (frequency) int32 1 2 3
Traceback (most recent call last):
  File "d:\projects\ssdv\mvce.py", line 9, in <module>
    print(da_withunits.frequency.units)
  File "...\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py", line 256, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'units'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: None
libnetcdf: None

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.9.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.2.0
pip: 22.2.1
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: None

@l-johnston l-johnston added bug needs triage Issue that has not been reviewed by xarray team member labels Jul 31, 2022
@keewis
Collaborator

keewis commented Jul 31, 2022

can you try if setting keep_attrs=True helps?

That's wrong, I can reproduce the side effects. Not sure where that's coming from, though. Interestingly, only the first operand is mutated: da_withoutunits == da_withunits does not drop the units on da_withunits.

@keewis keewis removed the needs triage Issue that has not been reviewed by xarray team member label Jul 31, 2022
@l-johnston
Author

keep_attrs=True doesn't help

In [1]: import xarray as xr
In [2]: xr.set_options(keep_attrs=True)
Out[2]: <xarray.core.options.set_options at 0x1789a959fa0>
In [3]: da_withunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [4]: da_withunits.frequency.attrs["units"] = "GHz"
In [5]: da_withunits.frequency.units
Out[5]: 'GHz'
In [6]: da_withoutunits = xr.DataArray([1, 1, 1], coords={"frequency": [1, 2, 3]})
In [7]: da_withunits == da_withoutunits
Out[7]:
<xarray.DataArray (frequency: 3)>
array([ True,  True,  True])
Coordinates:
  * frequency  (frequency) int32 1 2 3

In [8]: da_withunits.frequency.units
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 da_withunits.frequency.units
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\xarray\core\common.py:256, in AttrAccessMixin.__getattr__(self, name)
    254         with suppress(KeyError):
    255             return source[name]
--> 256 raise AttributeError(
    257     f"{type(self).__name__!r} object has no attribute {name!r}"
    258 )
AttributeError: 'DataArray' object has no attribute 'units'

@TomNicholas TomNicholas added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Aug 1, 2022
@keewis
Collaborator

keewis commented Aug 1, 2022

Bisecting tells me this is a regression introduced by #6389. Looking at the code, this happens because copying the variables with variables.copy() makes a shallow copy of the dictionary (and not of its values), which means that we're actually mutating the Dataset's variables. If I change that line to

# make a shallow copy of each variable
new_variables = {name: var.copy() for name, var in variables.items()}

we stop mutating the dataset.
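
The difference between copying the dictionary and copying each of its values can be illustrated without xarray. The following is only a minimal sketch: `Var` and `drop_attrs` are hypothetical stand-ins invented for this illustration, and they do not model xarray's actual `Variable.copy()` semantics or the code path touched by #6389.

```python
class Var:
    """Hypothetical stand-in for a variable: it only carries attrs."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})

    def copy(self):
        # Return a new object that owns its own attrs dict.
        return Var(self.attrs)


def drop_attrs(variables):
    """Hypothetical stand-in for code that clears attrs on what it receives."""
    for var in variables.values():
        var.attrs.clear()


original = {"frequency": Var({"units": "GHz"})}

# Copying each value gives the new mapping its own Var objects,
# so clearing their attrs leaves the originals untouched.
per_variable = {name: var.copy() for name, var in original.items()}
drop_attrs(per_variable)
units_after_per_variable_copy = original["frequency"].attrs.get("units")

# dict.copy() copies only the mapping; both dicts share the same Var objects,
# so clearing attrs through the copy also clears them on the original.
shallow = original.copy()
shares_objects = shallow["frequency"] is original["frequency"]
drop_attrs(shallow)
units_after_dict_copy = original["frequency"].attrs.get("units")

print(units_after_per_variable_copy)  # GHz
print(shares_objects)                 # True
print(units_after_dict_copy)          # None
```

In this toy model, only the `dict.copy()` path loses the `units` attribute on the original object, which mirrors the symptom reported above.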

cc @benbovy
