This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

indexes related error from xarray v0.14.0 when calling esmlab.resample(ds, freq='ann') #155

Closed
klindsay28 opened this issue Oct 24, 2019 · 7 comments · Fixed by #156
Labels
bug Something isn't working

Comments

@klindsay28

When I run the commands

import xarray as xr
import esmlab
ds = xr.open_dataset('/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries/FG_CO2_ocn_piControl_00.nc')
ds_ann = esmlab.resample(ds, freq='ann')

I get an error with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/esmlab/core.py", line 780, in resample
    weights=weights, method=method
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/esmlab/core.py", line 556, in compute_ann_mean
    computed_dset = dset.apply(weighted_mean_arr, wgts=wgts)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/xarray/core/dataset.py", line 4140, in apply
    for k, v in self.data_vars.items()
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/xarray/core/dataset.py", line 4140, in <dictcomp>
    for k, v in self.data_vars.items()
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/esmlab/core.py", line 547, in weighted_mean_arr
    (darr * wgts).resample({self.time_coord_name: 'A'}).sum(dim=self.time_coord_name)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/xarray/core/dataarray.py", line 2503, in func
    self, other = align(self, other, join=align_type, copy=False)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES/lib/python3.7/site-packages/xarray/core/alignment.py", line 298, in align
    "indexes along dimension {!r} are not equal".format(dim)
ValueError: indexes along dimension 'time' are not equal
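The failing frame (weighted_mean_arr) multiplies the data by weights and then resamples to annual frequency. In xarray's align, the message "indexes along dimension ... are not equal" is raised for the 'exact' join mode, so the multiplication appears to run under an exact arithmetic join, with data and weights carrying time indexes that no longer compare equal. Below is a minimal sketch of that mechanism on synthetic monthly data, assuming month-length (days_in_month) weights. It uses groupby("time.year") instead of resample, so unlike esmlab's code the result has a year dimension rather than time; annual_weighted_mean is a hypothetical name, not esmlab's implementation.

```python
import numpy as np
import pandas as pd
import xarray as xr


def annual_weighted_mean(da, wgts):
    """Weighted annual mean of monthly data, computed under an exact arithmetic join.

    Hypothetical sketch: raises ValueError if da and wgts have unequal time indexes.
    """
    with xr.set_options(arithmetic_join="exact"):
        num = (da * wgts).groupby("time.year").sum(dim="time")
        den = wgts.groupby("time.year").sum(dim="time")
        return num / den


# Two years of synthetic monthly data (standard calendar, for self-containment).
time = pd.date_range("2001-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time}, name="FG_CO2")
wgts = da.time.dt.days_in_month  # month lengths; shares da's time index

# Same weights stamped mid-month: the values match, the index labels do not.
shifted = wgts.assign_coords(time=time + pd.Timedelta(days=14))

ann = annual_weighted_mean(da, wgts)  # works: indexes are identical
try:
    annual_weighted_mean(da, shifted)
except ValueError as err:
    print("exact join failed:", err)
```

Deriving the weights from the data's own time coordinate, as wgts does here, keeps the two indexes identical by construction.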

Output of esmlab.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

esmlab: 2019.4.27.post40
xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.2
scipy: 1.3.1
xesmf: 0.2.1
cftime: 1.0.3.4
dask: 2.6.0
distributed: 2.6.0
setuptools: 41.4.0
pip: 19.3.1
conda: None
pytest: None
IPython: 7.8.0
sphinx: None

@klindsay28 klindsay28 added the bug Something isn't working label Oct 24, 2019
@mnlevy1981
Contributor

@andersy005 and I were working through NCAR/intake-esm-datastore#37 this afternoon and eventually ran into the same error. With #156 applied, I no longer get the error in my notebook, so it looks promising for this issue as well.

@klindsay28
Author

Applying #156 does enable ds_ann = esmlab.resample(ds, freq='ann') to run with no error. However, the resulting dataset looks odd to me:

>>> ds_ann
<xarray.Dataset>
Dimensions:     (d2: 2, region: 4, time: 1200, year: 1200)
Coordinates:
  * region      (region) object 'Global' 'SH_mid_lat' 'low_lat' 'NH_mid_lat'
    time        (year) object 0001-07-01 17:05:00 ... 1200-07-01 17:00:00
  * year        (year) int64 1 2 3 4 5 6 7 ... 1195 1196 1197 1198 1199 1200
Dimensions without coordinates: d2
Data variables:
    FG_CO2      (time, region) float64 -0.165 0.1448 -0.9223 ... -0.9603 0.612
    time_bound  (d2, time) object 0001-01-01 01:59:59 ... 1201-01-01 00:00:00
Attributes:
    history:  \n2019-10-25 09:01:22.862617 esmlab.resample(<DATASET>, freq="a...

There is a new dimension, year, with an associated coordinate. The time coordinate is now associated with the year dimension instead of the time dimension. I don't know what it means that time lacks an * in the list of coordinates for the dataset. Additionally, if I attempt to write the generated dataset to a file, I get an error from xarray:

>>> ds_ann.to_netcdf('foo.nc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/core/dataset.py", line 1536, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/backends/api.py", line 1071, in to_netcdf
    dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/backends/api.py", line 1117, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/backends/common.py", line 293, in store
    variables, attributes = self.encode(variables, attributes)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/backends/common.py", line 382, in encode
    variables, attributes = cf_encoder(variables, attributes)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/conventions.py", line 746, in cf_encoder
    _update_bounds_encoding(variables)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/conventions.py", line 424, in _update_bounds_encoding
    "'{0}' before writing to a file.".format(v.name, attrs["bounds"]),
AttributeError: 'Variable' object has no attribute 'name'

@klindsay28
Author

I'm also unable to select a slice along the time dimension in the generated dataset.

>>> ds_ann.sel(time=slice('0801-01-01', '1001-01-01'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/core/dataset.py", line 2000, in sel
    self, indexers=indexers, method=method, tolerance=tolerance
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/core/coordinates.py", line 392, in remap_label_indexers
    obj, v_indexers, method=method, tolerance=tolerance
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/core/indexing.py", line 261, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/site-packages/xarray/core/indexing.py", line 140, in convert_label_indexer
    "cannot use a dict-like object for selection on "
ValueError: cannot use a dict-like object for selection on a dimension that does not have a MultiIndex
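The slicing failure is consistent with time being a non-dimension coordinate along year, as in the ds_ann repr earlier in this thread (no * next to time): by default xarray only builds an index for dimension coordinates, so .sel(time=...) has no time index to look labels up in. Below is a minimal sketch of promoting time back to the dimension coordinate with Dataset.swap_dims. It uses a small synthetic dataset with standard-calendar pandas timestamps instead of the cftime noleap dates in the real file (an assumption for self-containment), and restore_time_dim is a hypothetical name, not part of esmlab.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the problematic result: 'year' is the dimension,
# and 'time' is a non-dimension coordinate along it (no '*' in the repr).
years = np.arange(1, 6)
mid_year = pd.to_datetime([f"{2000 + y}-07-01" for y in years])
ds_ann = xr.Dataset(
    {"FG_CO2": ("year", np.linspace(-0.2, 0.2, years.size))},
    coords={"year": years, "time": ("year", mid_year)},
)


def restore_time_dim(ds):
    """Promote 'time' back to the dimension coordinate; 'year' stays as a plain coordinate."""
    return ds.swap_dims({"year": "time"})


fixed = restore_time_dim(ds_ann)
# Label-based slicing along time now has an index to work with.
subset = fixed.sel(time=slice("2002-01-01", "2004-01-01"))
```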

@klindsay28
Author

The updated #156 enables me to select a slice along the time dimension on the dataset returned by esmlab.resample.

I'm also able to write the result to a netCDF file if I run the following before calling to_netcdf:

for key in ['units', 'calendar']:
    ds_ann.time.encoding[key] = ds.time.encoding[key]

It looks like esmlab.resample does not propagate these encoding settings that to_netcdf relies on. That might deserve an esmlab issue of its own.
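The two-line workaround above can be generalized into a small helper that copies the units and calendar encoding from the source dataset onto the resampled one, for every listed variable present in both. This is a minimal sketch, not the fix made in #156; the function name copy_time_encoding and its default variable names (time and time_bound, taken from this dataset) are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def copy_time_encoding(src, dst, names=("time", "time_bound"), keys=("units", "calendar")):
    """Copy selected encoding keys from src to dst for variables present in both.

    Hypothetical helper: mutates dst's variables in place and returns dst.
    """
    for name in names:
        if name not in src.variables or name not in dst.variables:
            continue  # e.g. a dataset without a time_bound variable
        src_enc = src.variables[name].encoding
        dst_enc = dst.variables[name].encoding
        for key in keys:
            if key in src_enc:
                dst_enc[key] = src_enc[key]
    return dst


# Synthetic stand-ins: the source carries encoding (normally set by open_dataset),
# while the result is built fresh, so its encoding starts out empty.
time = pd.date_range("2001-01-01", periods=3, freq="MS")
src = xr.Dataset({"x": ("time", np.arange(3.0))}, coords={"time": time})
src.variables["time"].encoding.update({"units": "days since 2001-01-01", "calendar": "standard"})
dst = xr.Dataset({"x": ("time", [1.0])}, coords={"time": time[:1]})
copy_time_encoding(src, dst)
```

Calling copy_time_encoding(ds, ds_ann) before ds_ann.to_netcdf(...) would mirror the loop above while also covering time_bound.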

Thanks for the quick work on this, @andersy005.

@andersy005
Contributor

I just fixed the encoding propagation issue as well:

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries/FG_CO2_ocn_piControl_00.nc')

In [3]: ds_ann = esmlab.resample(ds, freq='ann')

In [4]: import esmlab

In [5]: ds_ann = esmlab.resample(ds, freq='ann')

In [6]: ds_ann
Out[6]:
<xarray.Dataset>
Dimensions:     (d2: 2, region: 4, time: 1200)
Coordinates:
  * region      (region) object 'Global' 'SH_mid_lat' 'low_lat' 'NH_mid_lat'
  * time        (time) object 0001-07-01 17:05:00 ... 1200-07-01 17:00:00
Dimensions without coordinates: d2
Data variables:
    FG_CO2      (time, region) float64 -0.165 0.1448 -0.9223 ... -0.9603 0.612
    time_bound  (d2, time) object 0001-01-01 01:59:59.999999 ... 1201-01-01 00:00:00
Attributes:
    history:  \n2019-10-25 14:20:33.431614 esmlab.resample(<DATASET>, freq="a...

In [7]: ds_ann.time.encoding
Out[7]:
{'dtype': dtype('float64'),
 '_FillValue': 9.969209968386869e+36,
 'units': 'days since 0000-01-01',
 'calendar': 'noleap'}

In [8]: ds_ann.time_bound.encoding
Out[8]:
{'dtype': dtype('float64'),
 '_FillValue': 9.969209968386869e+36,
 'units': 'days since 0000-01-01',
 'calendar': 'noleap'}

In [9]: ds_ann.sel(time=slice('0801-01-01', '1001-01-01'))
Out[9]:
<xarray.Dataset>
Dimensions:     (d2: 2, region: 4, time: 200)
Coordinates:
  * region      (region) object 'Global' 'SH_mid_lat' 'low_lat' 'NH_mid_lat'
  * time        (time) object 0801-07-01 17:00:00 ... 1000-07-01 17:00:00
Dimensions without coordinates: d2
Data variables:
    FG_CO2      (time, region) float64 0.1078 0.3866 -0.9026 ... -0.9721 0.6262
    time_bound  (d2, time) object 0801-01-01 00:00:00 ... 1001-01-01 00:00:00
Attributes:
    history:  \n2019-10-25 14:20:33.431614 esmlab.resample(<DATASET>, freq="a...

In [10]: ds_ann.to_netcdf("/glade/u/home/abanihi/scratch/foo.nc")

@klindsay28
Author

Thanks!

@mnlevy1981
Contributor

I was running into the same issue (losing the encoding on the time dimension), and @andersy005's latest master commit is working great for me. Thanks for pointing it out, @klindsay28. I was like "oh, that's weird, I need to specify a time index instead of a year" but didn't connect the dots on what must've gone wrong to make that the case. Anyway, I'm back to having years on the x-axis of my plots... I think my notebook is down to just intake-esm issues at this point.

3 participants