Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: (Un-weighted) temporal averaging probably doesn't need bounds #437

Closed
pochedls opened this issue Mar 21, 2023 · 1 comment · Fixed by #579
Closed

[Bug]: (Un-weighted) temporal averaging probably doesn't need bounds #437

pochedls opened this issue Mar 21, 2023 · 1 comment · Fixed by #579
Assignees
Labels
priority: now Requires immediate attention type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. work: obvious

Comments

@pochedls
Copy link
Collaborator

What happened?

I was looking at the PR that adds functionality to create accurate time bounds and came across some 3-hour datasets where the new functionality was doing something reasonable, but I didn't actually know what the bounds should be. I realized this doesn't actually matter for 3-hour bounds, because the upper and lower bound are always 3 hours apart (and so all time elements have equal weight). This is different from averaging across months, because different months have varying lengths (and receive slightly different weights).

Anyways, I passed my 3-hourly dataset into group_average with weighted=False and got a KeyError because the time bounds were missing. But for an unweighted average we don't need the time bounds (I think).

What did you expect to happen?

I thought xcdat would return a temporally averaged dataset.

Minimal Complete Verifiable Example

import xcdat as xc

fn = '/p/css03/esgf_publish/CMIP6/CMIP/MIROC/MIROC6/amip/r1i1p1f1/3hr/tas/gn/v20190912/tas_3hr_MIROC6_amip_r1i1p1f1_gn_201401010300-201501010000.nc'
ds = xc.open_dataset(fn)
ds.temporal.group_average('tas', freq='day', weighted=False)

Relevant log output

---------------------------------------------------------------------------                                                                                           
KeyError                                  Traceback (most recent call last)
File ~/bin/anaconda3/envs/xcdat_dev/lib/python3.10/site-packages/xarray/core/dataset.py:1348, in Dataset._construct_dataarray(self, name)
   1347 try:
-> 1348     variable = self._variables[name]
   1349 except KeyError:

KeyError: 'time_bnds'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Cell In[60], line 1
----> 1 ds.temporal.group_average('tas', freq='day', weighted=False)

File ~/code/xcdat/xcdat/temporal.py:352, in TemporalAccessor.group_average(self, data_var, freq, weighted, keep_weights, season_config)
    224 """Returns a Dataset with average of a data variable by time group.
    225 
    226 Time bounds are used for generating weights to calculate weighted group
   (...)
    348 }
    349 """
    350 self._set_data_var_attrs(data_var)
--> 352 return self._averager(
    353     data_var,
    354     "group_average",
    355     freq,
    356     weighted=weighted,
    357     keep_weights=keep_weights,
    358     season_config=season_config,
    359 )

File ~/code/xcdat/xcdat/temporal.py:761, in TemporalAccessor._averager(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
    759 # Get the data variable and the required time axis metadata.
    760 dv = _get_data_var(ds, data_var)
--> 761 time_bounds = ds.bounds.get_bounds("T", var_key=dv.name)
    763 if self._mode == "average":
    764     dv = self._average(dv, time_bounds)

File ~/code/xcdat/xcdat/bounds.py:228, in BoundsAccessor.get_bounds(self, axis, var_key)
    219 if len(bounds_keys) == 0:
    220     raise KeyError(
    221         f"No bounds data variables were found for the '{axis}' axis. Make sure "
    222         "the dataset has bound data vars and their names match the 'bounds' "
   (...)
    225         "or `xcdat.add_bounds`."
    226     )
--> 228 bounds: Union[xr.Dataset, xr.DataArray] = self._dataset[
    229     bounds_keys if len(bounds_keys) > 1 else bounds_keys[0]
    230 ].copy()
    232 return bounds

File ~/bin/anaconda3/envs/xcdat_dev/lib/python3.10/site-packages/xarray/core/dataset.py:1439, in Dataset.__getitem__(self, key)
   1437     return self.isel(**key)
   1438 if utils.hashable(key):
-> 1439     return self._construct_dataarray(key)
   1440 if utils.iterable_of_hashable(key):
   1441     return self._copy_listed(key)

File ~/bin/anaconda3/envs/xcdat_dev/lib/python3.10/site-packages/xarray/core/dataset.py:1350, in Dataset._construct_dataarray(self, name)
   1348     variable = self._variables[name]
   1349 except KeyError:
-> 1350     _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
   1352 needed_dims = set(variable.dims)
   1354 coords: dict[Hashable, Variable] = {}

File ~/bin/anaconda3/envs/xcdat_dev/lib/python3.10/site-packages/xarray/core/dataset.py:186, in _get_virtual_variable(variables, key, dim_sizes)
    184 split_key = key.split(".", 1)
    185 if len(split_key) != 2:
--> 186     raise KeyError(key)
    188 ref_name, var_name = split_key
    189 ref_var = variables[ref_name]

KeyError: 'time_bnds'

Anything else we need to know?

I think the issue is here. We require bounds even if weighted=False (I think the bounds are only used to calculate weights...which we don't need). Although the time bounds are passed into the averaging functions, they aren't used!

I think we could modify this code to:

        # Get the data variable and the required time axis metadata.
        dv = _get_data_var(ds, data_var)
        if weighted:
            time_bounds = ds.bounds.get_bounds("T", var_key=dv.name)
        else:
            time_bounds = None

Environment

Current-ish dev branch.

@pochedls pochedls added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Mar 21, 2023
@tomvothecoder
Copy link
Collaborator

Nice catch with this oversight! Yes, it is a simple fix using your modifications.

In addition to that, we need to update the _average and _group_average type annotations and docstrings with time_bounds: Union[xr.DataArray, None].

xcdat/xcdat/temporal.py

Lines 1087 to 1089 in 5790bc9

def _average(
self, data_var: xr.DataArray, time_bounds: xr.DataArray
) -> xr.DataArray:

xcdat/xcdat/temporal.py

Lines 1117 to 1119 in 5790bc9

def _group_average(
self, data_var: xr.DataArray, time_bounds: xr.DataArray
) -> xr.DataArray:

No errors need to be raised in these methods because ds.bounds.get_bounds() will raise the KeyError if no bounds are found but the calculations are supposed to be weighted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: now Requires immediate attention type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. work: obvious
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants