Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Use dimension coordinates from the input data variable in specific xCDAT APIs #314

Closed
tomvothecoder opened this issue Aug 15, 2022 · 3 comments · Fixed by #343
Assignees
Labels
type: enhancement New enhancement request

Comments

@tomvothecoder
Copy link
Collaborator

Is your feature request related to a problem?

Related #285 and #312, we might need to update how we get the dimension name for a time axis (self._dim) in TemporalAccessor().

Instead of getting the time dimension name from the dataset (which can have multiple time axis):

self._dim = get_axis_coord(self._dataset, "T").name

We can get just the single time axis for the data variable that is being temporally averaged:
self._dim = get_axis_coord(self._data_var, "T").name

Originally posted by @tomvothecoder in #312 (comment)

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label Aug 15, 2022
@tomvothecoder tomvothecoder self-assigned this Aug 15, 2022
@tomvothecoder tomvothecoder moved this to Todo in v0.3.1 Aug 15, 2022
@tomvothecoder tomvothecoder changed the title [Feature]: Update the referenced time dimension in TemporalAccessor. [Feature]: Update reference to time dimension in TemporalAccessor to the input data variable Aug 15, 2022
@tomvothecoder tomvothecoder changed the title [Feature]: Update reference to time dimension in TemporalAccessor to the input data variable [Feature]: Use time dimension from the input data variable instead of the entire dataset in TemporalAccessor Aug 15, 2022
@pochedls
Copy link
Collaborator

@tomvothecoder - I'm wondering if this could/should be more general to all dimensions (e.g., also X or Y). This issue can affect other axes (e.g., here). I have not seen problems with spatial averaging, but it seems like it could (e.g., here). Maybe the data_var can be passed into some of these functions (as an optional argument?) and used in get_bounds?

@tomvothecoder
Copy link
Collaborator Author

Good point @pochedls. Yes, this should be applied to the SpatialAccessor and its methods as well.

For accessor classes:

  • Store the name of the user specified data_var as a class attr so it can be referenced in the class methods (no need to pass around as method args)

For get_bounds:

  • Update to handle cases where the dataset has multiple axis dimensions
  • Add data_var string keyword arg to fetch bounds for a specific axis for a data var

@tomvothecoder tomvothecoder changed the title [Feature]: Use time dimension from the input data variable instead of the entire dataset in TemporalAccessor [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset Aug 29, 2022
@tomvothecoder tomvothecoder changed the title [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset (can have multiple coords for an axis) Aug 29, 2022
@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Aug 30, 2022

The proposed solution in this issue actually won't fix the issue of trying to map the desired axis coord variable for a dim because DataArrays can also have multiple coord vars mapped to a dim like Datasets (Datasets shares coords from its DataArrays).

Example:

import cf_xarray as cf
import xarray as xr
import xcdat as xc

#%%
# Test2 - Remove attributes referring to non-existing bounds
# This file is CF compliant (https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl)
"""
- https://github.com/xCDAT/xcdat/issues/285#issuecomment-1209768695
"""
filepath2 = "https://thredds-su.ipsl.fr/thredds/dodsC/ipsl_thredds/omamce/TestCases/XCDAT/thetao_CF.nc"

#%%
ds2 = xr.open_dataset(filepath2)

#%%
# 1. Attempt to get "T" axis coordinate variable in the Dataset
ds2.cf["T"]
# KeyError: "Receive multiple variables for key 'T': {'time_centered', 'time_counter'}. Expected only one. Please pass a list ['T'] instead to get all variables matching 'T'."

#%%
# 2. Check the Dataset's cf_xarray mapping 
ds2.cf

"""
Coordinates:
- CF Axes:   T: ['time_centered', 'time_counter']
             X, Y, Z: n/a

- CF Coordinates:   longitude: ['nav_lon']
                    latitude: ['nav_lat']
                  * vertical: ['olevel']
                    time: ['time_centered', 'time_counter']

- Cell Measures:   area, volume: n/a

- Standard Names:   latitude: ['nav_lat']
                    longitude: ['nav_lon']
                    time: ['time_centered', 'time_counter']

- Bounds:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   sea_water_conservative_temperature: ['thetao']

- Bounds:   T: ['time_centered_bounds']
            time: ['time_centered_bounds']
            time_centered: ['time_centered_bounds']

"""

#%%
# 3. Attempt to get "T" axis coordinate variable in the DataArray
ds2.thetao.cf["T"]
# KeyError: "Receive multiple variables for key 'T': {'time_centered', 'time_counter'}. Expected only one. Please pass a list ['T'] instead to get all variables matching 'T'."

#%%
# 4. Check the DataArray's cf_xarray mapping
ds2.thetao.cf
"""
Coordinates:
- CF Axes:   T: ['time_centered', 'time_counter']
             X, Y, Z: n/a

- CF Coordinates:   longitude: ['nav_lon']
                    latitude: ['nav_lat']
                  * vertical: ['olevel']
                    time: ['time_centered', 'time_counter']

- Cell Measures:   area, volume: n/a

- Standard Names:   latitude: ['nav_lat']
                    longitude: ['nav_lon']
                    time: ['time_centered', 'time_counter']

- Bounds:   n/a
"""

We need to figure out a sensible way of choosing the correct coord var for a dim in a DataArray for SpatialAccessor and TemporalAccessor when multiple coord vars are present.

This might involve using the dimension name as mentioned here, and making sure this coord var has bounds:

pass in a list, e.g., ds.cf[["T"]] instead of ds.cf["T"]. You could check if one of the time coordinates is a dimension by checking this intersection: list(set(ds.cf[["T"]].coords) & set(ds.cf[["T"]].dims)). If one axis satisfies this criteria, you could then choose this axis (e.g., updating the selection here) within _has_cf_compliant_time.

@tomvothecoder tomvothecoder changed the title [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset (can have multiple coords for an axis) [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset Sep 8, 2022
@tomvothecoder tomvothecoder changed the title [Feature]: Use coordinates for an axis from the input data variable instead of the entire dataset [Feature]: Use dimension coordinates from the input data variable instead of the entire dataset Sep 8, 2022
@tomvothecoder tomvothecoder changed the title [Feature]: Use dimension coordinates from the input data variable instead of the entire dataset [Feature]: Use dimension coordinates from the input data variable in specific xCDAT APIs Sep 8, 2022
@tomvothecoder tomvothecoder removed this from v0.3.2 Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New enhancement request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants