Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign #216

Merged
merged 110 commits into from
Jun 15, 2023
Merged

Redesign #216

merged 110 commits into from
Jun 15, 2023

Conversation

erogluorhan
Copy link
Member

@erogluorhan erogluorhan commented Feb 2, 2023

This PR introduces a prototype redesign of UXarray based on the conversation in Discussion #185 . This prototype's backbone is the uxarray.core.Grid object and a brand new uxarray.core.UxDataset that inherits from xarray.Dataset.

There were a lot of hesitation (including mine) and warnings against inheritance from xarray in Discussion #185., so one major goal of the prototype is to trigger the team and collaborators to point out potential drawbacks of the inheritance scheme.

Disclosure: I ended up finding the inheritance most suitable way for UXarray's case after figuring immediate issues with the dataset accessors and not being able to seeing robust implementations of xarray's custom indexes other than current.experiments/trials.

This PR does:

  • Create a new namespace uxarray.core that houses the existing grid as well as new api and dataset modules.

  • NOT change the grid.py functionality much but adds a new equals function to check if any two grid topologies are the same. Other functions are kept the same for now; however, functions like interpolate that can better be applied on UxDataset instead of a grid can be and should most likely be removed in the future if tis prototype gets hired.

  • Create xarray-like api.py module to handle user API functions such as open_dataset, open_grid, etc by wrapping xarray's functions.

  • Create a new uxarray.core.UxDataset that inherits from xarray.Dataset (Likewise a DataArray version can be considered in the future).

    • Constructor adds a uxarray.core.Grid object as property to itself at the initialization (similar to xarray's dataset accessor), so the connection between each uxarray dataset and grid topology
    • Has only one function integrate() defined for now. Algorithm is the same, but function call is focused on the dataset instead of on the grid, e.g.:
      uxds = uxarray.open_dataset(grid_file, dataset_file)
      integral = uxds.integrate()
      
  • Add test_api.py to tests and update existing test cases to reflect the design changes

@UXARRAY/uxarray-dev

EDIT (2/16/2023):

@philipc2 @dcherian Now added UxDataArray to this prototype and got all xr.DataArray fields of any UxDataset to be represented as UxDataArray. UxDataArrayalso has a uxgrid property. So, whenever a function that generates a new dataset or a data array out of UxDataset is executed, the result is expected to be either a new UxDataset and UxDataArray instead of xr.Dataset or xr.DataArray, respectively, so we can keep uxarray functionality valid on these new things.

EDIT (3/30/2023):

Uxarray's new design can be tried out following the instructions in the How to test new design document.

* Create a dataset module and convert to Grid Accessor xarray.Dataset accessor class

* Convert Grid appearances to GridAccessor

* Introduce api.py and dataset.py

* Update test cases to reflect the redesign

* Remove NestedSequence dependency

* precommit fixes
* Create a dataset module and convert to Grid Accessor xarray.Dataset accessor class

* Convert Grid appearances to GridAccessor

* Introduce api.py and dataset.py

* Update test cases to reflect the redesign

* Remove NestedSequence dependency

* precommit fixes

* Fix grid.equals and add test_dataset
@hongyuchen1030
Copy link
Contributor

Disclosure: I ended up finding the inheritance most suitable way for UXarray's case after figuring immediate issues with the dataset accessors and not being able to seeing robust implementations of xarray's custom indexes other than current.experiments/trials.

This PR does:

  • Create a new namespace uxarray.core that houses the existing grid as well as new api and dataset modules.

  • NOT change the grid.py functionality much but adds a new equals function to check if any two grid topologies are the same. Other functions are kept the same for now; however, functions like interpolate that can better be applied on UxDataset instead of a grid can be and should most likely be removed in the future if tis prototype gets hired.

  • Create xarray-like api.py module to handle user API functions such as open_dataset, open_grid, etc by wrapping xarray's functions.

  • Create a new uxarray.core.UxDataset that inherits from xarray.Dataset (Likewise a DataArray version can be considered in the future).

    • Constructor adds a uxarray.core.Grid object as property to itself at the initialization (similar to xarray's dataset accessor), so the connection between each uxarray dataset and grid topology
    • Has only one function integrate() defined for now. Algorithm is the same, but function call is focused on the dataset instead of on the grid, e.g.:
      uxds = uxarray.open_dataset(grid_file, dataset_file)
      integral = uxds.integrate()
      
  • Add test_api.py to tests and update existing test cases to reflect the design changes

@UXARRAY/uxarray-dev

My current understanding is grid class hold the original information of the mesh grid (basically the read-in file information), and all the connectivity information are stored in a separate class called "dataset". And to link them together we do

this_grid = uxarray.open_grid("gridfile.sh")
this_grid_dataset = uxarray.open_dataset(this_grid, "datasetfile.ds")

If that so, my concerns will be:

  1. Will any operation(now or in the future) in the xarray.datase change the "original information" of the grid? (Basically the coherence in updating the information and avoid dirty read & write)

  2. If users are going to export the data, what are we going to give them?

Please let me know if I misunderstand or being unclear of anything

@hongyuchen1030
Copy link
Contributor

hongyuchen1030 commented Feb 14, 2023

Resolved in the meeting:

1. ux_ds = uxarray.open_dataset(grid_file, "datasetfile.ds")
2. ux_ds.uxgrid
3. ux_ds.dataset_wise_funcs()
4. ux_ds.uxgrid.grid_wise_funcs()
5. ux_ds.var1.uxgrid.dataarray_wise_funcs()

@erogluorhan
Copy link
Member Author

erogluorhan commented Feb 17, 2023

EDIT (2/16/2023):

@philipc2 @dcherian Now added UxDataArray to this prototype and got all xr.DataArray fields of any UxDataset to be represented as UxDataArray. UxDataArrayalso has a uxgrid property. So, whenever a function that generates a new dataset or a data array out of UxDataset is executed, the result is expected to be either a new UxDataset or UxDataArray instead of xr.Dataset or xr.DataArray, respectively, so we can keep uxarray functionality valid on these new things.

@philipc2
Copy link
Member

philipc2 commented Feb 20, 2023

Attempting to merge two UxDataset objects, each lying on the same grid with different data variables that agree with the grid topology fails because of the if not isinstance(obj, (DataArray, Dataset, dict)): in xr.merge() complains about our Grid object stored in the uxgrid accessor. We will need to implement our own version of the functions mentioned here to be grid-aware.

uxds_v1 = ux.open_grid(gridfile_geoflow, dsfile_v1_geoflow)
uxds_v2 = ux.open_grid(gridfile_geoflow, dsfile_v2_geoflow)
# fails because of the grid accessor 
uxds_pair = xr.merge([uxds_v1, uxds_v2])

uxarray/core/grid.py Outdated Show resolved Hide resolved
@philipc2
Copy link
Member

Creating a shallow or deep copy using the Xarray copy method on a UxDataset does not preserve the uxgrid accessor

uxds = ux.open_dataset(gridfile_geoflow, dsfile_v1_geoflow)
# uxgrid accessor cleared to None
uxds_copy = uxds.copy(deep=False)

@philipc2
Copy link
Member

Similarly, calling a method such as .astype() also does not preserve the uxgrid accessor. Taking the route of inherence with custom accessors, does this mean that we must still deal with the statefullness issue, however we can "patch" it up with modifications to methods that return a Dataset or DataArray?

@dcherian
Copy link
Contributor

FWIW I brought up this issue at the Xarray dev meeting. We're discussing here and it seems like we're converging to a solution.

For this discussion the relevant piece is that if you created an xarray.Index object to store grid state, then all xarray operations will propagate these indexes; and any accessors can access state information in the Index. That particular discussion is about propagating the Index when a DataArray is extracted, this does not work currently.

however we can "patch" it up with modifications to methods that return a Dataset or DataArray

You could hack this by using __getattr__ but it'll be painful.

Copy link
Contributor

@hongyuchen1030 hongyuchen1030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all my Algorithms implementing experience, I don't see any trouble using the current design.

The following algorithms' implementation is not heavily relied on the Xarray functionality. It uses more numpy features and are highly portable and self-robust

uxarray/core/dataset.py Show resolved Hide resolved
uxarray/core/dataset.py Outdated Show resolved Hide resolved
@headtr1ck
Copy link

headtr1ck commented Mar 2, 2023

A major method that you are missing is DataArray/Dataset._replace.
This will always be called when a copy is returned (so basically 90% of the methods use it).

Notice, that xarrays version calls type(self)(...) so it will call the init of your custom class but without the uxgrid arg, so you have to call super()._replace and then afterwards set the uxgrid.

Edit: Probably you also need Dataset._construct_direct (why does xarray even have that when replace exists???)

uxarray/core/dataarray.py Outdated Show resolved Hide resolved
@rajeeja
Copy link
Contributor

rajeeja commented Mar 3, 2023

I would like to reorganize a bit: add two folders:

  1. utils (helpers , get quadrature etc.)
  2. io (_exodus etc.)

Copy link
Contributor

@hongyuchen1030 hongyuchen1030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for putting these together. It looks awesome overall. Just left some comments about the documentation

uxarray/core/grid.py Show resolved Hide resolved
docs/examples/001-working-with-unstructured-grids.ipynb Outdated Show resolved Hide resolved
docs/examples/002-grid-topology.ipynb Show resolved Hide resolved
docs/user_api/index.rst Outdated Show resolved Hide resolved
@erogluorhan erogluorhan changed the title Redesigning prototype Redesign Jun 15, 2023
@philipc2
Copy link
Member

philipc2 commented Jun 15, 2023

@erogluorhan

I have updated our "Working with Unstructured Grid Data" notebook to reflect the comments in #181 and the work that we've done with the inheritance notebook.

I have also added a small section to the "Grid Topology" notebook to describe how to access the INT_FILL_VALUE and INT_DTYPE with a description on the standardization behind them.

Let me know if there is anything else I should add!

docs/getting-started/overview.rst Outdated Show resolved Hide resolved
docs/getting-started/overview.rst Outdated Show resolved Hide resolved
docs/examples/002-grid-topology.ipynb Show resolved Hide resolved
Copy link
Contributor

@hongyuchen1030 hongyuchen1030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your hard work! The documentation looks good to me now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
redesign Content relating to the ongoing redesign
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UxDataset.__setitem__ Connectivity Usage Example Notebook Implement Copy method for ux.Grid
8 participants