Commit

User-guide - pandas : Add alternative to xarray.Dataset.from_dataframe (#9020)

* Update pandas.rst

* Update pandas.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update pandas.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update ecosystem.rst

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* review comments

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* Update doc.yml

* remove code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update ci/requirements/doc.yml

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* Update doc/user-guide/pandas.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
3 people authored and andersy005 committed Jun 14, 2024
1 parent 6772b42 commit 4d075e9
Showing 2 changed files with 21 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/ecosystem.rst
@@ -74,6 +74,7 @@ Extend xarray capabilities
- `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
- `eofs <https://ajdawson.github.io/eofs/>`_: EOF analysis in Python.
- `hypothesis-gufunc <https://hypothesis-gufunc.readthedocs.io/en/latest/>`_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input.
- `ntv-pandas <https://github.com/loco-philippe/ntv-pandas>`_: A tabular analyzer and a semantic, compact and reversible converter for multidimensional and tabular data.
- `nxarray <https://github.com/nxarray/nxarray>`_: NeXus input/output capability for xarray.
- `xarray-compare <https://github.com/astropenguin/xarray-compare>`_: xarray extension for data comparison.
- `xarray-dataclasses <https://github.com/astropenguin/xarray-dataclasses>`_: xarray extension for typed DataArray and Dataset creation.
20 changes: 20 additions & 0 deletions doc/user-guide/pandas.rst
@@ -110,6 +110,26 @@ work even if the hierarchical index is not a full tensor product:
    s[::2]
    s[::2].to_xarray()

Lossless and reversible conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The previous ``Dataset`` example shows that the conversion is not reversible (lossy roundtrip) and
that the size of the ``Dataset`` increases.

In particular, the following deviations are observed after a roundtrip:

- a non-dimension ``Dataset`` coordinate is converted into a data variable
- a non-dimension ``DataArray`` coordinate is not converted
- the ``dtype`` is not always preserved (e.g. "str" is converted to "object")
- ``attrs`` metadata is not preserved
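
Some of these deviations can be reproduced with a small roundtrip through ``to_dataframe`` and ``from_dataframe``. The following is a minimal sketch, not part of the original commit; the dataset contents and the names ``temperature``, ``x_label`` and ``title`` are made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 dataset with one non-dimension coordinate and one attribute.
ds = xr.Dataset(
    {"temperature": (("x", "y"), np.arange(6.0).reshape(2, 3))},
    coords={
        "x": [10, 20],
        "y": ["a", "b", "c"],
        "x_label": ("x", ["low", "high"]),  # non-dimension coordinate along x
    },
    attrs={"title": "demo"},
)

# Roundtrip: Dataset -> pandas DataFrame -> Dataset.
df = ds.to_dataframe()
roundtrip = xr.Dataset.from_dataframe(df)

# The non-dimension coordinate comes back as a data variable,
# broadcast over both dimensions (so the Dataset grows).
print("x_label" in roundtrip.coords)
print(roundtrip["x_label"].dims, roundtrip["x_label"].shape)

# Dataset-level attrs are dropped.
print(roundtrip.attrs)

# dtypes may differ between the original and the roundtrip result.
print(ds["y"].dtype, roundtrip["y"].dtype)
```

The data values themselves survive the roundtrip; what changes is the structure and the metadata around them.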

To avoid these problems, the third-party `ntv-pandas <https://github.com/loco-philippe/ntv-pandas>`__ library offers lossless and reversible conversions between
``Dataset``/``DataArray`` and pandas ``DataFrame`` objects.

This solution is particularly useful for converting any ``DataFrame`` into a ``Dataset`` (the converter finds the multidimensional structure hidden by the tabular structure).

The `ntv-pandas examples <https://github.com/loco-philippe/ntv-pandas/tree/main/example>`__ show how to improve the conversion for the previous ``Dataset`` example and for more complex examples.

Multi-dimensional data
~~~~~~~~~~~~~~~~~~~~~~
