User-guide - pandas : Add alternative to xarray.Dataset.from_dataframe #9020

Merged
merged 28 commits into from
May 30, 2024
Changes from 8 commits
Commits
fcedc2d
Update pandas.rst
loco-philippe May 9, 2024
09d3e93
Update pandas.rst
loco-philippe May 9, 2024
763e6d4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 10, 2024
d4e0b8c
Merge branch 'main' into main
loco-philippe May 10, 2024
6ca399d
Update pandas.rst
loco-philippe May 10, 2024
c8e2c3b
Merge branch 'main' of https://github.com/loco-philippe/xarray
loco-philippe May 10, 2024
878b683
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 10, 2024
56a488f
Update ecosystem.rst
loco-philippe May 10, 2024
15eff3c
Merge branch 'main' into main
mathause May 13, 2024
c1a3ff5
Update doc/user-guide/pandas.rst
loco-philippe May 13, 2024
84b476d
Update doc/user-guide/pandas.rst
loco-philippe May 13, 2024
ffe3a73
Update doc/user-guide/pandas.rst
loco-philippe May 13, 2024
5a6009f
review comments
loco-philippe May 13, 2024
1082288
Update doc.yml
loco-philippe May 13, 2024
dd7970d
Update doc.yml
loco-philippe May 13, 2024
0113b96
Update doc.yml
loco-philippe May 14, 2024
5f9468a
Update doc.yml
loco-philippe May 14, 2024
77345bc
Update doc.yml
loco-philippe May 14, 2024
06d98d3
Update doc.yml
loco-philippe May 14, 2024
356f031
remove code
loco-philippe May 19, 2024
f571bda
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2024
a069c11
Update doc/user-guide/pandas.rst
loco-philippe May 21, 2024
e214029
Update doc/user-guide/pandas.rst
loco-philippe May 21, 2024
4e8aede
Update ci/requirements/doc.yml
loco-philippe May 21, 2024
d8db8e9
Update doc/user-guide/pandas.rst
loco-philippe May 21, 2024
2b331dc
Update doc/user-guide/pandas.rst
loco-philippe May 21, 2024
890cb27
Merge branch 'main' into main
mathause May 22, 2024
8059010
Merge branch 'main' into main
loco-philippe May 29, 2024
1 change: 1 addition & 0 deletions doc/ecosystem.rst
@@ -74,6 +74,7 @@ Extend xarray capabilities
- `Collocate <https://github.com/cistools/collocate>`_: Collocate xarray trajectories in arbitrary physical dimensions
- `eofs <https://ajdawson.github.io/eofs/>`_: EOF analysis in Python.
- `hypothesis-gufunc <https://hypothesis-gufunc.readthedocs.io/en/latest/>`_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input.
- `ntv-pandas <https://github.com/loco-philippe/ntv-pandas>`_: A tabular analyzer and a semantic, compact and reversible converter for multidimensional and tabular data.
- `nxarray <https://github.com/nxarray/nxarray>`_: NeXus input/output capability for xarray.
- `xarray-compare <https://github.com/astropenguin/xarray-compare>`_: xarray extension for data comparison.
- `xarray-dataclasses <https://github.com/astropenguin/xarray-dataclasses>`_: xarray extension for typed DataArray and Dataset creation.
56 changes: 56 additions & 0 deletions doc/user-guide/pandas.rst
@@ -110,6 +110,62 @@ work even if not the hierarchical index is not a full tensor product:
s[::2]
s[::2].to_xarray()

Lossless and reversible converter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The previous example shows that the conversion is not reversible (the roundtrip is not lossless) and that the size of the ``Dataset`` increases.
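
A minimal sketch of both effects, assuming a small hypothetical ``Series`` rather than the objects defined earlier in this guide. It only uses ``Series.to_xarray`` and ``DataArray.to_series``:

```python
import numpy as np
import pandas as pd
import xarray as xr  # must be installed for Series.to_xarray()

# Hypothetical Series indexed by a full 2 x 3 product of labels.
index = pd.MultiIndex.from_product([["a", "b"], [0, 1, 2]], names=["x", "y"])
s = pd.Series(np.arange(6.0), index=index, name="foo")

# Keep every other row, so the index is no longer a full tensor product.
sparse = s[::2]

# to_xarray() rebuilds the full 2 x 3 grid and pads the gaps with NaN.
da = sparse.to_xarray()
print(sparse.size, da.size)

# Converting back yields the padded grid, not the Series we started from.
roundtrip = da.to_series()
print(roundtrip.equals(sparse))
```

The array holds more values than the original ``Series`` because every missing combination of ``x`` and ``y`` is materialized as ``NaN``.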

Another approach is to use a lossless and reversible conversion (e.g. the third-party `ntv-pandas`__ library). A dataset can then be shared
between several tools.

__ https://github.com/loco-philippe/ntv-pandas/blob/main/README.md

DataFrame to Dataset or DataArray
---------------------------------

The conversion is done without loss, by finding the multidimensional structure hidden by the tabular structure.
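
To illustrate what "hidden multidimensional structure" can mean, here is a rough sketch in plain pandas. It is not the algorithm used by ntv-pandas; the table and the helper ``depends_only_on`` are hypothetical and exist only for this illustration:

```python
import pandas as pd

# Hypothetical flat table: one row per (x, y) pair.
df = pd.DataFrame(
    {
        "x": [10, 10, 10, 20, 20, 20],
        "y": ["a", "b", "c", "a", "b", "c"],
        "along_x": [0.1, 0.1, 0.1, 0.7, 0.7, 0.7],
        "foo": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)


def depends_only_on(table, column, dim):
    """Return True if `column` takes a single value for each value of `dim`."""
    return bool((table.groupby(dim)[column].nunique() == 1).all())


# x and y form a full grid if every combination appears exactly once.
is_full_grid = (
    len(df) == df["x"].nunique() * df["y"].nunique()
    and not df.duplicated(["x", "y"]).any()
)

print(is_full_grid)
print(depends_only_on(df, "along_x", "x"))  # a candidate coordinate along x
print(depends_only_on(df, "foo", "x"))  # varies with y too: a 2-D variable
```

In this table, ``x`` and ``y`` behave like dimensions, ``along_x`` like a coordinate attached to ``x``, and ``foo`` like a variable defined on both.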

By applying this conversion to the DataFrame above, we recover the initial ``Dataset``:

.. ipython:: python

    import ntv_pandas as npd

    df.npd.to_xarray()

Dataset or DataArray to DataFrame
---------------------------------

In the other direction, information that a DataFrame cannot represent natively (e.g. ``attrs`` data) must still be carried along with the DataFrame.

For this, pandas provides the ``attrs`` attribute.
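
As a small illustration with a hypothetical DataFrame, ``DataFrame.attrs`` behaves like a dictionary of dataset-level metadata:

```python
import pandas as pd

# Hypothetical table used only for this illustration.
df = pd.DataFrame({"x": [10, 20], "foo": [1.5, 2.5]})

# attrs is a plain dict attached to the DataFrame.
df.attrs["example"] = "test npd"
print(df.attrs)
```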

.. ipython:: python

    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.randn(2, 3))},
        coords={
            "x": [10, 20],
            "y": ["a", "b", "c"],
            "along_x": ("x", np.random.randn(2)),
            "scalar": 123,
        },
        attrs={"example": "test npd"},
    )
    ds

After converting to a DataFrame and back, we recover the initial ``Dataset``:

.. ipython:: python

    df = npd.from_xarray(ds)

    df.npd.to_xarray()

.. note::

    The pandas ``attrs`` attribute is still experimental (some operations remove it). The information it holds should therefore be used or saved first, before any further processing of the DataFrame.
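
One defensive pattern, sketched here with a hypothetical DataFrame, is to copy ``attrs`` before further processing and re-attach it explicitly afterwards. Whether a given operation keeps ``attrs`` depends on the operation and the pandas version, so this sketch does not rely on propagation:

```python
import pandas as pd

# Hypothetical table with dataset-level metadata.
df = pd.DataFrame({"x": [10, 20, 10], "foo": [1.0, 2.0, 3.0]})
df.attrs["example"] = "test npd"

# Copy the metadata first, before any operation that might discard it.
saved_attrs = dict(df.attrs)

# Any further processing; attrs may or may not survive this step.
means = df.groupby("x").mean()

# Re-attach the saved metadata explicitly.
means.attrs = saved_attrs
print(means.attrs)
```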

Multi-dimensional data
~~~~~~~~~~~~~~~~~~~~~~
