-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User-guide - pandas : Add alternative to xarray.Dataset.from_dataframe #9020
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Overall I think this looks reasonable! I don't have strong views on the substance, others should comment if they have views... We'll need to add |
Thanks for the answer to the question I was going to ask! Would it also be useful to add |
Yes, please go ahead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. And as mentioned you will need to add the new dependency to https://github.com/pydata/xarray/blob/main/ci/requirements/doc.yml
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Error Read the Docs :
Is it because the package name (ntv-pandas) is different from the module name (ntv_pandas) ? |
Is it available on conda? Otherwise if only on |
Now you get a syntax error because |
Checking again - you absolutely need to start testing your packages continuously. I am reluctant to 'endorse' a package that does not have a CI pipeline. Let us know if you need support with that. |
This bug is clearly unacceptable (I'm ashamed) !! This demonstrates that I now need to take the time to have a robust CI process. For the current PR, two solutions:
It seems to me that solution 2 is preferable (unless you want to go quickly and in which case I will follow solution 1). |
Cool great to hear! Thinking about this - I would second @keewis idea. Featuring a prominent section without code would give it visibility, limit maintenance burden, and keep the docs environment smaller. (I maintain another smaller package where a third party package is featured in the docs. While I find it super cool that someone created an extension, it has caused above issues for me.) Maybe this could be something along the lines: Lossless and reversible conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The previous example shows that the conversion is not reversible (lossy roundtrip) and
that the size of the ``datasets`` increases. To avoid these problems, the third-party
`ntv-pandas`__ library offers lossless and reversible conversions between
``Dataset``/ ``DataArray`` and pandas ``DataFrame`` objects.
__ https://github.com/loco-philippe/ntv-pandas If you have not done so yet, you can showcase the examples in the docs of ntv-pandas. |
for more information, see https://pre-commit.ci
The proposal to have Xarray documentation without code linked to a third party package seems logical to me. So I modified the PR taking into account @mathause's proposal (with some modifications). I also added the Xarray accessors in the ntv_numpy package. We now have symmetric methods Another question: I haven't found any other tools or methodologies that analyze the structure of a tabular data to extract the multidimensional structure. Do you know any? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing our suggestions. Need to remove the packages from the doc.yml again. I have some optional minor suggestions, but looks good to go.
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Thank you @max-sixty and @mathause for your approval of this PR. I finally added another example (use case) which I think should address new needs (but I don't know Xarray well enough and I'm interested in your opinion). Note: JSON, Pandas and Xarray interfaces are built on a neutral format as defined in this notebook. I think this structure is consistent with your roadmap (I'm also interested in your opinion). |
close/ open to trigger the tests again |
No idea why this does not work, but I don't see this could have something to do with the PR itself. I'll merge manually. Thanks for your PR and willingness to adopt to our changes! |
Thanks also for spending time on this PR. The changes were useful and relevant, so it was normal to take them into account ! |
#9020) * Update pandas.rst * Update pandas.rst * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update pandas.rst * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update ecosystem.rst * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * review comments * Update doc.yml * Update doc.yml * Update doc.yml * Update doc.yml * Update doc.yml * Update doc.yml * remove code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update ci/requirements/doc.yml Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * Update doc/user-guide/pandas.rst Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
This PR follow the issue #9015 as proposed by @max-sixty.
I added an additional section in the pandas.rst file to provide a third-party pandas interface alternative that is lossless and reversible.
The main contribution is the ability to find the multidimensional structure hidden by the tabular structure.