
show(df) does not work with modin.pandas #325

Open
wpritom opened this issue Oct 6, 2024 · 7 comments

@wpritom

wpritom commented Oct 6, 2024

show() does not work when I import pandas from Modin. I'm using Modin to improve pandas performance.

import modin.pandas as pd
from itables import show

df = pd.read_csv("****.csv")

Now show(df, classes="display") fails with the following error:

AttributeError: 'DataFrame' object has no attribute 'iter_rows'

@mwouts
Owner

mwouts commented Oct 6, 2024

Hi @wpritom, thanks for reporting this! Yes, that's right: currently ITables only supports Pandas and Polars DataFrames.

Can you convert df back to a Pandas DataFrame before calling show, for now at least?
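A minimal sketch of this workaround, assuming the installed Modin version provides the public modin.utils.to_pandas helper. The wrapper name to_pandas_for_itables is hypothetical: pandas input passes through unchanged, Modin input is converted, and anything else raises a TypeError.

```python
import pandas as pd


def to_pandas_for_itables(df):
    """Return a pandas DataFrame that ITables can display (hypothetical helper).

    pandas input is returned as-is; Modin input is converted with
    modin.utils.to_pandas (assumed available); other types are rejected.
    """
    if isinstance(df, pd.DataFrame):
        return df
    # Modin DataFrames are not pandas subclasses, so detect them by module name
    if type(df).__module__.startswith("modin."):
        # Imported lazily so that Modin stays an optional dependency
        from modin.utils import to_pandas

        return to_pandas(df)
    raise TypeError(f"Unsupported dataframe type: {type(df).__name__}")
```

It would then be called as show(to_pandas_for_itables(df), classes="display"). Converting first means ITables only ever sees a plain pandas DataFrame, at the cost of collecting the whole Modin frame into memory.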

You can leave this issue open so that I can look into adding support for Modin DataFrames when time permits. Thanks.

@MarcoGorelli

Hi @mwouts - would you be open to using Narwhals in ITables?

I think this could simplify some of the code (e.g. this), and would also give you support for pandas, Polars, Modin, cuDF and PyArrow (and any other Narwhals-compatible eager dataframe) without making any of them required dependencies.

Happy to make a PR if you'd be interested; just gauging interest first.

@mwouts
Owner

mwouts commented Nov 12, 2024

Hey @MarcoGorelli, Narwhals sounds like a great package indeed! And sure, I would love to support more dataframe types; see for instance #217 (pending), where I started working on Ibis support.

I would love to see what that part of the code would look like with Narwhals! The parts that we would need to rewrite, as far as I can tell (there might be more), are:

  • the downsampling part (estimate the size of the table content, then keep only a certain number of top and bottom rows and of first and last columns)
  • the conversion from Python data to Javascript data.
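The downsampling step in the first bullet can be sketched in plain Python on a list of rows. This is a simplified, hypothetical helper, not the ITables code: ITables decides how much to keep from an estimate of the content size, whereas this sketch takes explicit row and column limits.

```python
def downsample(rows, max_rows, max_columns):
    """Keep the top/bottom rows and the first/last columns of a table.

    rows: list of equal-length lists. Hypothetical helper with explicit
    limits in place of the size estimate used by ITables.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if n_rows > max_rows:
        # Split the row budget between the top and the bottom of the table
        top = (max_rows + 1) // 2
        bottom = max_rows - top
        rows = rows[:top] + rows[n_rows - bottom:]
    if n_cols > max_columns:
        # Same split for the columns: first columns on the left, last on the right
        left = (max_columns + 1) // 2
        right = max_columns - left
        rows = [row[:left] + row[n_cols - right:] for row in rows]
    return rows
```

For example, a 5 x 6 table reduced with max_rows=3 and max_columns=4 keeps rows 0, 1 and 4, and columns 0, 1, 4 and 5.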

Looking forward to hearing more from you!

@DeaMariaLeon
Contributor

Hi @mwouts, I'm working on this.
Just letting you know that we are around (I'm a Narwhals team member 🙂).

@mwouts
Owner

mwouts commented Dec 15, 2024

Hi @wpritom, we have something that is starting to work; huge thanks to @DeaMariaLeon and @MarcoGorelli!

Can you give this PR a try and let us know how it works for you?

pip install git+https://github.com/mwouts/itables.git@use_narwhals

Also, I am not familiar with Modin: is it expected that the Modin tests are much slower than the pandas ones?

Last but not least, I see warnings for the empty dataframes in the sample dataframe notebook (docs/modin_dataframes.md). I guess they come from Modin itself?

UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.
Please refer to https://modin.readthedocs.io/en/stable/supported_apis/defaulting_to_pandas.html for explanation.
UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.

UserWarning: `DataFrame.itertuples` for empty DataFrame is not currently supported by Pandas
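If these warnings do come from Modin's "defaulting to pandas" fallback, they could be silenced locally in the docs notebook with the standard warnings module. The context-manager name and the message pattern below are assumptions derived from the warning texts quoted above, not a Modin or ITables API.

```python
import contextlib
import warnings

# Assumed pattern: both quoted messages contain "is not currently supported by"
MODIN_DEFAULTING_PATTERN = r".*is not currently supported by "


@contextlib.contextmanager
def ignore_modin_defaulting_warnings():
    """Suppress matching UserWarnings inside the with-block only (hypothetical helper)."""
    with warnings.catch_warnings():
        # `message` is a regex matched case-insensitively against the start of the text
        warnings.filterwarnings(
            "ignore", message=MODIN_DEFAULTING_PATTERN, category=UserWarning
        )
        yield
```

Wrapping a cell as `with ignore_modin_defaulting_warnings(): show(df)` would hide only these messages, and the previous warning filters are restored when the block exits.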

@mwouts
Owner

mwouts commented Jan 11, 2025

To follow up on this: we have a PR that passes the tests; however, I see significant performance issues, so I am not confident releasing it.

This code takes 13 to 18 seconds to run on my computer, and I see no reason why it should be so slow: the dataframe is only 100 columns x 100 rows. The to_html_datatable call alone takes 4 seconds, which also sounds far too long. Is it expected that Modin has this kind of performance problem, or could there be an issue with my local installation?

from itables.sample_dfs import get_dict_of_test_modin_dfs
from itables.javascript import to_html_datatable

df = get_dict_of_test_modin_dfs()["wide"]
html = to_html_datatable(df)
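To see where the time goes, the construction of the sample dataframes and the to_html_datatable call can be timed separately. The timed helper below is a hypothetical, stdlib-only sketch; the commented usage lines reuse the two functions imported in the snippet above and require itables and modin.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Usage with the snippet above (requires itables and modin):
# df, build_s = timed(lambda: get_dict_of_test_modin_dfs()["wide"])
# html, render_s = timed(to_html_datatable, df)
# print(f"build: {build_s:.1f}s, render: {render_s:.1f}s")
```

Note that get_dict_of_test_modin_dfs builds every sample dataframe, not only "wide", so build_s covers more than the single table being rendered.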

@MarcoGorelli

Hey, I've also observed Modin having a tonne of overhead; I think it's only intended for datasets that don't fit on a single machine. To be honest, I'm not sure Modin is even a great candidate for ITables: Modin users might be better off converting to pandas before passing their table to ITables.

For Polars users, on the other hand, I'd expect ITables to work very well, as Polars performs well on small datasets.
