ENH: Handle conversion from Polars with pd.DataFrame #47368

Closed
braaannigan opened this issue Jun 15, 2022 · 14 comments
Labels
Constructors, Enhancement

Comments

@braaannigan

Is your feature request related to a problem?

I have a Polars DataFrame polars_df and want to convert it to a Pandas DataFrame. However, when I call pd.DataFrame(polars_df) it transposes the polars dataframe.

Describe the solution you'd like

I would like pd.DataFrame to output a Pandas DataFrame that hasn't been transposed.

API breaking implications

In the __init__ method of the DataFrame class (line 604), pandas would check whether the object's type string is polars.internal.dataframe and, if so, call the object's to_pandas method.

Describe alternatives you've considered

In my own code I can just use the to_pandas method of a polars dataframe, but 3rd-party libraries do not do this.

Additional context

import pandas as pd
import polars as pl

df = pl.DataFrame({'a': [0, 1], 'b': [3, 4]})
pd.DataFrame(df)

gives output like:

	0	1
0	0	1
1	3	4

So the rows and columns have been transposed and the column names have been dropped.
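
One plausible explanation for the output above, offered as an assumption rather than something confirmed in this thread: pd.DataFrame does not recognise the polars object, treats it as a generic iterable, and turns each item yielded by the iteration (a column, in polars versions of that time) into a row. The sketch below reproduces the same shape with pandas alone, using a plain list of columns as a stand-in for the polars frame.

```python
import pandas as pd

# Stand-in data: two named columns (assumption, mirrors the report's shape).
columns = {"a": [0, 1], "b": [3, 4]}

# Passing the dict keeps the column names and the orientation.
expected = pd.DataFrame(columns)

# Passing only the sequence of columns makes pandas read each column as a row,
# so the result is transposed and the names are replaced by 0, 1.
transposed = pd.DataFrame(list(columns.values()))

print(expected)
print(transposed)
```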

@braaannigan braaannigan added the Enhancement and Needs Triage labels Jun 15, 2022
@jreback
Contributor

jreback commented Jun 15, 2022

we cannot depend on polars

instead polars can implement the dataframe protocol

pls open an issue on that repo

closing as this is out of scope

@jreback jreback added this to the No action milestone Jun 15, 2022
@jreback jreback added the Constructors label and removed the Needs Triage label Jun 15, 2022
@jreback jreback closed this as not planned Jun 15, 2022
@braaannigan
Author

Thanks for your comments @jreback

Looking through the PR on the dataframe protocol, it seems the conversion has to be called explicitly. Could pd.DataFrame check whether an object that isn't one of the usual suspects has the dataframe namespace and, if so, use that namespace to construct the pandas dataframe?
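
For illustration only, the kind of check being asked about could look like the sketch below, written as a user-side helper. to_pandas_frame and ProtocolOnlyFrame are hypothetical names, not part of pandas; the only real calls are the __dataframe__ method and pd.api.interchange.from_dataframe (pandas 1.5 or later is assumed).

```python
import pandas as pd


def to_pandas_frame(obj):
    """Hypothetical helper: prefer the interchange protocol when available."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if hasattr(obj, "__dataframe__"):
        # The object advertises the dataframe interchange protocol.
        return pd.api.interchange.from_dataframe(obj)
    # Fall back to the ordinary constructor for dicts, lists, arrays, ...
    return pd.DataFrame(obj)


class ProtocolOnlyFrame:
    """Stand-in for a third-party frame that only exposes __dataframe__."""

    def __init__(self, df):
        self._df = df

    def __dataframe__(self, allow_copy=True):
        return self._df.__dataframe__(allow_copy=allow_copy)


foreign = ProtocolOnlyFrame(pd.DataFrame({"a": [0, 1], "b": [3, 4]}))
result = to_pandas_frame(foreign)
print(result)
```

Keeping the check in a helper like this avoids any import of the other library: the helper only looks for the protocol method.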

@jreback
Contributor

jreback commented Jun 15, 2022

there will be further work on this but polars would need to support this in the first place

@braaannigan
Author

Cheers, thanks for replying

@abubelinha

abubelinha commented Feb 10, 2023

we cannot depend on polars
instead polars can implement the dataframe protocol
pls open an issue on that repo

@braaannigan did you finally open that issue? If so, please share the link.
I also think it's important to facilitate compatibility between both systems

Thanks
@abubelinha

@MarcoGorelli
Member

polars has already implemented it pola-rs/polars#6581

@abubelinha

Thanks @MarcoGorelli
So @jreback, can/should this issue be reopened now?

@MarcoGorelli
Member

MarcoGorelli commented Feb 10, 2023

🤔 why should it be reopened? you can do

In [8]: df = pl.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [9]: pd.api.interchange.from_dataframe(df)
Out[9]:
   a  b
0  1  4
1  2  5
2  3  6

@abubelinha

abubelinha commented Feb 10, 2023

Sorry, I may have misunderstood.
I thought the issue was letting Pandas import a Polars dataframe (keeping column names and without transposing) with a simple syntax like this:

pldf # a Polars dataframe
pddf = pd.DataFrame(pldf) # a Pandas dataframe

Anyway, the issue remained unanswered until now.
So, this is the final answer?

pd.core.interchange.from_dataframe.from_dataframe(pldf)

@MarcoGorelli
Member

pd.DataFrame(pldf) is unsupported, the answer to that was

we cannot depend on polars
instead polars can implement the dataframe protocol

, which polars has done 🎉

Or you can just use to_pandas pola-rs/polars#6756

@abubelinha

Thanks for clarifying 👍

@datapythonista
Member

we cannot depend on polars

While I agree that the dataframe interchange protocol is a better approach, it's worth noting that we can actually depend on polars, since we're already doing so with xarray to provide DataFrame.to_xarray. So I don't think a DataFrame.to_polars should be out of the question.

I'm certainly more in favor of removing to_xarray than of having a soft dependency on polars or any other library that we can export to. And if having these methods is useful (and I think it is), the most reasonable approach to me seems to be having third-party packages implement Python entrypoints from which we can create our methods automatically.
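
A rough sketch of what such an entrypoint mechanism might look like, using only importlib.metadata from the standard library. The group name "dataframe.exporters", both function names, and the Frame class are hypothetical; nothing like this exists in pandas as far as this thread shows.

```python
from importlib.metadata import entry_points

# Hypothetical entry-point group that third-party packages would register under.
EXPORTER_GROUP = "dataframe.exporters"


def discover_exporters(group=EXPORTER_GROUP):
    """Return {name: entry_point} for every exporter registered in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ API.
        found = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        found = eps.get(group, [])
    return {ep.name: ep for ep in found}


def attach_exporters(cls, exporters):
    """Add a to_<name> method to `cls` for each discovered exporter."""
    for name, ep in exporters.items():
        setattr(cls, f"to_{name}", ep.load())
    return cls


class Frame:
    """Toy class standing in for a dataframe type."""


attach_exporters(Frame, discover_exporters())
```

With this shape, the library exposing the methods never imports the target packages itself; each package that wants a to_<name> method ships the function and registers it.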

@MarcoGorelli
Member

Sure, just noting though that they already have from_pandas:

In [1]: pl.from_pandas(pd.DataFrame({'a': [1,2,3]}))
Out[1]:
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

which I think pd.DataFrame.to_polars would be duplicative of?

@jreback
Contributor

jreback commented Feb 18, 2023

we don't rely on xarray explicitly - for a type check
rather it's a runtime conversion TO xarray

-1 on adding any additional even implicit dependencies

-0.25 on removing xarray (this is an orthogonal library so it's pretty useful)
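
To make the distinction above concrete: a runtime conversion only imports the other library inside the method that needs it, so nothing is required at import or construction time. The sketch below is a generic illustration of that pattern, not pandas' actual code; import_optional and Frame are hypothetical, and the stdlib json module plays the role of the optional library so the example runs anywhere.

```python
import importlib


def import_optional(name):
    """Import `name` at call time, with a clearer error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"Missing optional dependency '{name}'; install it to use this conversion."
        ) from err


class Frame:
    """Toy container standing in for a dataframe."""

    def __init__(self, records):
        self.records = records

    def to_json_text(self):
        # The import happens only when the conversion is requested.
        json = import_optional("json")
        return json.dumps(self.records)


print(Frame([{"a": 1}]).to_json_text())
```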
