Pandas 2.0 with pyarrow backend: "TypeError: Cannot interpret 'timestamp[ms][pyarrow]' as a data type" #3127
I just tried it with this in my app's
I don't think it is; #3076 was about fixing a sanitization issue for
@mattijn The commit I mention in my second comment (72a361c) was the most recent state of the main branch when I made that comment, and it is later than the merge of #3076. I was also recently playing with Polars and indeed had type-related problems with Vega-Altair 5.0.1; I saw PR #3076, and I got past those problems precisely by using commit 427d5679, the one that merged #3076. So with commits that include #3076, I have observed this app working with Polars frames but not with Arrow-backed Pandas 2.0 frames.
@jonmmease I'm obviously not nearly as deep into this as you are, but given how central Arrow is becoming to the whole ecosystem, and various comments I've read saying that support for the DataFrame Interchange Protocol is still experimental, I wonder whether building first-class support for Arrow might be wise.
Hi @sacundim, yeah we totally agree. We're working on moving various parts of Altair from depending directly on the pandas API to working with the combination of the DataFrame Interchange Protocol and pyarrow. One recent example was #3114, where I updated the encoding type inference to rely on the DataFrame interchange protocol when available. One note is that we'll need to keep the pandas-centric logic around for a while because we still support versions of pandas before the DataFrame Interchange Protocol was introduced, and we want Altair to work in environments like Pyodide where pyarrow isn't available yet.
… order to reconfirm that Vega-Altair works just fine there, and the issue that I report for Pandas 2.0 is happening only for Pandas 2.0. vega/altair#3127
Just to make things a bit clearer: (1) I've got two experimental branches in the project in question:
(2) I've now got both using Vega-Altair from the GitHub main branch:
(3) The following function in my polars-try2 branch uses PyAthena 3.0.6 to produce pyarrow tables and wraps them into Polars dataframes, and Vega-Altair understands the latter just fine:
(4) But the following function in my pandas-2.0 branch uses PyAthena 3.0.6 to produce pyarrow tables in exactly the same way, differs only in that it packages them into Pandas 2.0 dataframes, and gets the error that this ticket reports:
@sacundim, could you try out this branch when you have a chance?
…sue: vega/altair#3127 Different error:
```
File "/usr/local/lib/python3.11/site-packages/pandas/core/interchange/utils.py", line 90, in dtype_to_arrow_c_fmt
    raise NotImplementedError(
NotImplementedError: Conversion of timestamp[ms][pyarrow] to Arrow C format string is not implemented.
```
@jonmmease Different error now (full stack trace at bottom):
This might not be the exact same code path on my side as the original failure.
Stack trace
@jonmmease Aha, got it, the following excerpt of this code leads to the chart failing with the error shown above:
...whereas the following one works just fine:
Thanks @sacundim, I see what's going on. Should be fixed on the same branch now.
@jonmmease Revision 57a4ad7 works on my end, thanks!
Thanks for reviewing, @sacundim! Seems like a nice project you are working on. I can't judge the content, but it is nice to see these types of extended Altair code specifications! If you have other suggestions for how the package can be improved, please feel free to open a new issue or discussion.
Essential outline of what I'm doing:
Stack trace from my actual app