pyarrow datetime is not supported using altair 5.0 #3050
Comments
Thanks for the report! I can reproduce it with the following:
from datetime import datetime
import pyarrow as pa
import altair as alt
data = {
    'date': [datetime(2004, 8, 1), datetime(2004, 9, 1)],
    'value': [102, 129]
}
pa_table = pa.table(data)
alt.Chart(pa_table).mark_line().encode(
    x='date',
    y='value'
)
Trying to narrow it down a bit further. The serialization of the Arrow table is done using the dataframe interchange protocol available in pyarrow.interchange:
import pyarrow.interchange as pi
pi_table = pi.from_dataframe(pa_table)
pi_table
This object is then converted into a Python list before being inserted as values in the generated Vega-Lite specification:
pi_table.to_pylist()
As can be seen, the date values are returned as Python datetime objects. In Altair this happens around here: https://github.com/altair-viz/altair/blob/master/altair/utils/data.py#L168-L170
elif hasattr(data, "__dataframe__"):
    # experimental interchange dataframe support
    pi = import_pyarrow_interchange()
    pa_table = pi.from_dataframe(data)
    return {"values": pa_table.to_pylist()}
Currently we do not run the pandas sanitization step (sanitize_dataframe) on interchange dataframes, which includes:
elif str(dtype).startswith("datetime"):
    # Convert datetimes to strings. This needs to be a full ISO string
    # with time, which is why we cannot use ``col.astype(str)``.
    # This is because Javascript parses date-only times in UTC, but
    # parses full ISO-8601 dates as local time, and dates in Vega and
    # Vega-Lite are displayed in local time by default.
    # (see https://github.com/altair-viz/altair/issues/1027)
    df[col_name] = (
        df[col_name].apply(lambda x: x.isoformat()).replace("NaT", "")
    )
Not sure if we should sanitize interchange dataframes on the Python side or if it is better to push the
I think you can also explore the idea of extending the JSON serialiser with support for datetimes. For example, that's what we have done in TurboGears (https://github.com/TurboGears/tg2/blob/development/tg/jsonify.py#L94-L98). That way you don't have to care about sanitising types explicitly, because the JSON dumper will take care of it for you regardless of where the value came from.
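To illustrate the serialiser idea in general terms, here is a minimal sketch using only the standard library. The class name DateTimeEncoder is hypothetical and this is not the TurboGears or Altair implementation; it only shows how a custom json.JSONEncoder can turn datetime values into full ISO-8601 strings at dump time.

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """Hypothetical encoder that emits datetimes as full ISO-8601 strings."""

    def default(self, obj):
        # json only calls default() for objects it cannot serialize itself.
        if isinstance(obj, datetime):
            return obj.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


# Rows shaped like the output of to_pylist() in the comment above.
rows = [{"date": datetime(2004, 8, 1), "value": 102}]
encoded = json.dumps(rows, cls=DateTimeEncoder)
print(encoded)
```

With this approach the rows themselves stay untouched, and the conversion happens wherever the encoder is passed to json.dumps.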
If you decide to sanitize interchange dataframes on the Python side, you could use pyarrow.compute.cast:
>>> from datetime import datetime
>>> import pyarrow as pa
>>> arr = pa.array([datetime(2010, 1, 1), datetime(2015, 1, 1)])
>>> arr
<pyarrow.lib.TimestampArray object at 0x12f37a020>
[
2010-01-01 00:00:00.000000,
2015-01-01 00:00:00.000000
]
>>> import pyarrow.compute as pc
>>> pc.cast(arr, pa.string())
<pyarrow.lib.StringArray object at 0x12f37a0e0>
[
"2010-01-01 00:00:00.000000",
"2015-01-01 00:00:00.000000"
]
Note: in the case of a duration type, an error should be raised.
Thanks for responding @AlenkaF! We have to cast dates to full ISO-8601 strings in order to render them in local time in Altair, whereas dates cast to plain strings will render as UTC in JavaScript.
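As a rough sketch of what sanitizing on the Python side could look like, the snippet below converts datetime values in a list of row dictionaries (the shape returned by to_pylist()) into full ISO-8601 strings with isoformat(). The helper name sanitize_rows is hypothetical and not part of Altair or pyarrow; it only shows the difference between a plain string cast and an ISO string with the "T" separator.

```python
from datetime import datetime


def sanitize_rows(rows):
    """Hypothetical helper: replace datetime values with ISO-8601 strings."""
    return [
        {
            key: value.isoformat() if isinstance(value, datetime) else value
            for key, value in row.items()
        }
        for row in rows
    ]


rows = [
    {"date": datetime(2010, 1, 1), "value": 102},
    {"date": datetime(2015, 1, 1), "value": 129},
]
clean = sanitize_rows(rows)

# A plain string cast uses a space separator, like the cast output above.
plain = str(rows[0]["date"])
print(plain)             # 2010-01-01 00:00:00
print(clean[0]["date"])  # 2010-01-01T00:00:00
```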
Thanks also @amol- for raising ideas! Appreciated!
I created an Altair chart from a DuckDB dataframe exported as an Arrow table, but I get this error:
TypeError: Object of type datetime is not JSON serializable
When I cast the field to a string, everything works fine.
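For reference, the same error can be reproduced with the standard library alone, which suggests it comes from JSON serialization of raw datetime objects rather than from DuckDB itself. This is a minimal sketch; the row below is made-up sample data.

```python
import json
from datetime import datetime

# A single row containing a raw datetime object (sample data).
row = {"date": datetime(2004, 8, 1), "value": 102}

try:
    json.dumps(row)
    message = None
except TypeError as exc:
    # The default JSON encoder has no rule for datetime objects.
    message = str(exc)

print(message)

# Casting the field to a string first avoids the error.
as_text = json.dumps({"date": str(row["date"]), "value": row["value"]})
print(as_text)
```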