At the moment, several parts of Enums and Categoricals are broken in conjunction with Parquet reading and writing. This mostly stems from the fact that Polars does not use the Arrow `Dictionary` type to store the data contained in enums and categoricals. Instead, it stores them in primitive u32 arrays, with the categories stored in the `DataType`. As a result, we need a small translation step when reading and writing Parquet to stay interoperable with other Parquet readers.
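To make the translation step concrete, here is a minimal conceptual sketch in plain Python. It is not Polars' actual implementation: the `PhysicalCategorical` and `DictionaryArray` classes and both helper functions are hypothetical names invented for illustration. The sketch only models the idea described above, namely u32 codes plus categories kept on the type on one side, and a dictionary-style keys/values pair on the other.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhysicalCategorical:
    """Hypothetical model: u32 codes, with the categories kept on the dtype."""
    codes: List[Optional[int]]   # None models a null value
    categories: List[str]        # conceptually part of the DataType, not the data


@dataclass
class DictionaryArray:
    """Hypothetical model of a dictionary-encoded array: keys index into values."""
    keys: List[Optional[int]]
    values: List[str]


def to_dictionary(col: PhysicalCategorical) -> DictionaryArray:
    """Translate codes + dtype categories into a dictionary-style array."""
    for code in col.codes:
        if code is not None and not 0 <= code < len(col.categories):
            raise ValueError(f"code {code} has no matching category")
    # The categories travel with the array even when there are no rows.
    return DictionaryArray(keys=list(col.codes), values=list(col.categories))


def from_dictionary(arr: DictionaryArray) -> PhysicalCategorical:
    """Translate back: dictionary values become the dtype's categories."""
    return PhysicalCategorical(codes=list(arr.keys), categories=list(arr.values))


col = PhysicalCategorical(codes=[0, 1, None, 0], categories=["low", "high"])
round_tripped = from_dictionary(to_dictionary(col))

# An empty column still has categories that a round trip should preserve.
empty = PhysicalCategorical(codes=[], categories=["low", "high"])
empty_round_tripped = from_dictionary(to_dictionary(empty))
```

The empty-column case is included because the categories live on the type rather than in the data, so a translation that derives them from the rows alone would lose them.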
Some things that are currently broken or less than ideal:
- `write_parquet` does not preserve enum categories for empty data frames (#20083)

When these issues are fixed, working with Enums and Categoricals in Polars Parquet should be a lot more doable.