Tracking issue for better Enum / Categorical support in Polars Parquet #20089

Open
coastalwhite opened this issue Dec 1, 2024 · 0 comments

At the moment, several aspects of Enums and Categoricals are broken in conjunction with Parquet reading and writing. This mostly stems from the fact that Polars does not use the Arrow Dictionary type to store the data contained in Enums and Categoricals. Instead, it stores them in primitive u32 arrays, with the categories stored in the DataType. As a result, we need a small translation step when handling Parquet for interoperability with other Parquet readers.
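To make the two layouts concrete, here is a minimal pure-Python sketch of the translation step. It is a conceptual model only, not Polars' actual implementation: `EnumDType`, `PhysicalColumn`, `DictionaryColumn`, `to_dictionary`, and `from_dictionary` are hypothetical names invented for illustration, and nulls are ignored.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumDType:
    """Hypothetical stand-in for a dtype that carries the categories."""
    categories: tuple


@dataclass
class PhysicalColumn:
    """Integer codes in the column, categories in the dtype."""
    codes: array  # typecode "I" (unsigned int), standing in for u32
    dtype: EnumDType


@dataclass
class DictionaryColumn:
    """Dictionary-style layout: indices and dictionary values travel together."""
    indices: list
    dictionary: list


def to_dictionary(col):
    # Writing side: move the categories out of the dtype and next to the indices.
    return DictionaryColumn(list(col.codes), list(col.dtype.categories))


def from_dictionary(col, dtype):
    # Reading side: a foreign dictionary may use a different order, so remap
    # every index to the code that the target dtype assigns to that value.
    lookup = {cat: code for code, cat in enumerate(dtype.categories)}
    codes = array("I", (lookup[col.dictionary[i]] for i in col.indices))
    return PhysicalColumn(codes, dtype)


dtype = EnumDType(("low", "mid", "high"))
written = to_dictionary(PhysicalColumn(array("I", [0, 2, 1, 0]), dtype))
read_back = from_dictionary(DictionaryColumn([0, 1, 0], ["high", "low"]), dtype)
```

The remapping in `from_dictionary` is the part that matters for interoperability in this sketch: a file produced elsewhere is under no obligation to order its dictionary the same way as the categories in the target dtype.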

Some things that are currently broken or less than ideal.

When these issues are fixed, working with Enums and Categoricals in Polars Parquet should be a lot more doable.
