perf: Parallelize arrow conversion if binview -> large_bin #17083

Merged
merged 2 commits into from
Jun 20, 2024

Conversation

ritchie46
Member

@djouallah created a benchmark that has only string columns and calls to_arrow.

Because Polars has adopted the new string-view type, we are conservative and must convert to the old large-string type. This conversion must move all the data, so let's make it parallel.

With pyarrow 16 you can default to string-view by setting to_arrow(future=True), but the delta-writer still doesn't support that datatype; see delta-io/delta-rs#2613.
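
To illustrate why the conversion has to touch every byte and why it parallelizes well, here is a minimal standard-library sketch. It is not the Polars implementation (that lives in Rust). The helper names `views_to_large_string` and `convert_columns` are hypothetical, plain Python strings stand in for string views, and splitting the work per column is an assumption about how it could be divided.

```python
from concurrent.futures import ThreadPoolExecutor


def views_to_large_string(values):
    """Copy every value into one contiguous UTF-8 buffer plus an offsets list.

    This mirrors the large-string layout: value i occupies
    data[offsets[i]:offsets[i + 1]]. Every byte has to be copied.
    """
    data = bytearray()
    offsets = [0]
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))
    return bytes(data), offsets


def convert_columns(columns):
    """Convert each column independently, one task per column (assumed split)."""
    with ThreadPoolExecutor() as pool:
        converted = pool.map(views_to_large_string, columns.values())
        return dict(zip(columns.keys(), converted))


if __name__ == "__main__":
    result = convert_columns({"a": ["hi", "", "wörld"], "b": ["x"]})
    for name, (data, offsets) in result.items():
        print(name, data, offsets)
```

Each column is converted independently, so no coordination is needed between tasks. In CPython the threads mostly show the structure, since the interpreter lock limits CPU-bound speedups; a native implementation does not have that limitation.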

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Jun 20, 2024
@ritchie46 ritchie46 merged commit 8792926 into main Jun 20, 2024
25 checks passed
@ritchie46 ritchie46 deleted the to_string branch June 20, 2024 08:32
@djouallah

Thanks, but I read the columns as strings because the data is messed up, and then convert them to double later?

@c-peters c-peters added the accepted Ready for implementation label Jun 24, 2024
Labels
accepted Ready for implementation performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
3 participants