Use parquet crate for decoding Parquet data into Arrow arrays #1040

Open
andygrove opened this issue Oct 26, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@andygrove
Member

What is the problem the feature request solves?

Comet has its own native code for decoding Parquet structures into Arrow arrays. This issue is for discussing whether to delegate these operations to the parquet crate instead.

The benefits of this approach include:

  • Support for complex types. The parquet crate already supports reading maps and structs. We could implement the same support in the Comet native code, but it would probably be a lot of work.
  • Support for StringView, along with the related performance optimizations (see [1] and [2] for details)
  • Benefit from the ongoing optimization work and active community
  • Reduced maintenance effort in Comet

Possible downsides of this approach:

  • Possibly losing the performance benefit of reusing mutable buffers (although that reuse also comes with a maintenance cost)
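
To make the buffer trade-off concrete, here is a minimal sketch using only the Rust standard library. It is not Comet's code and not the parquet crate's API: `ReusingDecoder` and `decode_fresh` are hypothetical names, and a little-endian `i32` byte stream stands in for real Parquet page data. The first style keeps one mutable buffer alive across batches; the second allocates a new buffer per batch, which is closer to handing out arrays that are immutable once built.

```rust
/// Hypothetical decoder that owns a single value buffer and reuses it
/// for every batch (illustration only, not Comet's implementation).
pub struct ReusingDecoder {
    values: Vec<i32>,
}

impl ReusingDecoder {
    /// Pre-allocate room for `batch_size` values up front.
    pub fn new(batch_size: usize) -> Self {
        Self {
            values: Vec::with_capacity(batch_size),
        }
    }

    /// Decode little-endian i32 values into the owned buffer.
    /// `clear` drops the old values but keeps the allocation, so a batch
    /// that fits in the existing capacity triggers no new allocation.
    /// The returned slice borrows the decoder, so the previous batch is
    /// no longer usable once the next one is decoded.
    pub fn decode(&mut self, bytes: &[u8]) -> &[i32] {
        self.values.clear();
        self.values.extend(
            bytes
                .chunks_exact(4)
                .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]])),
        );
        &self.values
    }
}

/// Alternative style: allocate a fresh buffer for each batch. The caller
/// owns the result and can keep or share it freely, at the cost of one
/// allocation per batch.
pub fn decode_fresh(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

The borrow on the slice returned by `decode` hints at the maintenance cost mentioned above: reuse saves allocations, but every consumer has to finish with one batch before the next is decoded.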

[1] https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
[2] https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/

Describe the potential solution

No response

Additional context

No response
