Support configuring parquet compression #1235

Closed
wjones127 opened this issue Mar 21, 2023 · 1 comment · Fixed by #1497
Labels
enhancement New feature or request

Comments

@wjones127
Collaborator

Description

We should probably make this configurable at the table level, if there's some standard for that.

Note: need to change the filename format:

// NOTE: If we add a non-snappy option, file name must change
let file_name = format!("part-{first_part}-{uuid_part}-{last_part}.snappy.parquet");
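For illustration, here is a minimal sketch of how the file name could follow the chosen codec instead of hard-coding `.snappy`. The `Codec` enum, the `infix` mapping, and `part_file_name` are hypothetical names made up for this example and are not part of the existing writer; the infix strings (`gz`, `zstd`) are assumed conventions rather than anything the project has agreed on.

```rust
/// Hypothetical codec selection; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Uncompressed,
    Snappy,
    Gzip,
    Zstd,
}

impl Codec {
    /// Infix placed before `.parquet`, or `None` when the file is uncompressed.
    /// The exact strings are an assumption, not an agreed convention.
    fn infix(self) -> Option<&'static str> {
        match self {
            Codec::Uncompressed => None,
            Codec::Snappy => Some("snappy"),
            Codec::Gzip => Some("gz"),
            Codec::Zstd => Some("zstd"),
        }
    }
}

/// Builds a part file name whose extension reflects the codec.
pub fn part_file_name(
    first_part: &str,
    uuid_part: &str,
    last_part: &str,
    codec: Codec,
) -> String {
    match codec.infix() {
        Some(infix) => format!("part-{first_part}-{uuid_part}-{last_part}.{infix}.parquet"),
        None => format!("part-{first_part}-{uuid_part}-{last_part}.parquet"),
    }
}
```

With this shape, the snappy case keeps producing the same name as today, so existing tables would not see a naming change unless they opt into another codec.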

Also, should we consider using zstd by default? It produces smaller files than snappy, but is faster than gzip (the Iceberg default). It was one of the better ones in my tests of Parquet compression, and I've seen it suggested in several other venues for Parquet tables.

Use Case

Related Issue(s)

@wjones127 wjones127 added the enhancement New feature or request label Mar 21, 2023
@spebern
Contributor

spebern commented Mar 22, 2023

It would also be a good idea to support compression levels. zstd's decompression speed does not suffer much at higher levels, while higher levels achieve better compression ratios (https://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/).

When download is a bottleneck and compression speed is not, tuning the level can be useful.
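As a rough sketch of what a level-aware setting might look like, the snippet below parses a hypothetical table-level value such as `zstd` or `zstd(19)` and validates the level. The `ZstdSetting` type, the `parse_zstd_setting` function, and the string syntax are all assumptions invented for this example; the 1 to 22 range and the default of 3 follow zstd's usual conventions, but how a table property would actually be spelled is an open question here.

```rust
/// Hypothetical validated zstd level; not an existing type in the writer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ZstdSetting(i32);

impl ZstdSetting {
    /// Assumed valid range, following zstd's conventional levels.
    pub const MIN: i32 = 1;
    pub const MAX: i32 = 22;
    /// zstd's usual default level.
    pub const DEFAULT: i32 = 3;

    /// Accepts a level only if it falls inside `MIN..=MAX`.
    pub fn try_new(level: i32) -> Result<Self, String> {
        if (Self::MIN..=Self::MAX).contains(&level) {
            Ok(Self(level))
        } else {
            Err(format!(
                "zstd level {level} is outside {}..={}",
                Self::MIN,
                Self::MAX
            ))
        }
    }

    pub fn level(self) -> i32 {
        self.0
    }
}

/// Parses `zstd` (default level) or `zstd(<level>)`.
/// The syntax is an assumption made for this sketch.
pub fn parse_zstd_setting(value: &str) -> Result<ZstdSetting, String> {
    let value = value.trim();
    if value == "zstd" {
        return ZstdSetting::try_new(ZstdSetting::DEFAULT);
    }
    let inner = value
        .strip_prefix("zstd(")
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| format!("expected `zstd` or `zstd(<level>)`, got `{value}`"))?;
    let level: i32 = inner
        .trim()
        .parse()
        .map_err(|e| format!("invalid zstd level `{inner}`: {e}"))?;
    ZstdSetting::try_new(level)
}
```

A download-bound table could then be configured with something like `zstd(19)`, while a write-heavy one keeps the default.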

wjones127 pushed a commit that referenced this issue Jun 28, 2023
…eltaWriter (#1497)

# Description
Adds the capability to pass a configured WriterProperties to the
`RecordBatchWriter` and `DeltaWriter` similar to how the
`OptimizeBuilder` can be updated.

# Related Issue(s)
- closes #1469 
- closes #1235 

# Documentation