Hadoop LZ4 Support for LZ4 Codec #3013
Commits on Nov 1, 2022
-
Added tests for hadoop_lz4_compress_large.parquet
Adrián Gallego Castellanos committed Nov 1, 2022 (commit bd15e37)
-
Changed interface to be able to receive CodecOptions.
* Added `CodecOptions` struct to hold `Codec` configuration.
* Added `backward_compatible_lz4` option in `CodecOptions`.
* Added `CodecOptions` to `ReadOptions` to be able to configure `SerializedFileReader`.
* Added `SerializedRowGroupReaderOptionsBuilder` with `CodecOptions` to be able to configure `SerializedRowGroupReader`, with an extensible interface.
* Added `SerializedPageReaderOptionsBuilder` with `CodecOptions` to be able to configure `SerializedPageReader`, with an extensible interface.
* Added `new_with_config` to the `SerializedPageReader` API to be able to configure `SerializedFileReader` without breaking the `new` API.
* `CodecOptions` implements the `Copy` trait, as it is composed of `Copy` types. If in the future it contains a non-`Copy` type, it may be better to create `CodecOptionsPtr = Arc<CodecOptions>`.
* `CodecOptions` is only added in the read path. The write path takes the default values, as the options currently affect only the read path and have no effect on the write path. If it needs to be added to the write path, `WriteProperties` may be a good place for it.
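The design described in this commit message (a small options struct that stays `Copy` so it can be handed to readers by value) can be sketched as follows. This is only an illustration, not the code from the pull request: the `CodecOptionsBuilder` type, the accessor, the `describe` helper, and the default value of `true` are assumptions introduced here; only the `CodecOptions` name and the `backward_compatible_lz4` option come from the commit message.

```rust
/// Sketch of an options struct that is cheap to pass by value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CodecOptions {
    /// Whether LZ4 decompression may fall back to older framings.
    backward_compatible_lz4: bool,
}

impl Default for CodecOptions {
    fn default() -> Self {
        // Assumption: the fallback is enabled unless the caller opts out.
        Self { backward_compatible_lz4: true }
    }
}

impl CodecOptions {
    /// Read-only accessor (hypothetical name).
    pub fn backward_compatible_lz4(&self) -> bool {
        self.backward_compatible_lz4
    }
}

/// Hypothetical builder, so new options can be added without breaking callers.
#[derive(Debug, Default)]
pub struct CodecOptionsBuilder {
    options: CodecOptions,
}

impl CodecOptionsBuilder {
    pub fn set_backward_compatible_lz4(mut self, value: bool) -> Self {
        self.options.backward_compatible_lz4 = value;
        self
    }

    pub fn build(self) -> CodecOptions {
        self.options
    }
}

/// Takes the options by value; because of `Copy`, the caller keeps its own copy.
pub fn describe(options: CodecOptions) -> String {
    format!("lz4 fallback enabled: {}", options.backward_compatible_lz4())
}
```

Because every field is `Copy`, passing the struct by value costs about as much as passing a `bool`. If a non-`Copy` field were added later, sharing through an `Arc`, as the commit message suggests, would avoid cloning.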
Adrián Gallego Castellanos committed Nov 1, 2022 (commit a94b248)
Commits on Nov 2, 2022
-
Added support for LZ4_HADOOP compression codec.
* Added compression and decompression for LZ4_HADOOP.
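As background for this commit, the Hadoop flavour of LZ4 wraps each compressed block in a small frame. The sketch below shows only that framing, assuming the usual Hadoop layout of a 4-byte big-endian decompressed size followed by a 4-byte big-endian compressed size and then the block bytes. The function names are hypothetical, the block payload is treated as opaque bytes, and the actual LZ4 block compression is left out, so this is not the implementation from the pull request.

```rust
/// Size of one length field in the frame header.
const SIZE_U32: usize = std::mem::size_of::<u32>();
/// Header length: decompressed size followed by compressed size.
const PREFIX_LEN: usize = SIZE_U32 * 2;

/// Reads a big-endian u32 from the first four bytes of `bytes`.
fn read_be_u32(bytes: &[u8]) -> u32 {
    u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Wraps an already-compressed block in a Hadoop-style frame.
/// Assumption: both sizes fit in a u32.
fn hadoop_frame(decompressed_len: usize, compressed_block: &[u8]) -> Vec<u8> {
    debug_assert!(decompressed_len <= u32::MAX as usize);
    debug_assert!(compressed_block.len() <= u32::MAX as usize);
    let mut out = Vec::with_capacity(PREFIX_LEN + compressed_block.len());
    out.extend_from_slice(&(decompressed_len as u32).to_be_bytes());
    out.extend_from_slice(&(compressed_block.len() as u32).to_be_bytes());
    out.extend_from_slice(compressed_block);
    out
}

/// Splits `input` into `(decompressed_len, compressed_block)` frames.
/// Returns `None` when the bytes do not match the expected layout, which is
/// the signal a caller could use to try a different LZ4 framing.
fn hadoop_unframe(mut input: &[u8]) -> Option<Vec<(usize, &[u8])>> {
    let mut frames = Vec::new();
    while !input.is_empty() {
        if input.len() < PREFIX_LEN {
            return None;
        }
        let decompressed_len = read_be_u32(&input[..SIZE_U32]) as usize;
        let compressed_len = read_be_u32(&input[SIZE_U32..PREFIX_LEN]) as usize;
        input = &input[PREFIX_LEN..];
        if input.len() < compressed_len {
            return None;
        }
        let (block, rest) = input.split_at(compressed_len);
        frames.push((decompressed_len, block));
        input = rest;
    }
    Some(frames)
}
```

A real decoder would pass each returned block to an LZ4 block decompressor and check that the output length equals the recorded decompressed size.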
Adrián Gallego Castellanos committed Nov 2, 2022 (commit 693b85a)
-
* Added a test for two parquet files with the same content, both with LZ4 CompressionCodec, but one using the LZ4_HADOOP (no-fallback) algorithm and the other using the LZ4_RAW algorithm (fallback to last level).
* Refactored the `LZ4HadoopCodec::compress` function to make it easier to understand.
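The fallback behaviour that this test exercises can be pictured as an ordered chain of decoders, where the first one to accept the bytes wins. The sketch below is an assumption-laden illustration rather than the pull request's code: `decompress_with_fallback`, the `Decoder` alias, and the two toy decoders are hypothetical stand-ins, and the toy decoders do not perform real LZ4 decoding.

```rust
/// A decoder either returns the decoded bytes or an error message.
type Decoder = fn(&[u8]) -> Result<Vec<u8>, String>;

/// Tries each decoder in order and returns the name of the one that succeeded.
/// With `allow_fallback == false`, only the first decoder is tried.
fn decompress_with_fallback(
    input: &[u8],
    decoders: &[(&'static str, Decoder)],
    allow_fallback: bool,
) -> Result<(&'static str, Vec<u8>), String> {
    let mut last_err = String::from("no decoder configured");
    for (name, decode) in decoders {
        match decode(input) {
            Ok(out) => return Ok((*name, out)),
            Err(e) => last_err = format!("{}: {}", name, e),
        }
        if !allow_fallback {
            break;
        }
    }
    Err(last_err)
}

/// Toy stand-in for a framed format: accepts input that starts with `H`.
fn toy_hadoop(input: &[u8]) -> Result<Vec<u8>, String> {
    match input.split_first() {
        Some((&b'H', rest)) => Ok(rest.to_vec()),
        _ => Err(String::from("missing frame marker")),
    }
}

/// Toy stand-in for the last level of the chain: accepts any input unchanged.
fn toy_raw(input: &[u8]) -> Result<Vec<u8>, String> {
    Ok(input.to_vec())
}

/// Builds the chain in priority order (hypothetical helper).
fn toy_chain() -> [(&'static str, Decoder); 2] {
    [("hadoop", toy_hadoop as Decoder), ("raw", toy_raw as Decoder)]
}
```

With this shape, a file written with the first framing is decoded without any fallback, while a file written with the raw framing is only readable when the fallback is allowed.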
Adrián Gallego Castellanos committed Nov 2, 2022 (commit 54e01cf)
Commits on Nov 4, 2022
-
Adrián Gallego Castellanos committed Nov 4, 2022 (commit 8813ffa)
Commits on Nov 5, 2022
-
Changed interface to make `CodecOptions` private to the crate.
This commit hides `CodecOptions` from the public API. The changes are the following:
- Added new structs to the public API, `ReaderProperties`, `ReaderPropertiesBuilder` and `ReaderPropertiesPtr`, to store immutable reader config, as is the case for `CodecOptions`.
- Removed `SerializedRowGroupReaderOptions`, `SerializedRowGroupReaderOptionsBuilder`, `SerializedPageReaderOptionsBuilder` and `SerializedPageReaderOptions`. They are no longer required, as `SerializedRowGroupReader` and `SerializedPageReader` use `ReaderPropertiesPtr` for configuration.
- `SerializedRowGroupReader::new_with_options` renamed to `SerializedRowGroupReader::new_with_properties`.
- `SerializedPageReader::new_with_options` renamed to `SerializedPageReader::new_with_properties`.
- Test added for `ReaderPropertiesBuilder`.
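The shape of this change, a public immutable properties object that wraps a crate-private options struct and is shared between readers through a reference-counted pointer, might look roughly like the sketch below. The names `ReaderProperties`, `ReaderPropertiesBuilder`, `ReaderPropertiesPtr`, `CodecOptions` and `new_with_properties` come from the commit message; the fields, method signatures, default value, and the `SketchReader` type are assumptions made for illustration.

```rust
use std::sync::Arc;

/// Crate-private codec configuration; not part of the public API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct CodecOptions {
    backward_compatible_lz4: bool,
}

/// Public, immutable reader configuration.
pub struct ReaderProperties {
    codec_options: CodecOptions,
}

/// Shared pointer so several readers can use one configuration.
pub type ReaderPropertiesPtr = Arc<ReaderProperties>;

impl ReaderProperties {
    pub fn builder() -> ReaderPropertiesBuilder {
        // Assumption: the LZ4 fallback is enabled by default.
        ReaderPropertiesBuilder { backward_compatible_lz4: true }
    }

    /// Only code inside the crate can see the codec options.
    pub(crate) fn codec_options(&self) -> &CodecOptions {
        &self.codec_options
    }
}

/// Public builder; exposes plain setters instead of `CodecOptions` itself.
pub struct ReaderPropertiesBuilder {
    backward_compatible_lz4: bool,
}

impl ReaderPropertiesBuilder {
    pub fn set_backward_compatible_lz4(mut self, value: bool) -> Self {
        self.backward_compatible_lz4 = value;
        self
    }

    pub fn build(self) -> ReaderProperties {
        ReaderProperties {
            codec_options: CodecOptions {
                backward_compatible_lz4: self.backward_compatible_lz4,
            },
        }
    }
}

/// Hypothetical reader that receives its configuration at construction time.
pub struct SketchReader {
    props: ReaderPropertiesPtr,
}

impl SketchReader {
    pub fn new_with_properties(props: ReaderPropertiesPtr) -> Self {
        Self { props }
    }

    pub fn uses_lz4_fallback(&self) -> bool {
        self.props.codec_options().backward_compatible_lz4
    }
}
```

Keeping `CodecOptions` crate-private means its fields can change later without a breaking API change, since callers only ever touch the builder's setters.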
Adrián Gallego Castellanos committed Nov 5, 2022 (commit e838552)
-
Commit 769f0b6
-
Removed incorrect cfg macro for `try_hadoop_decompress` function.
Adrián Gallego Castellanos committed Nov 5, 2022 (commit e5abedd)