Hadoop LZ4 Support for LZ4 Codec #3013

Merged: 8 commits merged on Nov 6, 2022.

Commits on Nov 1, 2022

  1. Added tests for `hadoop_lz4_compress_large.parquet`.

    Adrián Gallego Castellanos committed Nov 1, 2022 (bd15e37)
  2. Changed the interface so that it can receive `CodecOptions`.

    * Added a `CodecOptions` struct to hold `Codec` configuration.
    * Added a `backward_compatible_lz4` option to `CodecOptions`.
    * Added `CodecOptions` to `ReadOptions` so that `SerializedFileReader` can be configured.
    * Added `SerializedRowGroupReaderOptionsBuilder` with `CodecOptions` so that `SerializedRowGroupReader` can be configured through an extensible interface.
    * Added `SerializedPageReaderOptionsBuilder` with `CodecOptions` so that `SerializedPageReader` can be configured through an extensible interface.
    * Added `new_with_config` to the `SerializedPageReader` API so that `SerializedFileReader` can be configured without breaking the existing `new` API.
    * `CodecOptions` implements `Copy`, as it is composed only of `Copy` types. If it ever gains a non-`Copy` field, it may be better to introduce `CodecOptionsPtr = Arc<CodecOptions>`.
    * `CodecOptions` is only added to the read path; the write path uses the default values, because the options currently affect only reading. If they are ever needed on the write path, `WriterProperties` would be a natural place for them.
    Adrián Gallego Castellanos committed Nov 1, 2022 (a94b248)
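
The codec configuration described in the second Nov 1 commit can be pictured with a small sketch. Only the `CodecOptions` name and the `backward_compatible_lz4` field come from the commit message; the `CodecOptionsBuilder` type, its `set_backward_compatible_lz4` setter and the default value of `true` are assumptions for illustration, not the crate's actual code.

```rust
/// Sketch of a codec configuration struct. Only the struct name and the
/// `backward_compatible_lz4` field come from the commit message; everything
/// else (builder, setter name, default value) is assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CodecOptions {
    /// When true, LZ4 decompression may fall back to other LZ4 layouts.
    backward_compatible_lz4: bool,
}

impl CodecOptions {
    pub fn backward_compatible_lz4(&self) -> bool {
        self.backward_compatible_lz4
    }
}

/// Hypothetical builder: new options can be added later without breaking callers.
#[derive(Debug, Clone, Copy)]
pub struct CodecOptionsBuilder {
    backward_compatible_lz4: bool,
}

impl Default for CodecOptionsBuilder {
    fn default() -> Self {
        // Assumed default: keep the permissive, backward-compatible behaviour.
        Self { backward_compatible_lz4: true }
    }
}

impl CodecOptionsBuilder {
    pub fn set_backward_compatible_lz4(mut self, value: bool) -> Self {
        self.backward_compatible_lz4 = value;
        self
    }

    pub fn build(self) -> CodecOptions {
        CodecOptions { backward_compatible_lz4: self.backward_compatible_lz4 }
    }
}

impl Default for CodecOptions {
    fn default() -> Self {
        CodecOptionsBuilder::default().build()
    }
}
```

Because every field is `Copy`, the struct itself can derive `Copy` and be passed by value to each reader, which is the reasoning the commit gives for not wrapping it in an `Arc`.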

Commits on Nov 2, 2022

  1. Added support for the LZ4_HADOOP compression codec.

    * Added compression and decompression for LZ4_HADOOP.
    Adrián Gallego Castellanos committed Nov 2, 2022 (693b85a)
  2. Added tests for LZ4 fallback.

    * Added a test using two Parquet files with the same content, both declaring the LZ4 `CompressionCodec`: one compressed with the LZ4_HADOOP algorithm (no fallback needed) and the other with the LZ4_RAW algorithm (decoded by the last fallback level).
    * Refactored `LZ4HadoopCodec::compress` to make it easier to understand.
    Adrián Gallego Castellanos committed Nov 2, 2022 (54e01cf)
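
The two Nov 2 commits (Hadoop framing and the fallback test) can be illustrated together with a minimal sketch. It assumes the Hadoop LZ4 layout prefixes each LZ4 block with a 4-byte big-endian decompressed size and a 4-byte big-endian compressed size, and it models the fallback as trying the Hadoop decoder first and, only when backward-compatible behaviour is enabled, trying the remaining decoders in order (with LZ4_RAW as the last level, per the test description). The helper names `encode_hadoop_frame`, `split_hadoop_frames`, `decompress_with_fallback` and `Decoder` are hypothetical, and the LZ4 block codec itself is omitted; a crate such as `lz4_flex` could supply it.

```rust
/// Assumed Hadoop frame header: two big-endian u32 values.
const HEADER_LEN: usize = 8;

/// Appends one frame: [decompressed len][compressed len][compressed block].
/// Assumes the block is smaller than 4 GiB so its length fits in a u32.
fn encode_hadoop_frame(decompressed_len: u32, compressed: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&decompressed_len.to_be_bytes());
    out.extend_from_slice(&(compressed.len() as u32).to_be_bytes());
    out.extend_from_slice(compressed);
}

/// Reads a big-endian u32 from the first four bytes (caller checks length).
fn read_be_u32(bytes: &[u8]) -> u32 {
    u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Splits a buffer into (decompressed len, compressed block) pairs. Returns
/// None when the headers do not describe the buffer exactly, which a reader
/// can treat as "this is not Hadoop-framed data".
fn split_hadoop_frames(mut input: &[u8]) -> Option<Vec<(u32, &[u8])>> {
    let mut frames = Vec::new();
    while !input.is_empty() {
        if input.len() < HEADER_LEN {
            return None;
        }
        let decompressed_len = read_be_u32(&input[0..4]);
        let compressed_len = read_be_u32(&input[4..8]) as usize;
        input = &input[HEADER_LEN..];
        if compressed_len > input.len() {
            return None;
        }
        let (block, rest) = input.split_at(compressed_len);
        frames.push((decompressed_len, block));
        input = rest;
    }
    Some(frames)
}

/// A decoder turns a whole compressed page into decompressed bytes.
type Decoder = fn(&[u8]) -> Result<Vec<u8>, String>;

/// Tries `primary` first; only when `backward_compatible` is set does it walk
/// `fallbacks` in order, returning the first success or the last error.
fn decompress_with_fallback(
    input: &[u8],
    backward_compatible: bool,
    primary: Decoder,
    fallbacks: &[Decoder],
) -> Result<Vec<u8>, String> {
    match primary(input) {
        Ok(out) => Ok(out),
        Err(e) if !backward_compatible => Err(e),
        Err(mut last_err) => {
            for decoder in fallbacks {
                match decoder(input) {
                    Ok(out) => return Ok(out),
                    Err(e) => last_err = e,
                }
            }
            Err(last_err)
        }
    }
}
```

Rejecting any header mismatch in `split_hadoop_frames`, rather than guessing, is what lets the caller fall through to the next decoder instead of producing corrupt output.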

Commits on Nov 4, 2022

  1. Fixed documentation tests.

    Adrián Gallego Castellanos committed Nov 4, 2022 (8813ffa)

Commits on Nov 5, 2022

  1. Changed the interface to make `CodecOptions` private to the crate.

    This commit hides `CodecOptions` from the public API. The changes are the following:
    - Added new types to the public API, `ReaderProperties`, `ReaderPropertiesBuilder` and `ReaderPropertiesPtr`, to store immutable reader configuration such as `CodecOptions`.
    - Removed `SerializedRowGroupReaderOptions`, `SerializedRowGroupReaderOptionsBuilder`, `SerializedPageReaderOptionsBuilder` and `SerializedPageReaderOptions`. They are no longer required, as `SerializedRowGroupReader` and `SerializedPageReader` now use `ReaderPropertiesPtr` for configuration.
    - Renamed `SerializedRowGroupReader::new_with_options` to `SerializedRowGroupReader::new_with_properties`.
    - Renamed `SerializedPageReader::new_with_options` to `SerializedPageReader::new_with_properties`.
    - Added a test for `ReaderPropertiesBuilder`.
    Adrián Gallego Castellanos committed Nov 5, 2022 (e838552)
  2. 769f0b6
  3. Removed an incorrect `cfg` macro from the `try_hadoop_decompress` function.

    Adrián Gallego Castellanos committed Nov 5, 2022 (e5abedd)
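
The configuration shape introduced in the first Nov 5 commit can be sketched as follows. The names `ReaderProperties`, `ReaderPropertiesBuilder`, `ReaderPropertiesPtr` and `new_with_properties` come from the commit message; the `set_backward_compatible_lz4` setter, the `codec_options` accessor, the default of `true` and the `PageReaderSketch` type are hypothetical stand-ins, not the crate's real signatures.

```rust
use std::sync::Arc;

/// Minimal stand-in for the crate-private codec options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodecOptions {
    backward_compatible_lz4: bool,
}

/// Immutable reader configuration shared by file, row group and page readers.
#[derive(Debug)]
pub struct ReaderProperties {
    codec_options: CodecOptions,
}

/// Shared pointer so every reader can hold the same configuration cheaply.
pub type ReaderPropertiesPtr = Arc<ReaderProperties>;

impl ReaderProperties {
    pub fn builder() -> ReaderPropertiesBuilder {
        // Assumed default: backward-compatible LZ4 decoding stays enabled.
        ReaderPropertiesBuilder { backward_compatible_lz4: true }
    }

    /// Crate-internal accessor: public callers never see `CodecOptions`.
    pub(crate) fn codec_options(&self) -> CodecOptions {
        self.codec_options
    }
}

pub struct ReaderPropertiesBuilder {
    backward_compatible_lz4: bool,
}

impl ReaderPropertiesBuilder {
    /// Hypothetical setter exposing the option without exposing `CodecOptions`.
    pub fn set_backward_compatible_lz4(mut self, value: bool) -> Self {
        self.backward_compatible_lz4 = value;
        self
    }

    pub fn build(self) -> ReaderProperties {
        ReaderProperties {
            codec_options: CodecOptions { backward_compatible_lz4: self.backward_compatible_lz4 },
        }
    }
}

/// Hypothetical page reader that receives its configuration via properties.
pub struct PageReaderSketch {
    props: ReaderPropertiesPtr,
}

impl PageReaderSketch {
    pub fn new_with_properties(props: ReaderPropertiesPtr) -> Self {
        Self { props }
    }

    pub fn lz4_fallback_enabled(&self) -> bool {
        self.props.codec_options().backward_compatible_lz4
    }
}
```

Keeping `CodecOptions` behind a `pub(crate)` accessor is what allows the public surface to shrink to the properties types while the codec code still reads the same settings.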