refactor: remove dependency on parquet2: Part I #15158

Merged 15 commits on Apr 7, 2024

SkyFan2002 (Member) commented Apr 2, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

In #14268, serialization and deserialization of fuse engine data blocks were migrated to parquet-rs. After sufficient verification, we believe that migration is reliable, so this PR starts removing the dependency on Parquet2 entirely.

Out of caution, the previous PR retained the code that reads and writes data blocks with Parquet2. This PR removes that code, along with the compatibility tests between Parquet2 and parquet-rs.

Some components in our system still depend on Parquet2. This PR migrates the following to parquet-rs:

  1. Result cache (read, write)
  2. Virtual column (read, write)
  3. Bloom index (read, write)

Parquet2 usages that remain after this PR:

./src/common/storage/src/parquet2.rs:use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/pipeline/sources/src/input_formats/impls/input_format_parquet.rs:use databend_common_arrow::parquet::metadata::ColumnChunkMetaData;
./src/query/pipeline/sources/src/input_formats/impls/input_format_parquet.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;
./src/query/pipeline/sources/src/input_formats/impls/input_format_parquet.rs:use databend_common_arrow::parquet::read::read_metadata;
./src/query/service/tests/it/storages/fuse/bloom_index_meta_size.rs:// use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/service/tests/it/storages/fuse/bloom_index_meta_size.rs:// use databend_common_arrow::parquet::metadata::ThriftFileMetaData;
./src/query/storages/common/cache_manager/src/caches.rs:use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/storages/common/table_meta/src/table/table_compression.rs:use databend_common_arrow::parquet;
./src/query/storages/fuse/src/io/read/block/block_reader.rs:use databend_common_arrow::parquet::metadata::SchemaDescriptor;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::metadata::ColumnDescriptor;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::metadata::Descriptor;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::schema::types::FieldInfo;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::schema::types::ParquetType;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::schema::types::PhysicalType;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::schema::types::PrimitiveType;
./src/query/storages/fuse/src/io/read/block/block_reader_native_deserialize.rs:use databend_common_arrow::parquet::schema::Repetition;
./src/query/storages/fuse/src/io/read/meta/meta_readers.rs:    use databend_common_arrow::parquet::error::Error;
./src/query/storages/fuse/src/io/read/utils.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;
./src/query/storages/fuse/src/operations/read/native_rows_fetcher.rs:use databend_common_arrow::parquet::metadata::ColumnDescriptor;
./src/query/storages/fuse/src/operations/read/fuse_rows_fetcher.rs:use databend_common_arrow::parquet::metadata::ColumnDescriptor;
./src/query/storages/fuse/src/operations/read/native_data_source_deserializer.rs:use databend_common_arrow::parquet::metadata::ColumnDescriptor;
./src/query/storages/hive/hive/src/hive_block_filter.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;
./src/query/storages/hive/hive/src/hive_block_filter.rs:use databend_common_arrow::parquet::statistics::BinaryStatistics;
./src/query/storages/hive/hive/src/hive_block_filter.rs:use databend_common_arrow::parquet::statistics::BooleanStatistics;
./src/query/storages/hive/hive/src/hive_block_filter.rs:use databend_common_arrow::parquet::statistics::PrimitiveStatistics;
./src/query/storages/hive/hive/src/hive_block_filter.rs:use databend_common_arrow::parquet::statistics::Statistics;
./src/query/storages/hive/hive/src/hive_blocks.rs:use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/storages/hive/hive/src/hive_blocks.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;
./src/query/storages/hive/hive/src/hive_meta_data_reader.rs:use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/storages/hive/hive/src/hive_meta_data_reader.rs:use databend_common_arrow::parquet::read::read_metadata_async;
./src/query/storages/hive/hive/src/hive_parquet_block_reader.rs:use databend_common_arrow::parquet::metadata::ColumnChunkMetaData;
./src/query/storages/hive/hive/src/hive_parquet_block_reader.rs:use databend_common_arrow::parquet::metadata::FileMetaData;
./src/query/storages/hive/hive/src/hive_parquet_block_reader.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;
./src/query/storages/hive/hive/src/hive_parquet_block_reader.rs:use databend_common_arrow::parquet::read::BasicDecompressor;
./src/query/storages/hive/hive/src/hive_parquet_block_reader.rs:use databend_common_arrow::parquet::read::PageReader;
./src/query/storages/result_cache/src/table_function/table.rs:use databend_common_arrow::parquet::read::read_metadata;
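
A list like the one above is easier to track across follow-up PRs when it is summarized per file. The following is a minimal sketch, not part of this PR: it assumes the input is grep-style `path:code` output as shown above, and the names `count_residual_imports` and `PARQUET2_PREFIX` are hypothetical. Commented-out imports (such as those in bloom_index_meta_size.rs) are skipped.

```rust
use std::collections::BTreeMap;

/// Assumed marker for a parquet2 import re-exported via `databend_common_arrow`.
const PARQUET2_PREFIX: &str = "use databend_common_arrow::parquet";

/// Counts live parquet2 imports per file from grep-style `path:code` lines.
/// Lines without a ':' and commented-out imports are ignored.
fn count_residual_imports(grep_output: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for line in grep_output.lines() {
        // Split at the first ':' so the path is on the left and the
        // matched source line (which contains "::") is on the right.
        let Some((path, code)) = line.split_once(':') else {
            continue;
        };
        // Indented imports still count; `// use ...` lines do not match.
        if code.trim_start().starts_with(PARQUET2_PREFIX) {
            *counts.entry(path.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // A few lines copied from the list above, used as sample input.
    let sample = [
        "./src/query/storages/fuse/src/io/read/utils.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;",
        "./src/query/storages/hive/hive/src/hive_blocks.rs:use databend_common_arrow::parquet::metadata::FileMetaData;",
        "./src/query/storages/hive/hive/src/hive_blocks.rs:use databend_common_arrow::parquet::metadata::RowGroupMetaData;",
        "./src/query/service/tests/it/storages/fuse/bloom_index_meta_size.rs:// use databend_common_arrow::parquet::metadata::FileMetaData;",
    ]
    .join("\n");

    // BTreeMap iterates in sorted path order, so the report is stable.
    for (path, n) in count_residual_imports(&sample) {
        println!("{n:>3}  {path}");
    }
}
```

Piping the real grep output into a helper like this would show which crates (for example the hive storage) still carry the most parquet2 imports.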

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Existing tests provide enough coverage.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Apr 2, 2024

sundy-li (Member) commented Apr 3, 2024

also in input_format_parquet.rs

# Conflicts:
#	tests/sqllogictests/suites/base/13_parquet_rs/13_05_0003_ddl_alter_table_backward.test
#	tests/sqllogictests/suites/base/13_parquet_rs/13_05_0003_ddl_alter_table_forward.test
#	tests/sqllogictests/suites/base/13_parquet_rs/13_05_0028_ddl_alter_table_add_drop_column_backward.test
#	tests/sqllogictests/suites/base/13_parquet_rs/13_05_0028_ddl_alter_table_add_drop_column_forward.test
SkyFan2002 changed the title from "refactor: remove dependency on parquet2" to "refactor: remove dependency on parquet2: Part I" Apr 7, 2024
SkyFan2002 marked this pull request as ready for review April 7, 2024 06:23
BohuTANG merged commit 1ca93c8 into databendlabs:main Apr 7, 2024
81 checks passed
5 participants