feat: use parquet-rs as an alternative in fuse engine #14268

Merged
merged 73 commits into from
Jan 30, 2024


SkyFan2002
Member

@SkyFan2002 SkyFan2002 commented Jan 8, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Migration scope

Migrated to parquet-rs in this PR:

  1. Fuse table block (read, write)
  2. Aggregation index (read, write)
  3. COPY INTO stage (write)

Still on parquet2 (these read Parquet through different code paths, so I decided to split them into separate PRs):

  1. Result cache (read, write)
  2. Virtual column (read, write)
  3. Stream load (read Parquet)
  4. Bloom index (read, write)

Compatibility

Backward compatibility is covered in the SQL logic tests by writing blocks with parquet2 and reading them back with parquet-rs:

statement ok
SET fuse_write_use_parquet2 = 1;

statement ok
SET fuse_read_use_parquet2 = 0;
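
As a rough illustration only, a round-trip check built on these two settings might look like the sketch below. The table name `t_compat` and its contents are hypothetical and are not taken from the actual test suite; only the two `SET` statements come from this PR.

```sql
# Hypothetical sketch: write a block with parquet2, then read it with parquet-rs.
statement ok
SET fuse_write_use_parquet2 = 1;

statement ok
CREATE TABLE t_compat(a INT);

statement ok
INSERT INTO t_compat VALUES (1), (2), (3);

# Switch the read path to parquet-rs before querying the parquet2-written block.
statement ok
SET fuse_read_use_parquet2 = 0;

query I
SELECT a FROM t_compat ORDER BY a;
----
1
2
3

statement ok
DROP TABLE t_compat;
```

The idea is that data written by the old writer must remain readable after the read path switches, which is the case existing deployments would hit on upgrade.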

Fixes #14135

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jan 8, 2024
# Conflicts:
#	src/query/expression/src/convert_arrow_rs/record_batch.rs
#	src/query/expression/src/convert_arrow_rs/schema/from_table_schema.rs
# Conflicts:
#	tests/sqllogictests/suites/mode/standalone/explain/join_reorder/mark.test
# Conflicts:
#	tests/sqllogictests/suites/mode/standalone/explain/eliminate_outer_join.test
#	tests/sqllogictests/suites/mode/standalone/explain/explain.test
#	tests/sqllogictests/suites/mode/standalone/explain/join_reorder/chain.test
#	tests/sqllogictests/suites/mode/standalone/explain/join_reorder/star.test
@JackTan25
Contributor

Can we get a Parquet write performance improvement after using parquet-rs in this PR? @SkyFan2002

@SkyFan2002
Member Author

SkyFan2002 commented Jan 29, 2024

> Can we get a Parquet write performance improvement after using parquet-rs in this PR? @SkyFan2002

No. Do you have any concerns?

@JackTan25
Contributor

> > Can we get a Parquet write performance improvement after using parquet-rs in this PR? @SkyFan2002
>
> No. Do you have any concerns?

At the very least, hopefully there is no performance regression.

@BohuTANG BohuTANG added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Jan 29, 2024
Contributor

Docker Image for PR

  • tag: pr-14268-cf683bb

note: this image tag is only available for internal use,
please check the internal doc for more details.


@dantengsky dantengsky added this pull request to the merge queue Jan 30, 2024
Merged via the queue into databendlabs:main with commit 2c474ce Jan 30, 2024
71 checks passed
Labels
ci-benchmark Benchmark: run all test ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

fuse engine prepare to use parquet_rs instead of parquet2
6 participants