Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support partition prune api #119

Merged
merged 6 commits into from
Oct 5, 2024
Merged

Conversation

KnightChess
Copy link
Contributor

@KnightChess KnightChess commented Aug 25, 2024

Description

Add filtering capabilities to table API, currently only partition fields are applicable. Multiple predicates are AND together.

hudi_table.read_snapshot(&["foo != a", "bar >= 100"]);

Supported operators are: >, >=, <, <=, =, !=.

For #47

How are the changes test-covered

  • N/A
  • Automated tests (unit and/or integration tests)
  • Manual tests
  • Details are described below

@KnightChess
Copy link
Contributor Author

@xushiyan cc

@xushiyan xushiyan self-assigned this Aug 29, 2024
@xushiyan xushiyan added feature rust Related to Rust codebase labels Aug 29, 2024
@xushiyan xushiyan added this to the release-0.2.0 milestone Aug 29, 2024
@xushiyan xushiyan linked an issue Aug 29, 2024 that may be closed by this pull request
@KnightChess
Copy link
Contributor Author

@xushiyan hello, Is there a problem with my test method in my local?
image
image

@KnightChess
Copy link
Contributor Author

KnightChess commented Aug 30, 2024

hello, Is there a problem with my test method in my local?

ignore checkstyle, fetch the latest commit, it work again, but can not report python test error.

Copy link

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 90.07092% with 14 lines in your changes missing coverage. Please review.

Project coverage is 89.32%. Comparing base (e23e6ed) to head (f1ce54d).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/table/partition.rs 88.76% 10 Missing ⚠️
crates/core/src/table/mod.rs 88.00% 3 Missing ⚠️
crates/core/src/table/fs_view.rs 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
+ Coverage   87.82%   89.32%   +1.50%     
==========================================
  Files          14       15       +1     
  Lines         731      834     +103     
==========================================
+ Hits          642      745     +103     
  Misses         89       89              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@KnightChess KnightChess force-pushed the prune-partition branch 2 times, most recently from 085fbba to 9551275 Compare September 12, 2024 13:56
@KnightChess
Copy link
Contributor Author

@xushiyan cc:

@xushiyan
Copy link
Member

@KnightChess awesome contribution! let me take a look. i might push some quick fixes just FYI to move faster.

Copy link
Member

@xushiyan xushiyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KnightChess good progress. i think the main functionalities are there. A major change is to re-model the Partition struct, we need to decouple it from datafusion struct

crates/core/src/config/table.rs Outdated Show resolved Hide resolved
crates/core/src/table/mod.rs Outdated Show resolved Hide resolved
crates/core/src/table/mod.rs Outdated Show resolved Hide resolved
crates/core/src/table/mod.rs Outdated Show resolved Hide resolved
crates/core/Cargo.toml Outdated Show resolved Hide resolved
crates/core/Cargo.toml Outdated Show resolved Hide resolved
crates/core/src/table/partitions.rs Outdated Show resolved Hide resolved
crates/core/src/table/fs_view.rs Show resolved Hide resolved
@xushiyan
Copy link
Member

xushiyan commented Sep 15, 2024

@KnightChess do you think you can address the main comment in the next few days? then i can polish further if needed and land this. Trying to get this in the upcoming release within 2 weeks 🙂 (cutting RC branch with a week)

@KnightChess
Copy link
Contributor Author

@xushiyan sorry for reply late, I will address these two days

@KnightChess
Copy link
Contributor Author

@xushiyan Hello, I couldn't find an implementation similar to ScalarValue, and I am not very familiar with Arrow yet, and. There is a certain learning curve involved, which might delay the progress of this PR. Could you please help improve this PR?

@KnightChess
Copy link
Contributor Author

@xushiyan cc, I try to use arrow Scalar<ArrayRef> to replace datafusion ScalarValue, and modified some suggestions to repair.

@xushiyan
Copy link
Member

@xushiyan cc, I try to use arrow Scalar<ArrayRef> to replace datafusion ScalarValue, and modified some suggestions to repair.

@KnightChess Thanks. I was traveling. Will take a look later today.

crates/core/src/config/table.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
crates/core/Cargo.toml Outdated Show resolved Hide resolved
crates/core/src/table/partitions.rs Outdated Show resolved Hide resolved
@xushiyan
Copy link
Member

xushiyan commented Oct 5, 2024

I was wrapping up my vacation 😄 just now getting back to update this:

Changes I've made:

  • Add PartitionPruner to handle pruning logic, taking care of hudi configs like hive style and url encoded paths.
  • Taking filters from parameters in form of foo > 10, bar != a, etc, only apply AND to them.
  • Limit the operators to only >, >=, <, <=, =, != to avoid rabbit hole of implementing sql parser; IN and NOT IN can be achieved anyway.
  • API to reset should not be needed now, as the pruning with different filters should be applied to loaded fs view directly.

There are more follow up work to do on datafusion integration side, which I'll jot down in the GH issue.

@xushiyan xushiyan merged commit 65d8ed4 into apache:main Oct 5, 2024
9 checks passed
@KnightChess
Copy link
Contributor Author

@xushiyan thanks review

@KnightChess KnightChess deleted the prune-partition branch October 7, 2024 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature rust Related to Rust codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants