Support partition pruning during file listing #47

Closed
xushiyan opened this issue Jul 5, 2024 · 3 comments

@xushiyan
Member

xushiyan commented Jul 5, 2024

Provide a basic table API that accepts predicates such as foo > 10 or bar != A. Partition loading for the file system view should process these predicates and load only the relevant partitions. For now, only consider supporting AND when combining multiple predicate string expressions.

Follow up work #159 #160
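
For illustration, predicate strings like the ones above could be parsed roughly as follows. This is only a minimal sketch under stated assumptions: the names Op, Predicate, parse_predicate and parse_all are hypothetical and not part of the project's API, and the parser assumes each expression is exactly three whitespace-separated tokens.

```rust
/// Comparison operators accepted by this sketch (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Eq,
    Ne,
    Gt,
    Ge,
    Lt,
    Le,
}

/// One parsed predicate such as `foo > 10` (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Predicate {
    pub field: String,
    pub op: Op,
    pub value: String,
}

/// Parses a single "<field> <op> <value>" expression.
/// Assumes the three parts are separated by whitespace.
pub fn parse_predicate(expr: &str) -> Result<Predicate, String> {
    let parts: Vec<&str> = expr.split_whitespace().collect();
    if parts.len() != 3 {
        return Err(format!("expected '<field> <op> <value>', got '{}'", expr));
    }
    let op = match parts[1] {
        "=" => Op::Eq,
        "!=" => Op::Ne,
        ">" => Op::Gt,
        ">=" => Op::Ge,
        "<" => Op::Lt,
        "<=" => Op::Le,
        other => return Err(format!("unsupported operator '{}'", other)),
    };
    Ok(Predicate {
        field: parts[0].to_string(),
        op,
        value: parts[2].to_string(),
    })
}

/// Parses several predicate strings. The resulting list is meant to be
/// read as a conjunction (AND); the first parse error aborts the whole call.
pub fn parse_all(exprs: &[&str]) -> Result<Vec<Predicate>, String> {
    exprs.iter().map(|e| parse_predicate(e)).collect()
}
```

Calling parse_all(&["foo > 10", "bar != A"]) would yield two predicates that a partition loader could then apply together with AND semantics.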

xushiyan added this to the release-0.2.0 milestone Jul 5, 2024
xushiyan self-assigned this Jul 5, 2024
xushiyan added the p0 label Jul 19, 2024
xushiyan changed the title from "Integrate with daft: Hudi read API" to "Support partition pruning during file listing" Jul 19, 2024
xushiyan removed their assignment Jul 19, 2024
@KnightChess
Contributor

Hi, I'd like to work on this feature if no one else plans to work on it.

@xushiyan
Member Author

@KnightChess great! can you please describe a high-level approach for the implementation here?

@KnightChess
Contributor

Support a Hudi-internal filter; engines such as DataFusion or others need to convert their expressions into Hudi partition filters.
Something like this code:

let filter_one = PartitionFilter::try_from(("shortField", "=", "100", short_field_data_type)).unwrap();
let filter_two = PartitionFilter::try_from(("shortField", ">", "100", short_field_data_type)).unwrap();
hudi_table.partition_filter_replace(vec![filter_one, filter_two]);
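
To make the tuple-based constructor above more concrete, here is one way the conversion might look. It is a rough sketch only: FieldType and Scalar are hypothetical stand-ins (the proposal itself reuses DataFusion's ScalarValue and a real data type), and only the two operators used in the snippet are handled.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the partition field's data type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Int64,
    Utf8,
}

/// Hypothetical stand-in for a typed scalar value.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(i64),
    Utf8(String),
}

/// Reduced version of the proposed enum, limited to two operators.
#[derive(Debug, Clone, PartialEq)]
pub enum PartitionValue {
    Equal(Scalar),
    GreaterThan(Scalar),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PartitionFilter {
    pub key: String,
    pub value: PartitionValue,
}

/// Converts the raw string literal into a typed scalar.
fn parse_scalar(raw: &str, ty: FieldType) -> Result<Scalar, String> {
    match ty {
        FieldType::Int64 => raw
            .parse::<i64>()
            .map(Scalar::Int64)
            .map_err(|e| format!("invalid integer '{}': {}", raw, e)),
        FieldType::Utf8 => Ok(Scalar::Utf8(raw.to_string())),
    }
}

/// Builds a filter from (field, operator, literal, data type).
impl TryFrom<(&str, &str, &str, FieldType)> for PartitionFilter {
    type Error = String;

    fn try_from(
        (key, op, raw, ty): (&str, &str, &str, FieldType),
    ) -> Result<Self, Self::Error> {
        let scalar = parse_scalar(raw, ty)?;
        let value = match op {
            "=" => PartitionValue::Equal(scalar),
            ">" => PartitionValue::GreaterThan(scalar),
            other => return Err(format!("unsupported operator '{}'", other)),
        };
        Ok(PartitionFilter {
            key: key.to_string(),
            value,
        })
    }
}
```

Parsing the literal against the field's data type up front means a malformed value is reported when the filter is built rather than later during file listing.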

The core filter struct definition: the key is the partition field, and the value is the expression value.

pub struct PartitionFilter {
    /// The key of the PartitionFilter
    pub key: String,
    /// The value of the PartitionFilter
    pub value: PartitionValue
}

The expression value reuses DataFusion's ScalarValue:

pub enum PartitionValue {
    /// The partition value with the equal operator
    Equal(ScalarValue),
    /// The partition value with the not equal operator
    NotEqual(ScalarValue),
    /// The partition value with the greater than operator
    GreaterThan(ScalarValue),
    /// The partition value with the greater than or equal operator
    GreaterThanOrEqual(ScalarValue),
    /// The partition value with the less than operator
    LessThan(ScalarValue),
    /// The partition value with the less than or equal operator
    LessThanOrEqual(ScalarValue),
    /// The partition values with the in operator
    In(Vec<ScalarValue>),
    /// The partition values with the not in operator
    NotIn(Vec<ScalarValue>),
}
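
As a sketch of how these filters might drive pruning, the code below evaluates each variant against a partition's actual value and keeps a partition only when all filters match (AND). Several things here are assumptions for illustration: Scalar is a small stand-in for DataFusion's ScalarValue, partition values are modelled as a HashMap, and the names compare, holds, matches and partition_matches are hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical stand-in for DataFusion's ScalarValue.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int64(i64),
    Utf8(String),
}

impl Scalar {
    /// Orders two scalars of the same type; mismatched types are not comparable.
    fn compare(&self, other: &Scalar) -> Option<Ordering> {
        match (self, other) {
            (Scalar::Int64(a), Scalar::Int64(b)) => Some(a.cmp(b)),
            (Scalar::Utf8(a), Scalar::Utf8(b)) => Some(a.cmp(b)),
            _ => None,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum PartitionValue {
    Equal(Scalar),
    NotEqual(Scalar),
    GreaterThan(Scalar),
    GreaterThanOrEqual(Scalar),
    LessThan(Scalar),
    LessThanOrEqual(Scalar),
    In(Vec<Scalar>),
    NotIn(Vec<Scalar>),
}

/// True when the two values are comparable and the ordering passes `test`.
/// In this sketch, values of different types never satisfy any operator.
fn holds(actual: &Scalar, v: &Scalar, test: fn(Ordering) -> bool) -> bool {
    actual.compare(v).map_or(false, test)
}

impl PartitionValue {
    /// Returns true when `actual` satisfies this expression.
    pub fn matches(&self, actual: &Scalar) -> bool {
        match self {
            PartitionValue::Equal(v) => holds(actual, v, |o| o == Ordering::Equal),
            PartitionValue::NotEqual(v) => holds(actual, v, |o| o != Ordering::Equal),
            PartitionValue::GreaterThan(v) => holds(actual, v, |o| o == Ordering::Greater),
            PartitionValue::GreaterThanOrEqual(v) => holds(actual, v, |o| o != Ordering::Less),
            PartitionValue::LessThan(v) => holds(actual, v, |o| o == Ordering::Less),
            PartitionValue::LessThanOrEqual(v) => holds(actual, v, |o| o != Ordering::Greater),
            PartitionValue::In(vs) => {
                vs.iter().any(|v| holds(actual, v, |o| o == Ordering::Equal))
            }
            PartitionValue::NotIn(vs) => {
                vs.iter().all(|v| holds(actual, v, |o| o != Ordering::Equal))
            }
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct PartitionFilter {
    pub key: String,
    pub value: PartitionValue,
}

/// Keeps a partition only if every filter matches (AND semantics).
/// A filter whose key is missing from the partition does not match.
pub fn partition_matches(
    partition: &HashMap<String, Scalar>,
    filters: &[PartitionFilter],
) -> bool {
    filters.iter().all(|f| {
        partition
            .get(&f.key)
            .map_or(false, |actual| f.value.matches(actual))
    })
}
```

Treating mismatched types and missing keys as non-matching is one possible policy; a real implementation might prefer to return an error instead, since silently pruning a partition can hide data.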

Status: Done