Support reading string view stats #11861

Closed
wants to merge 1 commit into from

XiangpengHao
Contributor

Which issue does this PR close?

Part of #11752.

Rationale for this change

Add support for StringView when reading statistics. This should allow queries on StringView columns to run faster for Parquet files that have good statistics.
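To illustrate why usable statistics matter, here is a minimal sketch of min/max pruning for a string column. It is not the code from this PR: `StringStats`, `may_contain`, and `row_groups_to_scan` are hypothetical names, and the statistics are modeled as plain `String` values rather than the raw bytes Parquet stores.

```rust
/// Hypothetical per-row-group statistics for a string column.
/// Real Parquet statistics are byte arrays; `String` keeps the sketch short.
#[derive(Debug, Clone)]
struct StringStats {
    min: Option<String>,
    max: Option<String>,
}

/// Returns `false` only when the statistics prove that no row in the
/// row group can equal `needle`. Missing statistics mean "cannot prune".
fn may_contain(stats: &StringStats, needle: &str) -> bool {
    match (&stats.min, &stats.max) {
        // Byte-wise lexicographic comparison, as provided by `str`.
        (Some(min), Some(max)) => min.as_str() <= needle && needle <= max.as_str(),
        _ => true,
    }
}

/// Indices of the row groups that still have to be scanned for `col = needle`.
fn row_groups_to_scan(groups: &[StringStats], needle: &str) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_contain(stats, needle))
        .map(|(index, _)| index)
        .collect()
}
```

If the reader cannot interpret the statistics for a column type, every row group falls into the "cannot prune" branch and must be scanned, which is the cost this change aims to avoid for StringView columns.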

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Aug 7, 2024
@@ -493,6 +499,7 @@ async fn fetch_statistics(
pub fn statistics_from_parquet_meta_calc(
    metadata: &ParquetMetaData,
    table_schema: SchemaRef,
    force_string_view: bool,
Contributor Author

This is not super elegant, as we need to pass this config deep into the implementation, but I don't have a better way to do this.
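A minimal sketch of the pattern being described, under the assumption that the flag causes string columns to be treated as `Utf8View` when statistics are decoded. `ColumnType`, `stats_type`, and `stats_types_for_schema` are hypothetical stand-ins, not DataFusion or Arrow APIs.

```rust
/// Simplified stand-in for an Arrow data type; only the variants needed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

/// Hypothetical inner helper: the type used when decoding a column's statistics.
/// Assumption: with the flag set, string columns are read as `Utf8View`.
fn stats_type(declared: ColumnType, force_string_view: bool) -> ColumnType {
    match declared {
        ColumnType::Utf8 | ColumnType::LargeUtf8 if force_string_view => ColumnType::Utf8View,
        other => other,
    }
}

/// Hypothetical outer entry point. It does not use the flag itself and only
/// forwards it, which is the "passing config deep" cost noted in the comment.
fn stats_types_for_schema(schema: &[ColumnType], force_string_view: bool) -> Vec<ColumnType> {
    schema
        .iter()
        .map(|&declared| stats_type(declared, force_string_view))
        .collect()
}
```

Every function between the configuration and the point of use has to grow the same extra parameter, which is why the approach feels inelegant even though it is simple.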

@alamb
Contributor

alamb commented Aug 19, 2024

FWIW, now that we have implemented this for real in apache/arrow-rs#6181 (thanks @Kev1n8 🙏), do we still need this code (once we have updated to pick up the upstream changes)?

@alamb
Contributor

alamb commented Aug 19, 2024

Marking as draft, as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look.

@alamb alamb marked this pull request as draft August 19, 2024 18:43

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Oct 19, 2024
@github-actions github-actions bot closed this Oct 27, 2024
Labels
core Core DataFusion crate Stale PR has not had any activity for some time
2 participants