Write boolean stats for boolean columns (not i32 stats) #661

Merged
merged 1 commit into from
Aug 8, 2021

Conversation

alamb
Contributor

@alamb alamb commented Aug 5, 2021

Which issue does this PR close?

Resolves #659

Rationale for this change

Boolean columns were being written with Int32 statistics instead of boolean statistics.

What changes are included in this PR?

Write boolean stats for boolean columns (not i32 stats)
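
A minimal sketch of the idea, not the actual parquet crate code: the statistics variant should follow the column's physical type, so a boolean column gets boolean min/max rather than values widened to i32. The `PhysicalType` and `ColumnStats` enums and the `make_stats` function are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a column's physical type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Boolean,
    Int32,
}

/// Hypothetical stand-in for typed min/max statistics.
#[derive(Debug, PartialEq)]
enum ColumnStats {
    Boolean { min: bool, max: bool },
    Int32 { min: i32, max: i32 },
}

/// Build statistics whose variant matches the column's physical type.
/// `min` and `max` arrive as i64 only to keep the sketch short.
fn make_stats(ty: PhysicalType, min: i64, max: i64) -> ColumnStats {
    match ty {
        // The bug described above would correspond to sending
        // Boolean columns down the Int32 arm.
        PhysicalType::Boolean => ColumnStats::Boolean {
            min: min != 0,
            max: max != 0,
        },
        PhysicalType::Int32 => ColumnStats::Int32 {
            min: min as i32,
            max: max as i32,
        },
    }
}
```

Under these assumptions, `make_stats(PhysicalType::Boolean, 0, 1)` produces a `ColumnStats::Boolean` value, while the same inputs with `PhysicalType::Int32` produce a `ColumnStats::Int32` value.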

Are there any user-facing changes?

Statistics for boolean columns in Parquet files are now boolean. I am not sure whether this is visible to users, though: when I used parquet-tools to dump a Parquet file with a boolean column, the min/max statistics looked correct to me (that is, they were boolean):

alamb@MacBook-Pro rust_parquet % parquet-tools dump -n  /tmp/test_bool.parquet 
row group 0 
------------------------------------------------------------------------------------------------------------------
bool_col:  BOOLEAN UNCOMPRESSED DO:0 FPO:4 SZ:40/40/1.00 VC:6 ENC:PLAIN,RLE ST:[min: false, max: true, num_nulls: 1]

    bool_col TV=6 RL=0 DL=1
    --------------------------------------------------------------------------------------------------------------
    page 0:  DLE:RLE RLE:RLE VLE:PLAIN ST:[min: false, max: true, num_nulls: 1] CRC:[none] SZ:7 VC:6

BOOLEAN bool_col 
------------------------------------------------------------------------------------------------------------------
*** row group 1 of 1, values 1 to 6 *** 
value 1: R:0 D:1 V:true
value 2: R:0 D:1 V:true
value 3: R:0 D:1 V:false
value 4: R:0 D:0 V:<null>
value 5: R:0 D:1 V:false
value 6: R:0 D:1 V:true
alamb@MacBook-Pro rust_parquet % 
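
As a sanity check on the dump above, the min, max, and null count can be recomputed from the six listed values. This is an illustrative sketch only; `BoolStats` and `bool_stats` are hypothetical names and are not part of the parquet crate.

```rust
/// Hypothetical summary of a boolean column chunk.
#[derive(Debug, PartialEq)]
struct BoolStats {
    min: Option<bool>,
    max: Option<bool>,
    num_nulls: usize,
}

/// Compute min, max, and null count over optional booleans.
/// Nulls are counted but excluded from min/max; `false < true`.
fn bool_stats(values: &[Option<bool>]) -> BoolStats {
    let num_nulls = values.iter().filter(|v| v.is_none()).count();
    // `flatten` skips the `None` entries and yields `&bool`.
    let min = values.iter().flatten().copied().min();
    let max = values.iter().flatten().copied().max();
    BoolStats { min, max, num_nulls }
}
```

Applied to the values in the dump (true, true, false, null, false, true), this yields a min of false, a max of true, and one null, which agrees with the `ST:` fields reported by parquet-tools.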

@alamb alamb marked this pull request as ready for review August 8, 2021 10:34
@alamb alamb requested a review from sunchao August 8, 2021 10:34
Labels
parquet Changes to the parquet crate
Development

Successfully merging this pull request may close these issues.

Parquet boolean columns write Int32 statistics
2 participants