Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BinaryViewArray doesn't roundtrip a single Some(&[]) through parquet #6086

Closed
danhhz opened this issue Jul 18, 2024 · 4 comments
Closed

BinaryViewArray doesn't roundtrip a single Some(&[]) through parquet #6086

danhhz opened this issue Jul 18, 2024 · 4 comments
Labels
bug parquet Changes to the parquet crate

Comments

@danhhz
Copy link

danhhz commented Jul 18, 2024

Describe the bug

It seems that a BinaryViewArray of a single Some(&[]) comes back as None (null) after being roundtripped through parquet. A non-empty byte slice, such as Some(&[1]) works, however.

To Reproduce

Using version 52.1.0

#[test]
fn binary_view_bug() {
    #[track_caller]
    fn roundtrips(x: &[u8]) {
        let before = BinaryViewArray::from(vec![x]);
        let batch =
            RecordBatch::try_from_iter(vec![("val", Arc::new(before.clone()) as ArrayRef)])
                .unwrap();

        let mut buf = Vec::new();
        let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), None).unwrap();
        writer.write(&batch).unwrap();
        writer.close().unwrap();

        let builder = ParquetRecordBatchReaderBuilder::try_new(Bytes::from(buf)).unwrap();
        let mut reader = builder.build().unwrap();
        let batch = reader.next().unwrap().unwrap();
        let after = batch.columns()[0].as_binary_view();
        assert_eq!(
            before.iter().collect::<Vec<_>>(),
            after.iter().collect::<Vec<_>>()
        );
    }

    roundtrips(&[1]); // This one works
    roundtrips(&[]); // This one fails
}

The output on failure is

assertion `left == right` failed
  left: [Some([])]
 right: [None]

Expected behavior
The value should be Some([]) on decode.

Additional context

@danhhz danhhz added the bug label Jul 18, 2024
@XiangpengHao
Copy link
Contributor

Hi @danhhz , thank you for reporting this! But I can't reproduce the bug; probably already fixed in the master branch?

@XiangpengHao
Copy link
Contributor

I have reproduced the bug from 52.1.0 on crates.io. We recently revamped the StringViewArray and ByteViewArray in arrow, but those changes do not propagate to 52.1.0, it should be fixed in the next release! cc @alamb

@danhhz
Copy link
Author

danhhz commented Jul 19, 2024

Yup, I can confirm that my repro as well as the original use that led me to find it are both fixed on master. Thanks!

Should I close this issue or do y'all want to keep it open until the fix makes it into a crates.io release?

@alamb
Copy link
Contributor

alamb commented Jul 19, 2024

Thanks @danhhz and @XiangpengHao -- I'll mark this bug closed. #5998 tracks the release. I am hoping to be a closer to Aug 1 than Aug 7

@alamb alamb closed this as completed Jul 19, 2024
@alamb alamb added the parquet Changes to the parquet crate label Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug parquet Changes to the parquet crate
Projects
None yet
Development

No branches or pull requests

3 participants