Skip to content

Commit

Permalink
Fix Parquet V2 page reading when max def/rep levels are 0 (facebookin…
Browse files Browse the repository at this point in the history
…cubator#9939)

Summary:
The existing code did not skip the definition and repetition level arrays if the `maxDefine_` and `maxRepeat_` are both zero but these arrays were actually more than 0 bytes. This commit fixes this problem and always uses `pageHeader.data_page_header_v2.definition_levels_byte_length` and `pageHeader.data_page_header_v2.repetition_levels_byte_length` as the array lengths because these values are guaranteed to exist.

Fixes facebookincubator#9924

Pull Request resolved: facebookincubator#9939

Reviewed By: pedroerp

Differential Revision: D57975496

Pulled By: Yuhta

fbshipit-source-id: d9a8dde1973942a3b9cc5212db24e38c2ff3983d
  • Loading branch information
yingsu00 authored and facebook-github-bot committed May 31, 2024
1 parent 05c2925 commit 0b4f3e2
Show file tree
Hide file tree
Showing 3 changed files with 30 additions and 6 deletions.
11 changes: 5 additions & 6 deletions velox/dwio/parquet/reader/PageReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -261,12 +261,11 @@ void PageReader::prepareDataPageV2(const PageHeader& pageHeader, int64_t row) {
return;
}

uint32_t defineLength = maxDefine_ > 0
? pageHeader.data_page_header_v2.definition_levels_byte_length
: 0;
uint32_t repeatLength = maxRepeat_ > 0
? pageHeader.data_page_header_v2.repetition_levels_byte_length
: 0;
uint32_t defineLength =
pageHeader.data_page_header_v2.definition_levels_byte_length;
uint32_t repeatLength =
pageHeader.data_page_header_v2.repetition_levels_byte_length;

auto bytes = pageHeader.compressed_page_size;
pageData_ = readBytes(bytes, pageBuffer_);

Expand Down
Binary file added velox/dwio/parquet/tests/examples/v2_page.parquet
Binary file not shown.
25 changes: 25 additions & 0 deletions velox/dwio/parquet/tests/reader/ParquetReaderTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1156,3 +1156,28 @@ TEST_F(ParquetReaderTest, readVarbinaryFromFLBA) {
result->as<RowVector>()->childAt(0)->asFlatVector<StringView>()->valueAt(
0));
}

TEST_F(ParquetReaderTest, testV2PageWithZeroMaxDefRep) {
// enum_type.parquet contains 1 column (ENUM) with 3 rows.
const std::string sample(getExampleFilePath("v2_page.parquet"));

facebook::velox::dwio::common::ReaderOptions readerOptions{leafPool_.get()};
auto reader = createReader(sample, readerOptions);
EXPECT_EQ(reader->numberOfRows(), 5ULL);

auto rowType = reader->typeWithId();
EXPECT_EQ(rowType->type()->kind(), TypeKind::ROW);
EXPECT_EQ(rowType->size(), 1ULL);

EXPECT_EQ(rowType->childAt(0)->type()->kind(), TypeKind::BIGINT);

auto outputRowType = ROW({"regionkey"}, {BIGINT()});
auto rowReaderOpts = getReaderOpts(outputRowType);
rowReaderOpts.setScanSpec(makeScanSpec(outputRowType));
auto rowReader = reader->createRowReader(rowReaderOpts);

auto expected = makeRowVector({makeFlatVector<int64_t>({0, 1, 2, 3, 4})});

assertReadWithReaderAndExpected(
outputRowType, *rowReader, expected, *leafPool_);
}

0 comments on commit 0b4f3e2

Please sign in to comment.