Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove row group filter for map and list #10510

Closed
wants to merge 3 commits into from

Conversation

yma11
Copy link
Contributor

@yma11 yma11 commented Jul 19, 2024

There is a fast path for filter all null columns in ScanSpec based on statistics by assuming all null happens when NumberOfValues equals 0. But when do filterRowGroups against one line file with null Map, this rule will apply to the Key column reader, which has testNull() as false and finally decide this rowgroup should be skipped. It means this rule doesn't apply for the columns like Map::Key . But limited to the info that statistics can provide, we don't have a better way to refine this fast path, so we remove the filterRowGroups in MapColumnReader in this PR.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 19, 2024
Copy link

netlify bot commented Jul 19, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 959f214
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66b42ba95d76cd00088faf3b

@Yuhta Yuhta linked an issue Jul 19, 2024 that may be closed by this pull request
@majetideepak majetideepak added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Aug 1, 2024
@yma11 yma11 changed the title Remove fast path for all null in filter Remove row group filter for map and list Aug 2, 2024
@facebook-github-bot
Copy link
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kagamiori
Copy link
Contributor

Hi @yma11, could you please rebase this PR onto the latest main? That would help with merging. Thanks!

@facebook-github-bot
Copy link
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kagamiori
Copy link
Contributor

Hi @yma11, there are three unit tests failing internal: velox/dwio/parquet/tests/reader:e2e_filter_test - E2EFilterTest.list, E2EFilterTest.map, and E2EFilterTest.subfieldsPruning. They all fails with SIGABRT "Assertion '__n < this->size()' failed." with the same stack trace, so the issues are related. Below is the full stacktrace of E2EFilterTest.list.

Assertion '__n < this->size()' failed.
*** Aborted at 1723073346 (Unix time, try 'date -d @1723073346') ***
*** Signal 6 (SIGABRT) (0x37b3c0003f7e7) received by PID 260071 (pthread TID 0x7fc638e42d80) (linux TID 260071) (maybe from PID 260071, UID 228156) (code: -6), stack trace: ***
    @ 000000000000fd47 folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:45
    @ 000000000000e4c1 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:47
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 000000000009c993 __GI___pthread_kill
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_kill.c:46
    @ 00000000000444ac __GI_raise
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/posix/raise.c:26
    @ 000000000002c432 __GI_abort
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/stdlib/abort.c:79
    @ 00000000005a853b std::__replacement_assert(char const*, int, char const*, char const*)
                       fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/x86_64-facebook-linux/bits/c++config.h:535
                       -> ./fbcode/velox/dwio/parquet/reader/Metadata.cpp
    @ 00000000005bf958 std::vector<facebook::velox::parquet::thrift::ColumnChunk, std::allocator<facebook::velox::parquet::thrift::ColumnChunk> >::operator[](unsigned long) const
                       fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_vector.h:1063
                       -> ./fbcode/velox/dwio/parquet/reader/Metadata.cpp
    @ 00000000005bf8cb facebook::velox::parquet::RowGroupMetaDataPtr::columnChunk(int) const
                       ./fbcode/velox/dwio/parquet/reader/Metadata.cpp:284
    @ 0000000002911f78 facebook::velox::parquet::ParquetData::rowGroupMatches(unsigned int, facebook::velox::common::Filter*)
                       ./fbcode/velox/dwio/parquet/reader/ParquetData.cpp:75
    @ 0000000002910003 facebook::velox::parquet::ParquetData::filterRowGroups(facebook::velox::common::ScanSpec const&, unsigned long, facebook::velox::dwio::common::StatsContext const&, facebook::velox::dwio::common::FormatData::FilterRowGroupsResult&)
                       ./fbcode/velox/dwio/parquet/reader/ParquetData.cpp:48
    @ 0000000000d2e7c9 facebook::velox::dwio::common::SelectiveColumnReader::filterRowGroups(unsigned long, facebook::velox::dwio::common::StatsContext const&, facebook::velox::dwio::common::FormatData::FilterRowGroupsResult&) const
                       ./fbcode/velox/dwio/common/SelectiveColumnReader.cpp:59
    @ 0000000002c0c9c7 facebook::velox::parquet::StructColumnReader::filterRowGroups(unsigned long, facebook::velox::dwio::common::StatsContext const&, facebook::velox::dwio::common::FormatData::FilterRowGroupsResult&) const
                       ./fbcode/velox/dwio/parquet/reader/StructColumnReader.cpp:197
    @ 000000000295a065 facebook::velox::parquet::ParquetRowReader::Impl::filterRowGroups()
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:808
    @ 0000000002958b61 facebook::velox::parquet::ParquetRowReader::Impl::Impl(std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&)
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:794
    @ 00000000029580bf std::_MakeUniq<facebook::velox::parquet::ParquetRowReader::Impl>::__single_object std::make_unique<facebook::velox::parquet::ParquetRowReader::Impl, std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&>(std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&)
                       fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/unique_ptr.h:962
                       -> ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000002957f06 facebook::velox::parquet::ParquetRowReader::ParquetRowReader(std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&)
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:944
    @ 0000000002969f1f std::_MakeUniq<facebook::velox::parquet::ParquetRowReader>::__single_object std::make_unique<facebook::velox::parquet::ParquetRowReader, std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&>(std::shared_ptr<facebook::velox::parquet::ReaderBase> const&, facebook::velox::dwio::common::RowReaderOptions const&)
                       fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/unique_ptr.h:962
                       -> ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp
    @ 000000000291e17b facebook::velox::parquet::ParquetReader::createRowReader(facebook::velox::dwio::common::RowReaderOptions const&) const
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:100
    @ 000000000009996a facebook::velox::dwio::common::E2EFilterTestBase::readWithFilter(std::shared_ptr<facebook::velox::common::ScanSpec>, facebook::velox::dwio::common::MutationSpec const&, std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long&, bool, bool)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:162
    @ 00000000000d3be8 facebook::velox::dwio::common::E2EFilterTestBase::testFilterSpecs(std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<facebook::velox::dwio::common::FilterSpec, std::allocator<facebook::velox::dwio::common::FilterSpec> > const&)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:308
    @ 00000000000d65d3 facebook::velox::dwio::common::E2EFilterTestBase::testNoRowGroupSkip(std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:338
    @ 00000000000dfc70 facebook::velox::dwio::common::E2EFilterTestBase::testScenario(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int, bool)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:420
    @ 0000000000375785 E2EFilterTest::testWithTypes(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int)
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:44
    @ 00000000003432e1 E2EFilterTest_list_Test::TestBody()
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:521
    @ 0000000000123f7e void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)
                       fbsource/src/gtest.cc:2675
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 0000000000123804 testing::Test::Run()
                       fbsource/src/gtest.cc:2692
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000012943f testing::TestInfo::Run()
                       fbsource/src/gtest.cc:2841
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 00000000001313f6 testing::TestSuite::Run()
                       fbsource/src/gtest.cc:3020
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016cd5b testing::internal::UnitTestImpl::RunAllTests()
                       fbsource/src/gtest.cc:5925
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016bdbb bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)
                       fbsource/src/gtest.cc:2675
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016b2f9 testing::UnitTest::Run()
                       fbsource/src/gtest.cc:5489
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 00000000004d1ba0 RUN_ALL_TESTS()
                       fbsource/gtest/gtest.h:2317
                       -> ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp
    @ 00000000004d1a6c main
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:720

What I'm seeing is that the unit test reads a struct that contains an array of integers. Because this PR removes the override of filterRowGroups() from ListColumnReader, when StructColumnReader::filterRowGroups() calls child->filterRowGroups() where child is the array, SelectiveColumnReader::filterRowGroups() in the base class was incorrectly called. Inside SelectiveColumnReader::filterRowGroups(), the program tries to access the columnChunk of the array column with an index -1 (i.e., kNonLeaf).

Could you take a look and see if you could reproduce this locally? Thanks!

cc @Yuhta

@@ -211,14 +211,6 @@ void MapColumnReader::read(
elementReader_->seekTo(childTargetReadOffset_, false);
}

void MapColumnReader::filterRowGroups(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot the base class implementation is not suitable for these 2 classes. Can we override the methods with empty implementations instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sorry not noticing that. Let me update. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yma11, just to confirm, do you see the failure when you run velox_parquet_e2e_filter_test? I wonder why the PR-time tests didn't catch this.

Copy link
Contributor Author

@yma11 yma11 Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kagamiori, I can't reproduce any failures in this suite locally. FYI.

@facebook-github-bot
Copy link
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kagamiori merged this pull request in e7a6ffc.

Copy link

Conbench analyzed the 1 benchmark run on commit e7a6ffcb.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet reader: can't read null map row in a single line file
5 participants