Describe the bug, including details regarding any error messages, version, and platform.

The destruction of the ScannerRecordBatchReader object returned by arrow::dataset::Scanner::ToRecordBatchReader() on a 1 GB Parquet dataset with ~10 million rows and 77 row groups (https://overturemaps-us-west-2.s3.amazonaws.com/release/2024-03-12-alpha.0/theme%3Dbuildings/type%3Dbuilding/part-00000-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet) takes extremely long when only a few rows have been read, because SerialIterator::~SerialIterator() iterates until the end of the dataset. It would be desirable for the destruction of the batch reader not to trigger such a lengthy operation.

thread_pool.h has the following comment, but trying to implement that is beyond my understanding of the libarrow/libparquet deep internals:

/// Note: The iterator's destructor will run until the given generator is fully
/// exhausted. If you wish to abandon iteration before completion then the correct
/// approach is to use a stop token to cause the generator to exhaust early.

Explicitly invoking the Close() method on the record batch reader doesn't improve performance either.
This issue came out of the analysis done in OSGeo/gdal#9497.
Version: libarrow/libparquet from apache-arrow-15.0.0
Related stack trace:
Component(s)
C++