Describe the bug, including details regarding any error messages, version, and platform.

The destruction of the ScannerRecordBatchReader object returned by arrow::dataset::Scanner::ToRecordBatchReader() on a 1 GB Parquet dataset with ~10 million rows and 77 row groups (https://overturemaps-us-west-2.s3.amazonaws.com/release/2024-03-12-alpha.0/theme%3Dbuildings/type%3Dbuilding/part-00000-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet) takes extremely long when only a few rows have been read, because SerialIterator::~SerialIterator() iterates until the end of the dataset. It would be desirable for the destruction of the batch reader not to trigger such a lengthy operation.

thread_pool.h has the following comment, but trying to implement that is beyond my understanding of the libarrow/libparquet deep internals:

/// Note: The iterator's destructor will run until the given generator is fully
/// exhausted. If you wish to abandon iteration before completion then the correct
/// approach is to use a stop token to cause the generator to exhaust early.

Explicitly invoking the Close() method on the record batch reader doesn't improve performance either.
This issue came out of the analysis done in OSGeo/gdal#9497.
Version: libarrow/libparquet from apache-arrow-15.0.0
Related stack trace:
Component(s)
C++