ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

This PR changes a few things.

* The default file readahead is changed to 4. This doesn't seem to affect performance on HDD/SSD, and users should already be doing special tuning for S3. Besides, in many cases users are reading IPC/Parquet files that have many row groups, so we already have sufficient I/O parallelism. This is important for bringing down the overall memory usage, as can be seen in the formula below.
* The default batch readahead is changed to 16. Previously, when we were doing filtering and projection within the scanner, it made sense to read many batches ahead (generally we want at least 2 * # of CPUs in that case). Now that the exec plan is doing the computation, the exec plan buffering is instead handled by kDefaultBackpressureLowBytes and kDefaultBackpressureHighBytes.
* Moves the parquet readahead around a bit. The previous version would read ahead N row groups. Now we always read ahead exactly 1 row group, but we read ahead N batches (this may mean that we read ahead more than 1 row group if the batch size is much larger than the row group size).
* Backpressure now utilizes the pause/resume producing signals in the execution plan. I've added a `counter` argument to the calls to help deal with the challenges that arise when we try to sequence backpressure signals. Partly this was to add support for monitoring backpressure (for tests); partly it is because I have since become more aware of the reasons for these signals. They are needed to allow for backpressure from the aggregate & join nodes.
* Sink backpressure can now be monitored. This makes it easier to test and could be useful to a user who wants to know when they are consuming the plan too slowly (see the polling sketch below).
* Changes the default scanner batch size to 128Ki rows. Now that we have more or less decoupled the scanning batch size from the row group size, we can pass smaller batches through the scanner. This makes it easier to get parallelism on small datasets.

Putting this all together, the scanner should now buffer in memory:

`MAX(fragment_readahead * row_group_size_bytes * 2, fragment_readahead * batch_readahead * batch_size_bytes)`

The exec plan sink node should buffer ~kDefaultBackpressureHighBytes bytes. The exec plan itself can have some number of tasks in flight but, assuming there are no pipeline breakers, this will be limited to the number of threads in the CPU thread pool, so it should be parallelism * batch_size_bytes. Adding those together should give the total RAM usage of a plan being read via a sink node that doesn't have any pipeline breakers. When the sink is a write node there is a separate backpressure consideration based on # of rows (we could someday change this to # of bytes, but that would be a bit tricky at the moment because we need to balance it with the other write parameters like min_rows_per_group).

So, given the parquet dataset mentioned in the JIRA (21 files, 10 million rows each, 10 row groups each) and knowing that 1 row group is ~140MB when decompressed into Arrow format, we should get the following default memory usage:

Scanner readahead = MAX(4 * 140MB * 2, 4 * 16 * 17.5MB) = MAX(1120MB, 1120MB) = 1120MB
Sink readahead ~ 1GiB

Total RAM usage should then be ~2GiB. The sketches below work through this arithmetic and show how these knobs can be tuned per scan.
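To make the buffering arithmetic concrete, here is a small self-contained sketch. `EstimateScannerBufferBytes` is an illustrative helper, not an Arrow API; the inputs mirror the JIRA dataset described above.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Illustrative helper (not an Arrow API): the scanner buffering estimate
// from the formula above.
int64_t EstimateScannerBufferBytes(int64_t fragment_readahead,
                                   int64_t batch_readahead,
                                   int64_t row_group_bytes,
                                   int64_t batch_bytes) {
  return std::max(fragment_readahead * row_group_bytes * 2,
                  fragment_readahead * batch_readahead * batch_bytes);
}

int main() {
  constexpr int64_t kMB = 1000 * 1000;
  // The JIRA dataset: row groups are ~140MB decompressed, and a 128Ki-row
  // batch from one of those row groups is ~17.5MB.
  int64_t scanner_bytes = EstimateScannerBufferBytes(
      /*fragment_readahead=*/4, /*batch_readahead=*/16,
      /*row_group_bytes=*/140 * kMB, /*batch_bytes=*/17500 * 1000);
  std::cout << "Scanner readahead: ~" << scanner_bytes / kMB << "MB\n";
  // Prints 1120MB; add ~1GiB of sink buffering for ~2GiB total.
  return 0;
}
```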
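For users who want different trade-offs, the knobs above can be overridden per scan. This is a minimal sketch assuming `ScannerBuilder` exposes `FragmentReadahead`, `BatchReadahead`, `BatchSize`, and `UseThreads` setters as in recent Arrow C++ releases; verify the names against your version's headers before relying on them.

```cpp
#include <memory>
#include <utility>

#include <arrow/dataset/api.h>
#include <arrow/result.h>
#include <arrow/status.h>

// Sketch: setting the defaults discussed above explicitly on a single scan.
// Assumes `dataset` is an already-built arrow::dataset::Dataset and that the
// readahead setters below exist on ScannerBuilder in your Arrow version.
arrow::Result<std::shared_ptr<arrow::dataset::Scanner>> MakeTunedScanner(
    std::shared_ptr<arrow::dataset::Dataset> dataset) {
  arrow::dataset::ScannerBuilder builder(std::move(dataset));
  ARROW_RETURN_NOT_OK(builder.FragmentReadahead(4));   // new default: 4 files
  ARROW_RETURN_NOT_OK(builder.BatchReadahead(16));     // new default: 16 batches
  ARROW_RETURN_NOT_OK(builder.BatchSize(128 * 1024));  // new default: 128Ki rows
  ARROW_RETURN_NOT_OK(builder.UseThreads(true));
  return builder.Finish();
}
```

S3 users may want a larger fragment readahead than 4, at the cost of the extra RAM predicted by the formula above.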
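And a sketch of the new sink backpressure monitoring. The exact wiring (this commit surfaces a `BackpressureMonitor` through the sink node's options) is version-dependent; treat the shape below as an assumption and check compute/exec/options.h in your release.

```cpp
#include <iostream>

#include <arrow/compute/exec/options.h>

// Sketch: polling the sink's backpressure monitor, as made possible by this
// commit. Only bytes_in_use() and is_paused() are shown; how the monitor
// pointer is obtained from the sink node is an assumption to verify against
// your Arrow version.
void ReportBackpressure(arrow::compute::BackpressureMonitor* monitor) {
  if (monitor == nullptr) return;  // monitoring not requested
  std::cout << "sink buffered bytes: " << monitor->bytes_in_use()
            << (monitor->is_paused() ? " (paused)" : " (flowing)") << "\n";
}
```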
- [x] Add tests to verify memory usage
- [ ] ~~Update docs to mention that S3 users may want to increase the fragment readahead but this will come at the cost of more RAM usage.~~
- [ ] ~~Update docs to give some of this "expected memory usage" information~~

Closes apache#12228 from westonpace/feature/ARROW-15410--improve-dataset-parquet-memory-usage

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
1 parent 6c1a160 · commit 78fb2ed · 28 changed files with 593 additions and 477 deletions.