This PR partially fixes #1404. In particular, it changes the main task in node_copier.cpp, which populates the columns, so that it issues a single task that is "morselized". Previously, we issued one task for each part of the input CSV or Parquet file.
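To illustrate the difference between one task per file part and a single morselized task, here is a minimal C++ sketch. The names `Morsel`, `MorselDispatcher`, and `populateColumns` are hypothetical and are not the types used in node_copier.cpp. The assumption is that all worker threads share one atomic cursor over the input rows, and each thread claims the next fixed-size row range whenever it finishes its current one.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical morsel: a half-open row range [begin, end) of the input file.
struct Morsel {
    uint64_t begin;
    uint64_t end;
};

// Hypothetical dispatcher shared by all threads working on the single task.
class MorselDispatcher {
public:
    MorselDispatcher(uint64_t numRows, uint64_t morselSize)
        : numRows_{numRows}, morselSize_{morselSize} {
        assert(morselSize_ > 0 && "a zero morsel size would never advance the cursor");
    }

    // Atomically claims the next unprocessed range; empty once the input is exhausted.
    std::optional<Morsel> getNextMorsel() {
        uint64_t begin = next_.fetch_add(morselSize_);
        if (begin >= numRows_) {
            return std::nullopt;
        }
        return Morsel{begin, std::min(begin + morselSize_, numRows_)};
    }

private:
    const uint64_t numRows_;
    const uint64_t morselSize_;
    std::atomic<uint64_t> next_{0};
};

// Runs one task on numThreads workers; each worker keeps pulling morsels until none remain.
uint64_t populateColumns(uint64_t numRows, uint64_t morselSize, unsigned numThreads) {
    MorselDispatcher dispatcher{numRows, morselSize};
    std::atomic<uint64_t> rowsCopied{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < numThreads; ++i) {
        workers.emplace_back([&] {
            while (auto morsel = dispatcher.getNextMorsel()) {
                // Placeholder for copying rows [begin, end) into the column chunks.
                rowsCopied += morsel->end - morsel->begin;
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return rowsCopied.load();
}
```

Because threads pull work instead of being handed a fixed part of the file up front, a thread that finishes early simply takes more morsels, which balances load when some parts of the input are slower to parse than others.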
There are 3 other things I squeezed into this PR.
This would then let any thread working on this pipeline stop at the next morsel it gets (the code is inside physical_operator.h).
This PR changes this so that threads also stop early if the task has errored, simply by adding an `else` branch to the `if` as follows:
The PR also makes node_copier benefit from this by adding code similar to that in physical_operator.h to node_copier's `batchPopulateColumnsTaskNew` function, which is where threads get their morsels.

I did not write a unit test for this, but I tested it manually by checking that large tables are indeed not copied from CSV files all the way to the end if there is an error at the beginning of the CSV.
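The early-stop behavior can be sketched as follows. This is a hypothetical illustration, not the actual code in physical_operator.h or `batchPopulateColumnsTaskNew`; `TaskState`, `copyMorsel`, and `populateColumnsWithEarlyStop` are invented names. The assumption is that the function handing out morsels checks a shared error flag first, so once any thread records an error, every thread stops at the next morsel it requests (a thread already inside a morsel finishes that one first).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <exception>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

struct Morsel {
    uint64_t begin;
    uint64_t end;
};

// Hypothetical shared state of one morselized task, including its first error.
class TaskState {
public:
    TaskState(uint64_t numRows, uint64_t morselSize) : numRows_{numRows}, morselSize_{morselSize} {
        assert(morselSize_ > 0);
    }

    // Returns nothing if the input is exhausted OR the task has errored.
    // The error check is what lets every thread stop at its next morsel.
    std::optional<Morsel> getNextMorsel() {
        if (hasError_.load()) {
            return std::nullopt;
        }
        uint64_t begin = next_.fetch_add(morselSize_);
        if (begin >= numRows_) {
            return std::nullopt;
        }
        return Morsel{begin, std::min(begin + morselSize_, numRows_)};
    }

    // Records only the first error; later errors are ignored.
    void setError(std::exception_ptr e) {
        std::lock_guard<std::mutex> lock{mtx_};
        if (!error_) {
            error_ = e;
            hasError_.store(true);
        }
    }

    std::exception_ptr error() {
        std::lock_guard<std::mutex> lock{mtx_};
        return error_;
    }

private:
    const uint64_t numRows_;
    const uint64_t morselSize_;
    std::atomic<uint64_t> next_{0};
    std::atomic<bool> hasError_{false};
    std::mutex mtx_;
    std::exception_ptr error_;
};

// Hypothetical stand-in for parsing a morsel: throws if it contains badRow.
void copyMorsel(const Morsel& m, uint64_t badRow) {
    if (badRow >= m.begin && badRow < m.end) {
        throw std::runtime_error("parse error at row " + std::to_string(badRow));
    }
}

struct CopyResult {
    uint64_t rowsCopied;
    bool errored;
};

CopyResult populateColumnsWithEarlyStop(uint64_t numRows, uint64_t morselSize,
                                        unsigned numThreads, uint64_t badRow) {
    TaskState task{numRows, morselSize};
    std::atomic<uint64_t> rowsCopied{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < numThreads; ++i) {
        workers.emplace_back([&] {
            while (auto morsel = task.getNextMorsel()) {
                try {
                    copyMorsel(*morsel, badRow);
                    rowsCopied += morsel->end - morsel->begin;
                } catch (...) {
                    task.setError(std::current_exception());
                }
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    // A real caller would rethrow task.error() here to report it to the user.
    return {rowsCopied.load(), static_cast<bool>(task.error())};
}
```

With a single worker and an error in the first morsel, no rows are copied at all; with several workers, the exact count depends on scheduling, but the morsel containing the bad row is never counted and the remaining morsels stop being handed out once the error is recorded.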