[Datasets] Do not eagerly execute first block for read_xxx API #31558
Signed-off-by: Cheng Su <scnju13@gmail.com>
Why are these changes needed?
This PR is a follow-up to #31286 (review). The changes include:
- `read_api.py:read_datasource()`: Remove the logic that eagerly executes the first block during read.
- `Dataset.schema()`: Change the default value of `fetch_if_missing` from `False` to `True`, so execution is always triggered when the schema is missing.
- `ExecutionPlan.schema()`: If the plan's output is a lazy block list, execute only the first block to get the schema, instead of executing all blocks.
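To make the three changes concrete, here is a toy, self-contained Python model of the intended behavior. It is a sketch, not Ray's implementation: `LazyBlockList`, `ExecutionPlan`, `Dataset`, and `read_datasource` below are simplified stand-ins whose internals are assumptions. Blocks are modeled as lists of row dicts, and a schema as a mapping from column name to Python type name.

```python
from typing import Callable, Dict, List, Optional

# Assumption (toy model): a block is a list of row dicts, and a schema maps
# column name -> Python type name. Real Ray blocks and schemas differ.
Block = List[Dict[str, object]]
Schema = Dict[str, str]


class LazyBlockList:
    """Hypothetical lazy block list: each block is produced by a read task on demand."""

    def __init__(self, read_tasks: List[Callable[[], Block]]):
        self._read_tasks = read_tasks
        self._computed: Dict[int, Block] = {}

    def num_blocks(self) -> int:
        return len(self._read_tasks)

    def get_block(self, i: int) -> Block:
        # Run read task i only the first time block i is requested.
        if i not in self._computed:
            self._computed[i] = self._read_tasks[i]()
        return self._computed[i]


def _block_schema(block: Block) -> Optional[Schema]:
    # Infer a schema from the first row; an empty block has no schema.
    if not block:
        return None
    return {name: type(value).__name__ for name, value in block[0].items()}


class ExecutionPlan:
    def __init__(self, blocks: LazyBlockList):
        self._blocks = blocks
        self._schema: Optional[Schema] = None

    def schema(self, fetch_if_missing: bool) -> Optional[Schema]:
        if self._schema is None and fetch_if_missing and self._blocks.num_blocks() > 0:
            # Lazy output: execute only the first block, not all of them.
            self._schema = _block_schema(self._blocks.get_block(0))
        return self._schema


class Dataset:
    def __init__(self, plan: ExecutionPlan):
        self._plan = plan

    def schema(self, fetch_if_missing: bool = True) -> Optional[Schema]:
        # Default is now True: a missing schema triggers (first-block) execution.
        return self._plan.schema(fetch_if_missing=fetch_if_missing)


def read_datasource(read_tasks: List[Callable[[], Block]]) -> Dataset:
    # No eager execution of the first block here; everything stays lazy.
    return Dataset(ExecutionPlan(LazyBlockList(read_tasks)))


if __name__ == "__main__":
    executed: List[int] = []

    def make_task(i: int) -> Callable[[], Block]:
        def task() -> Block:
            executed.append(i)  # record which read tasks actually ran
            return [{"id": i, "name": f"row{i}"}]
        return task

    ds = read_datasource([make_task(i) for i in range(4)])
    print(executed)                           # []
    print(ds.schema(fetch_if_missing=False))  # None
    print(ds.schema())                        # {'id': 'int', 'name': 'str'}
    print(executed)                           # [0]
```

In this sketch, reading runs no tasks, and asking for the schema runs only the first of the four read tasks. Caching the computed block inside the lazy list is a design choice of the toy model, so a later full execution would not rerun that first task.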
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.