-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49519][SQL] Merge options of table and relation when constructing FileScanBuilder #47996
Conversation
201d78d
to
54fd1b1
Compare
Hi @cloud-fan. Could you help to review? |
CSVScanBuilder(sparkSession, fileIndex, schema, dataSchema, options) | ||
override def newScanBuilder(options: CaseInsensitiveStringMap): CSVScanBuilder = { | ||
val finalOptions = this.options.asCaseSensitiveMap().asScala.filterNot { | ||
case (k, _) => options.containsKey(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? I think combining two maps will respect the keys in the later map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove the filter and combining directly.
case (k, _) => options.containsKey(k) | ||
}.toMap ++ options.asScala.toMap | ||
CSVScanBuilder(sparkSession, fileIndex, schema, dataSchema, | ||
new CaseInsensitiveStringMap(finalOptions.asJava)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do other formats have the same issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan orc/parquet/json/textTable do not combine the options of table and relation, while jdbcTable does merge them. I think we can add merging logic for all of them. How do you think?
cf4cdab
to
91752e9
Compare
91752e9
to
834995b
Compare
Is it possible to add a test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, it needs a test case for the following claim. Please add it to protect your contribution from any potential future regressions, @liujiayi771 .
If only the relation options are used here, the TableCatalog will not be able to pass dsOptions that contains table options to FileScan.
Okay, I will add a test case, thanks. |
@cloud-fan @dongjoon-hyun I have added a test case, could you help to review? |
+1, LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @liujiayi771 , @cloud-fan , @adrian-wang .
Merged to master for Apache Spark 4.0.0-preview2.
To @liujiayi771 , FYI, Apache Spark 4.0.0-preview2 RC1 will be released next Monday. |
…ing FileScanBuilder ### What changes were proposed in this pull request? Merge the options of both `DataSourceV2Relation` and `FileTable` when constructing `FileScanBuilder`. ### Why are the changes needed? Currently, the subclass of `FileTable` only uses the options from relation when constructing the `FileScanBuilder`, which leads to the omission of the contents in `FileTable.options`. For the `TableCatalog`, the `dsOptions` can be set into the `FileTable.options` returned by the `TableCatalog.loadTable` method. If only the relation options are used here, the `TableCatalog` will not be able to pass `dsOptions` that contains table options to `FileScan`. Merge the two options is a better option. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47996 from liujiayi771/csv-options. Lead-authored-by: joey.ljy <joey.ljy@alibaba-inc.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ing FileScanBuilder ### What changes were proposed in this pull request? Merge the options of both `DataSourceV2Relation` and `FileTable` when constructing `FileScanBuilder`. ### Why are the changes needed? Currently, the subclass of `FileTable` only uses the options from relation when constructing the `FileScanBuilder`, which leads to the omission of the contents in `FileTable.options`. For the `TableCatalog`, the `dsOptions` can be set into the `FileTable.options` returned by the `TableCatalog.loadTable` method. If only the relation options are used here, the `TableCatalog` will not be able to pass `dsOptions` that contains table options to `FileScan`. Merge the two options is a better option. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47996 from liujiayi771/csv-options. Lead-authored-by: joey.ljy <joey.ljy@alibaba-inc.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Merge the options of both
DataSourceV2Relation
andFileTable
when constructingFileScanBuilder
.Why are the changes needed?
Currently, the subclass of
FileTable
only uses the options from relation when constructing theFileScanBuilder
, which leads to the omission of the contents inFileTable.options
. For theTableCatalog
, thedsOptions
can be set into theFileTable.options
returned by theTableCatalog.loadTable
method. If only the relation options are used here, theTableCatalog
will not be able to passdsOptions
that contains table options toFileScan
.Merge the two options is a better option.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.