[SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system #29976
These changes are similar to #29971. @HyukjinKwon @yuningzh-db @dongjoon-hyun Please review this PR.
I ran the test from this PR (just changed
Kubernetes integration test starting
Kubernetes integration test status failure
I have looked at the build failures. They do not seem to be related to the changes: some failures occurred while downloading artefacts.
Test build #129554 has finished for PR 29976 at commit
+1, LGTM. Thank you, @MaxGekk and @HyukjinKwon .
The K8s IT test failure is irrelevant to this one.
- Test basic decommissioning *** FAILED ***
Merged to master for Apache Spark 3.1.0
@dongjoon-hyun Can this be merged to
but gets the following exception because the settings above are not propagated to the filesystem:
@HyukjinKwon You merged a similar fix for Avro to branch-3.0 in #29971. WDYT, should I open a PR with the changes for branch-3.0?
So it fixes a bug, right? Sure, let's open a PR to backport it.
+1 for backporting, @MaxGekk and @HyukjinKwon.
…tions to underlying HDFS file system

Propagate ORC options to Hadoop configs in Hive `OrcFileFormat` and in the regular ORC datasource.

There is a bug that when running:
```scala
spark.read.format("orc").options(conf).load(path)
```
The underlying file system will not receive the conf options.

Yes

Added UT to `OrcSourceSuite`.

Closes apache#29976 from MaxGekk/orc-option-propagation.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c5f6af9)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Here is the backport to
Regarding #29976 (comment), I could put the test into a common trait and test all built-in datasources, including Avro, ORC, LibSVM, CSV and so on. Let me know if you think it makes sense for improving test coverage. cc @gatorsmile @cloud-fan
Here is the PR #30067 with the common test.
What changes were proposed in this pull request?
Propagate ORC options to Hadoop configs in Hive `OrcFileFormat` and in the regular ORC datasource.
Why are the changes needed?
There is a bug that when running:
spark.read.format("orc").options(conf).load(path)
The underlying file system will not receive the conf options.
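To illustrate the behaviour described above, here is a minimal sketch that does not use Spark or Hadoop at all. It models a configuration as a plain `Map[String, String]`; the object name, both function names, and the `fs.example.secret` key are hypothetical and are not taken from the patch. It also assumes that per-read options take precedence over session defaults when both set the same key.

```scala
// Simplified stand-in for a Hadoop configuration: plain key/value pairs.
// This is a model of the idea only, not Spark's actual implementation.
object OptionPropagationSketch {
  type Conf = Map[String, String]

  // Behaviour described in the bug report: the config handed to the file
  // system is built from session defaults only, so read options are dropped.
  def confIgnoringOptions(sessionConf: Conf, readOptions: Conf): Conf =
    sessionConf

  // Intended behaviour: read options are layered on top of session defaults
  // (assumption: on a key clash the read option wins).
  def confWithOptions(sessionConf: Conf, readOptions: Conf): Conf =
    sessionConf ++ readOptions

  def main(args: Array[String]): Unit = {
    val sessionConf = Map("fs.defaultFS" -> "file:///")
    // Hypothetical per-read setting passed through .options(conf).
    val readOptions = Map("fs.example.secret" -> "hypothetical-value")

    // The file system never sees the option.
    println(confIgnoringOptions(sessionConf, readOptions).contains("fs.example.secret"))
    // The file system receives the option.
    println(confWithOptions(sessionConf, readOptions).contains("fs.example.secret"))
  }
}
```

In this model, the first function corresponds to the reported bug and the second to the fix, where settings passed via `.options(conf)` reach the file system configuration.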
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added UT to `OrcSourceSuite`.