[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs #2676
Conversation
QA tests have started for PR 2676 at commit
QA tests have finished for PR 2676 at commit
Test PASSed.
LGTM.
Can one of the admins verify this patch?
@@ -641,6 +641,7 @@ class SparkContext(config: SparkConf) extends Logging {
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
    // mapreduce.Job (NewHadoopJob) merges any credentials for you.
This comment isn't very clear to me. What is it trying to say?
This was just saying that the call to `new NewHadoopJob` adds the credentials to the conf passed in for you, so we don't need an explicit call to SparkHadoopUtil.get.addCredentials(jconf).
I can try to rephrase.
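The merging behavior being described can be modeled with a toy sketch (plain Scala, no Hadoop dependency; `Credentials`, `Conf`, and `JobSketch` are illustrative stand-ins, not the real mapreduce.Job API):

```scala
// Toy model of the behavior discussed above: in real Hadoop, constructing a
// mapreduce.Job copies the current user's security tokens into the job's own
// copy of the Configuration. This sketch mimics that merge step.
case class Credentials(tokens: Map[String, String]) {
  // Merge another credential set into this one; incoming tokens win on conflict.
  def merged(other: Credentials): Credentials =
    Credentials(tokens ++ other.tokens)
}

case class Conf(settings: Map[String, String], creds: Credentials)

// Hypothetical stand-in for "new NewHadoopJob(conf)": the constructor folds the
// user's credentials into its copy of the conf, so the caller does not need a
// separate, explicit addCredentials-style call afterwards.
class JobSketch(userCreds: Credentials, conf: Conf) {
  val jobConf: Conf = conf.copy(creds = conf.creds.merged(userCreds))
}
```

This is only a model of the design point under discussion: the merge happens inside the job constructor, which is why the patch's comment says no explicit credential call is needed.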
How about this:
"The call to `new NewHadoopJob` automatically adds security credentials to `conf`, so we don't need to explicitly add them ourselves."
sure I can change the comment
/bump; what's the status on this PR? It looks like this is small and probably pretty close to being merged.
I was waiting for clarification from @pwendell on my question about his comment.
We really need to merge this soon. This actually makes the NewHadoopRDD useless on secure clusters (users would need to work around this in their own code, and the error message is not even obvious).
Hey guys - sorry, don't block on my comment. If you all think this looks good, just merge it.
@tgravescs mind bringing it up to date?
I bumped the severity per @harishreedharan's comment.
I'll try to bring it up to date today. I'm out all next week, though, so if you find issues someone else might need to take it over.
For whatever reason this pull request didn't update. Filed a new one: it's rebased and I made the comment change suggested by @JoshRosen.
…cure hdfs. This was #2676

https://issues.apache.org/jira/browse/SPARK-3778

This affects anyone trying to access secure hdfs with something like:

    val lines = {
      val hconf = new Configuration()
      hconf.set("mapred.input.dir", "mydir")
      hconf.set("textinputformat.record.delimiter","\003432\n")
      sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }

Author: Thomas Graves <tgraves@apache.org>

Closes #4292 from tgravescs/SPARK-3788 and squashes the following commits:

cf3b453 [Thomas Graves] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn