[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs #2676
Changes from all commits
@@ -641,6 +641,7 @@ class SparkContext(config: SparkConf) extends Logging {
       kClass: Class[K],
       vClass: Class[V],
       conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
+    // mapreduce.Job (NewHadoopJob) merges any credentials for you.
     val job = new NewHadoopJob(conf)
     NewFileInputFormat.addInputPath(job, new Path(path))
     val updatedConf = job.getConfiguration
@@ -661,7 +662,10 @@ class SparkContext(config: SparkConf) extends Logging {
       fClass: Class[F],
       kClass: Class[K],
       vClass: Class[V]): RDD[(K, V)] = {
-    new NewHadoopRDD(this, fClass, kClass, vClass, conf)
+    // Add necessary security credentials to the JobConf. Required to access secure HDFS.
+    val jconf = new JobConf(conf)
+    SparkHadoopUtil.get.addCredentials(jconf)
+    new NewHadoopRDD(this, fClass, kClass, vClass, jconf)
   }

   /** Get an RDD for a Hadoop SequenceFile with given key and value types.

Reviewer: In …

Author: Sorry, I don't see what you are referring to in hadoopFile. Do you mean hadoopRDD, which already takes a JobConf? We need a JobConf for the addCredentials routine to work. I could look at adding the credentials another way, but the nice thing about creating another JobConf is that if the user passes in a JobConf instance, creating a new one handles transferring any credentials from that one as well.
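To make the point in this exchange concrete, here is a minimal sketch, not part of the patch, of what the JobConf copy plus SparkHadoopUtil.get.addCredentials amounts to. It assumes only Hadoop's public security APIs, and the helper name withCredentials is hypothetical.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper showing both credential paths discussed above.
def withCredentials(conf: Configuration): JobConf = {
  // JobConf(Configuration) preserves the credentials of `conf` when it is
  // itself a JobConf, so tokens on a user-supplied JobConf carry over.
  val jconf = new JobConf(conf)
  // Merging the current user's tokens (e.g. HDFS delegation tokens) into
  // the job is roughly what SparkHadoopUtil.get.addCredentials does on a
  // secure deployment.
  jconf.getCredentials.mergeAll(UserGroupInformation.getCurrentUser.getCredentials)
  jconf
}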
Reviewer: This comment isn't very clear to me. What is it trying to say?
Author: This was just saying that the call to "new NewHadoopJob" adds the credentials to the conf passed in for you, so we don't need an explicit call to SparkHadoopUtil.get.addCredentials(jconf) here. I can try to rephrase.
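For reference, a minimal sketch of the behavior that comment describes, assuming Hadoop 2's org.apache.hadoop.mapreduce.Job API (NewHadoopJob is Spark's import alias for it): constructing the Job merges the current user's credentials into it, so the configuration it hands back already carries them.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

val conf = new Configuration()
// Constructing a Job merges UserGroupInformation.getCurrentUser's
// credentials (e.g. HDFS delegation tokens) into the job, so no explicit
// addCredentials call is needed on this code path.
val job = Job.getInstance(conf)
val updatedConf = job.getConfiguration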
Reviewer: How about this:
Author: Sure, I can change the comment.
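For context, here is a hypothetical end-to-end use of the fixed code path against a Kerberos-secured cluster. The namenode host and input path are made up, and the input-directory property name assumes Hadoop 2.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Assumes an existing SparkContext `sc` and a user already logged in
// via Kerberos (kinit).
val hconf = new Configuration(sc.hadoopConfiguration)
hconf.set("mapreduce.input.fileinputformat.inputdir",
  "hdfs://secure-nn:8020/data/events")  // made-up namenode and path
// With this patch, newAPIHadoopRDD attaches the caller's HDFS delegation
// tokens itself, so the read no longer fails authentication on secure HDFS.
val lines = sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat],
  classOf[LongWritable], classOf[Text]).map(_._2.toString)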