[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs #4292
Conversation
@JoshRosen, you had looked at this before; mind taking another look?
Test build #26408 has started for PR 4292 at commit
Test build #26408 has finished for PR 4292 at commit
Test PASSed.
+1
1 similar comment
+1
```diff
-    new NewHadoopRDD(this, fClass, kClass, vClass, conf)
+    // Add necessary security credentials to the JobConf. Required to access secure HDFS.
+    val jconf = new JobConf(conf)
+    SparkHadoopUtil.get.addCredentials(jconf)
+    new NewHadoopRDD(this, fClass, kClass, vClass, jconf)
```
If the mode is not YARN, SparkHadoopUtil.addCredentials doesn't do anything, so this doesn't resolve the problem in non-YARN mode.
Since security is supported only in YARN mode, this should be fine.
Yep, it looks like `addCredentials` is implemented as a no-op, so this should be fine.
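For context, here is a minimal paraphrase of the relevant internals (simplified and reconstructed from the Spark 1.x sources, so treat the exact shapes as assumptions): the base `SparkHadoopUtil.addCredentials` does nothing, and the YARN-mode subclass overrides it to merge the current user's delegation tokens into the `JobConf`.

```scala
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.security.UserGroupInformation

// Simplified sketch, not the verbatim Spark sources.
class SparkHadoopUtil {
  // Base implementation is a no-op, so non-YARN deployments are unaffected
  // by the extra addCredentials call in newAPIHadoopRDD.
  def addCredentials(conf: JobConf): Unit = {}
}

class YarnSparkHadoopUtil extends SparkHadoopUtil {
  // YARN override: copy the current user's tokens (e.g. HDFS delegation
  // tokens) into the JobConf so executors can talk to secure HDFS.
  override def addCredentials(conf: JobConf): Unit = {
    val jobCreds = conf.getCredentials()
    jobCreds.mergeAll(UserGroupInformation.getCurrentUser().getCredentials())
  }
}
```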
@pwendell - this is a small enough patch, and relatively low risk. It would be great to merge this into 1.3.
LGTM and seems very straightforward.
LGTM, too, so I'm going to merge this into
Ugh, I just realized that this might potentially regress behavior for some weird corner-cases that arise due to our shared mutable `Configuration`.
For example:

```
15/02/02 22:49:12 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> import org.apache.hadoop.mapred.{FileInputFormat, InputFormat, JobConf, SequenceFileInputFormat, TextInputFormat}
import org.apache.hadoop.mapred.{FileInputFormat, InputFormat, JobConf, SequenceFileInputFormat, TextInputFormat}

scala> import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.conf.Configuration

scala> val conf = new Configuration()
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml

scala> val jobConf = new JobConf(conf)
jobConf: org.apache.hadoop.mapred.JobConf = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml

scala> jobConf.getInt("myInt", 0)
res3: Int = 0

scala> conf.setInt("myInt", 1)

scala> jobConf.getInt("myInt", 0)
res5: Int = 0
```
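(In other words, `new JobConf(conf)` snapshots the properties of the underlying `Configuration` at construction time, so after this patch a mutation made to the original conf after calling `newAPIHadoopRDD` is no longer visible to the RDD, whereas previously the very same conf object was shared with the RDD.)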
@JoshRosen that's a great point; it could cause a behavior regression that would be really hard for users to diagnose. @tgravescs, what about deferring the injection of the credentials until just before the conf is broadcast?
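A hypothetical sketch of that deferral idea (the helper name `confForBroadcast` and its placement are illustrative only, not part of this patch):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.deploy.SparkHadoopUtil

// Hypothetical: leave the user's Configuration untouched and only take the
// credential-carrying snapshot at the last moment, right before the conf is
// serialized for broadcast. Later mutations of userConf would still be seen.
def confForBroadcast(userConf: Configuration): JobConf = {
  val jconf = new JobConf(userConf)          // copy made at broadcast time
  SparkHadoopUtil.get.addCredentials(jconf)  // inject tokens just before shipping
  jconf
}
```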
We could also just leave it as-is and then do something like that if we find that users are encountering this.
I found some previous discussion of this issue. I'd say that expecting
I think @JoshRosen is right; I don't think we need to worry about changes to the conf after the RDD has been defined. That makes sense.
Sorry for my delay, I was out last week. I'd have to agree with @JoshRosen's last comment: if they have already created the RDD, then I wouldn't expect any changes to the Hadoop configuration to apply.
This was #2676:
https://issues.apache.org/jira/browse/SPARK-3778
This affects anyone trying to access secure HDFS with something like:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val lines = {
  val hconf = new Configuration()
  hconf.set("mapred.input.dir", "mydir")
  hconf.set("textinputformat.record.delimiter", "\003432\n")
  sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
}
```
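With this patch, `newAPIHadoopRDD` wraps the passed-in `Configuration` in a `JobConf` and calls `SparkHadoopUtil.get.addCredentials` on it, so the delegation tokens needed for secure HDFS travel with the conf that the RDD uses.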