
[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs #2676

Closed
wants to merge 1 commit

Conversation

tgravescs
Contributor

https://issues.apache.org/jira/browse/SPARK-3778

This affects anyone trying to access secure hdfs with something like:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    val lines = {
      val hconf = new Configuration()
      hconf.set("mapred.input.dir", "mydir")
      hconf.set("textinputformat.record.delimiter", "\003432\n")
      sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have started for PR 2676 at commit c3d6b83.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have finished for PR 2676 at commit c3d6b83.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21331/

@vanzin
Contributor

vanzin commented Oct 6, 2014

LGTM.

@AmplabJenkins

Can one of the admins verify this patch?

@tgravescs
Contributor Author

ping @pwendell @mateiz

@@ -641,6 +641,7 @@ class SparkContext(config: SparkConf) extends Logging {
       kClass: Class[K],
       vClass: Class[V],
       conf: Configuration = hadoopConfiguration): RDD[(K, V)] = {
+    // mapreduce.Job (NewHadoopJob) merges any credentials for you.
Contributor


This comment isn't very clear to me. What is it trying to say?

Contributor Author


This was just saying that the call to "new NewHadoopJob" adds the credentials to the conf passed in for you, so we don't need an explicit call to SparkHadoopUtil.get.addCredentials(jconf).

I can try to rephrase.

Contributor


How about this:

The call to "new NewHadoopJob" automatically adds security credentials to conf, so we don't need to explicitly add them ourselves.

Contributor Author


Sure, I can change the comment.
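
For readers following the thread, here is a minimal sketch of the distinction being discussed. It is not the actual patch; the helper names are illustrative, and it assumes Spark's NewHadoopJob alias for org.apache.hadoop.mapreduce.Job and the SparkHadoopUtil.get.addCredentials helper mentioned above.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.JobConf
    import org.apache.hadoop.mapreduce.{Job => NewHadoopJob}
    import org.apache.spark.deploy.SparkHadoopUtil

    // When a mapreduce.Job (aliased NewHadoopJob in SparkContext) is built from
    // the conf, the current user's credentials are merged into the job's
    // configuration, so no explicit call is needed afterwards.
    def confViaNewHadoopJob(conf: Configuration): Configuration = {
      val job = new NewHadoopJob(conf)
      job.getConfiguration
    }

    // When the raw Configuration is passed through directly (as newAPIHadoopRDD
    // did before this fix), the credentials have to be added explicitly via the
    // helper the thread refers to.
    def confWithExplicitCredentials(conf: Configuration): JobConf = {
      val jconf = new JobConf(conf)
      SparkHadoopUtil.get.addCredentials(jconf)
      jconf
    }

Either way, the configuration handed to the underlying RDD ends up carrying the caller's credentials, which is what access to secure hdfs requires.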

@JoshRosen
Contributor

/bump; what's the status on this PR? It looks like this is small and probably pretty close to being merged.

@tgravescs
Contributor Author

I was waiting for clarification from @pwendell on my question about his comment.

@harishreedharan
Contributor

We really need to merge this soon. This actually makes the NewHadoopRDD useless on secure clusters (users would need to work around this in their own code, and the error message is not even obvious).
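
To make that concrete, here is a hypothetical sketch of the kind of user-side workaround being alluded to; the specific approach (wrapping the conf in a JobConf and merging in the current user's credentials) is an illustrative assumption, not code from this PR.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.JobConf
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.hadoop.security.UserGroupInformation

    // Hypothetical workaround: wrap the Configuration in a JobConf and copy the
    // current user's credentials (e.g. HDFS delegation tokens) into it before
    // handing it to newAPIHadoopRDD. sc is an existing SparkContext.
    val hconf = new Configuration()
    hconf.set("mapred.input.dir", "mydir")
    val jconf = new JobConf(hconf)
    jconf.getCredentials.mergeAll(UserGroupInformation.getCurrentUser.getCredentials)
    val lines = sc.newAPIHadoopRDD(jconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])

With the fix in place, the plain Configuration from the original example should work without this extra step.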

@pwendell
Contributor

Hey guys - sorry, don't block on my comment. If you all think this looks good, just merge it.

@pwendell
Contributor

@tgravescs mind bringing it up to date?

@pwendell
Contributor

I bumped the severity per @harishreedharan's comment.

@tgravescs
Contributor Author

I'll try to bring it up to date today. I'm out all next week though so if you find issues someone else might need to take it over.

@tgravescs
Contributor Author

For whatever reason this pull request didn't update. Filed a new one:
#4292

It's rebased, and I made the comment change suggested by @JoshRosen.

@tgravescs closed this Jan 30, 2015
asfgit pushed a commit that referenced this pull request Feb 3, 2015
[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs

This was #2676.

https://issues.apache.org/jira/browse/SPARK-3778

This affects anyone trying to access secure hdfs with something like:

    val lines = {
      val hconf = new Configuration()
      hconf.set("mapred.input.dir", "mydir")
      hconf.set("textinputformat.record.delimiter", "\003432\n")
      sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    }

Author: Thomas Graves <tgraves@apache.org>

Closes #4292 from tgravescs/SPARK-3788 and squashes the following commits:

cf3b453 [Thomas Graves] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn