
[SPARK-5068][SQL]fix bug query data when path doesn't exists #3907

Closed
wants to merge 3 commits into from


jeanlyn
Contributor

jeanlyn commented Jan 6, 2015

The issue is described in SPARK-5068. This PR fixes the same problem as PR #3891; however, it catches the exception at the point where it is thrown instead of checking the paths serially when constructing the RDD.
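The PR's diff is not reproduced on this page, so the following is only a local-filesystem analogy of the strategy described above, not the patch itself. It assumes plain files read with `scala.io.Source` in place of Hadoop input splits, and the names `LazyMissingPathSketch`, `readPartition` and `readAll` are hypothetical. On HDFS the failure would surface as a Hadoop exception rather than `java.io.FileNotFoundException`.

```scala
import java.io.FileNotFoundException
import scala.io.Source

// Local-filesystem analogy only (hypothetical, not the PR's code): instead of
// checking every input path up front before building the job, each read
// attempts to open its path and treats "file not found" as an empty result.
object LazyMissingPathSketch {
  def readPartition(path: String): List[String] =
    try {
      // Source.fromFile opens the file eagerly, so a missing path throws here.
      val src = Source.fromFile(path)
      try src.getLines().toList
      finally src.close()
    } catch {
      // Catch the failure where it is thrown and contribute no rows.
      case _: FileNotFoundException => Nil
    }

  // Reads several paths in order; missing ones contribute nothing.
  def readAll(paths: Seq[String]): List[String] =
    paths.toList.flatMap(readPartition)
}
```

In this sketch no existence check runs before reading; the trade-off is that a mistyped path is indistinguishable from an empty partition.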

@AmplabJenkins

Can one of the admins verify this patch?

jeanlyn changed the title Origin/spark 5068 to [spark 5068][SQL]fix bug query data when path doesn't exists Jan 6, 2015
jeanlyn closed this Jan 6, 2015
jeanlyn reopened this Jan 6, 2015
@jeanlyn
Contributor Author

jeanlyn commented Jan 6, 2015

Hi @marmbrus. Any suggestions?

import org.apache.hadoop.mapred.JobID
import org.apache.hadoop.mapred.TaskAttemptID
import org.apache.hadoop.mapred.TaskID
import org.apache.hadoop.mapred._
Member


I think I would leave this as individual imports for consistency.

Contributor Author


thx for your suggestions.

jeanlyn force-pushed the origin/SPARK-5068 branch from 0db35b3 to d4a5b9a January 6, 2015 12:27
jeanlyn changed the title [spark 5068][SQL]fix bug query data when path doesn't exists to [spark-5068][SQL]fix bug query data when path doesn't exists Jan 7, 2015
jeanlyn changed the title [spark-5068][SQL]fix bug query data when path doesn't exists to [SPARK-5068][SQL]fix bug query data when path doesn't exists Jan 7, 2015
@marmbrus
Contributor

marmbrus commented Jan 7, 2015

ok to test

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25181 has started for PR 3907 at commit d4a5b9a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25181 has finished for PR 3907 at commit d4a5b9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25181/

@srowen
Member

srowen commented Feb 17, 2015

Is this superseded by #4356? If so, can this be closed?

@jeanlyn
Contributor Author

jeanlyn commented Feb 20, 2015

OK, I'll close this one.

jeanlyn closed this Feb 20, 2015
asfgit pushed a commit that referenced this pull request Apr 12, 2015
…ontext

This PR follows up on PR #3907, #3891 and #4356.
Following marmbrus's and liancheng's comments, I use fs.globStatus to retrieve all FileStatus objects under the path(s) and then do the filtering locally.

[1]. Get the pathPattern from the path and put it into pathPatternSet (e.g. hdfs://cluster/user/demo/2016/08/12 -> hdfs://cluster/user/demo/*/*/*).
[2]. Retrieve all FileStatus objects matching the pattern and cache them by updating existPathSet.
[3]. Do the filtering locally.
[4]. If there is a new pathPattern, repeat steps 1 and 2 (an external table may have more than one partition pathPattern).
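A minimal sketch of the four steps above, assuming the number of partition columns for each path is known and using an in-memory set of directories in place of a real Hadoop FileSystem. `PartitionPathVerifier`, `listMatching` and `globCalls` are hypothetical names (only `pathPatternSet` and `existPathSet` come from the commit message), and `listMatching` is a simplified stand-in for `fs.globStatus` that only understands `*` as a whole path component.

```scala
import scala.collection.mutable

// Hypothetical sketch of the pattern-and-cache approach; not the merged code.
class PartitionPathVerifier(existingDirs: Set[String]) {
  private val pathPatternSet = mutable.Set.empty[String]
  private val existPathSet = mutable.Set.empty[String]
  private var calls = 0

  // Number of simulated listing calls, for illustration only.
  def globCalls: Int = calls

  // Step 1: replace the last `numPartCols` path components with "*", e.g.
  // hdfs://cluster/user/demo/2016/08/12 (3 columns) -> hdfs://cluster/user/demo/*/*/*
  def pathPattern(path: String, numPartCols: Int): String = {
    val parts = path.split("/")
    (parts.dropRight(numPartCols) ++ Seq.fill(numPartCols)("*")).mkString("/")
  }

  // Stand-in for fs.globStatus: component-wise match where "*" matches anything.
  private def listMatching(pattern: String): Set[String] = {
    calls += 1
    val pat = pattern.split("/")
    existingDirs.filter { dir =>
      val d = dir.split("/")
      d.length == pat.length && d.zip(pat).forall { case (a, p) => p == "*" || a == p }
    }
  }

  // Steps 2-4: list once per new pattern, cache the hits, then check locally.
  def exists(path: String, numPartCols: Int): Boolean = {
    val pattern = pathPattern(path, numPartCols)
    if (pathPatternSet.add(pattern)) existPathSet ++= listMatching(pattern)
    existPathSet.contains(path)
  }
}
```

Because each pattern is recorded in `pathPatternSet`, this sketch performs one listing per distinct pattern rather than one existence check per partition.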

chenghao-intel jeanlyn

Author: lazymam500 <lazyman500@gmail.com>
Author: lazyman <lazyman500@gmail.com>

Closes #5059 from lazyman500/SPARK-5068 and squashes the following commits:

5bfcbfd [lazyman] move spark.sql.hive.verifyPartitionPath to SQLConf,fix scala style
e1d6386 [lazymam500] fix scala style
f23133f [lazymam500] bug fix
47e0023 [lazymam500] fix scala style,add config flag,break the chaining
04c443c [lazyman] SPARK-5068: fix bug when partition path doesn't exists #2
41f60ce [lazymam500] Merge pull request #1 from apache/master
7 participants