Improve maven plugin configuration #590
Conversation
The change adds `./yarn/stable/target/<scala-version>/classes` to the _Classpath_ when a _dependencies_ assembly is available in the assembly directory. Why is this change necessary? It eases developing features and bug fixes for Spark-YARN.

[ticket: X] : NA
Author: bernardo.gomezpalacio@gmail.com
Reviewer: ?
Testing: ?
…ectory. Why is this change necessary? While developing in Spark I found myself rebuilding either the dependencies assembly or the full Spark assembly. I kept running into the case of having both the deps assembly and the full assembly in the same directory, and getting an error when I called either `spark-shell` or `spark-submit`. Quick fix: move either of them aside as a `.bkp` file, depending on the development workflow you are executing at the moment, and enable `spark-class` to ignore non-jar files. Another option would be to move the "offending" jar to a different directory, but in my opinion keeping them in there is a bit tidier. e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```

[ticket: X] : ?
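The "ignore non-jar files" behavior described above can be sketched as a small filter; `filter_jars` and the file names below are illustrative, not the actual `spark-class` code.

```shell
# Hypothetical sketch of the spark-class behavior described above:
# consider only files whose names end in .jar, so a renamed .jar.bkp
# assembly sitting in the same directory is ignored.
filter_jars() {
  for f in "$@"; do
    case "$f" in
      *.jar) printf '%s\n' "$f" ;;
    esac
  done
}

filter_jars \
  spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar \
  spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
# prints only the -deps.jar entry; the .bkp file is skipped
```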
…UNCH_COMMAND. Why is this change necessary? Most likely, when enabling `--log-conf` through `spark-shell`, you are also interested in the full invocation of the java command, including the _classpath_ and extended options. e.g.

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options -Dspark.logConf=true
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof -XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```

[ticket: X] : ?
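The gating behavior described above could look roughly like this in a launcher script; the flag name mirrors the real `SPARK_PRINT_LAUNCH_COMMAND`, but the surrounding logic is a simplified sketch, not the actual `spark-class` source.

```shell
# Simplified sketch: when SPARK_PRINT_LAUNCH_COMMAND is set (for example
# by the --log-conf handling), echo the full java invocation before
# running it. CMD is a stand-in for the assembled command line.
SPARK_PRINT_LAUNCH_COMMAND=1
CMD="java -cp <classpath> org.apache.spark.repl.Main"
if [ -n "$SPARK_PRINT_LAUNCH_COMMAND" ]; then
  echo "Spark Command: $CMD"
fi
# prints: Spark Command: java -cp <classpath> org.apache.spark.repl.Main
```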
Why is this change necessary? Renamed the SBT "root" project to "spark" to enhance readability. Currently the assembly is qualified with the Hadoop version, but not with whether YARN has been enabled. This change qualifies the assembly so that it is easy to identify whether YARN was enabled. e.g.

```
./make-distribution.sh --hadoop 2.3.0 --with-yarn
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```

vs

```
./make-distribution.sh --hadoop 2.3.0
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```

[ticket: X] : ?
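The naming scheme above can be sketched as a tiny helper; `make_assembly_name` is an illustrative function, not part of `make-distribution.sh`.

```shell
# Hypothetical sketch of the assembly naming described above:
# append "-yarn" to the artifact name only when YARN is enabled.
make_assembly_name() {
  version="$1"; hadoop="$2"; with_yarn="$3"
  name="spark-assembly-${version}-hadoop${hadoop}"
  if [ "$with_yarn" = "true" ]; then
    name="${name}-yarn"
  fi
  printf '%s.jar\n' "$name"
}

make_assembly_name 1.0.0-SNAPSHOT 2.3.0 true   # spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
make_assembly_name 1.0.0-SNAPSHOT 2.3.0 false  # spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```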
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and removed the incorrect version for the `org.apache.hadoop:hadoop-client` dependency in yarn/pom.xml.
Can one of the admins verify this patch?
@witgo thanks! This was indeed a very pleasant surprise.
Jenkins, test this please.
Merged build triggered.
Merged build started.
```
# Apache Hadoop 0.23.x
$ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -Dyarn.version=0.23.7 -DskipTests clean package

# Different versions of HDFS vs YARN.
$ mvn -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version= 0.23.7 -DskipTests clean package
```
Review comment: the flag should read `-Dyarn.version=0.23.7` (no space after the `=`).
Merged build finished. All automated tests passed.
All automated tests passed.
…n spark built for hadoop 2.3.0, 2.4.0
@witgo would you mind removing your travis changes and
Jenkins, test this please.
```
@@ -21,7 +21,6 @@
     <groupId>org.apache.spark</groupId>
     <artifactId>yarn-parent_2.10</artifactId>
     <version>1.0.0-SNAPSHOT</version>
-    <relativePath>../pom.xml</relativePath>
```
I think these might be necessary for users who link against this artifact. In general the yarn module is not something people really link against in spark, but we do publish it, so I think it might be good to include these.
Merged build triggered.
Merged build started.
```
@@ -55,7 +55,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"

-  lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
+  lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
```
Just wondering - what is the benefit of this change?
Just to increase readability.
@witgo - would you mind isolating (1) and (3) and putting them in a separate pull request? Those I think should go in ASAP. #626 still has a bunch of other changes. I might just do this myself quickly. If this is a sufficient fix for SPARK-1693, could you send a pull request containing this change?
@pwendell
This is a part of [PR 590](#590)

Author: witgo <witgo@qq.com>

Closes #626 from witgo/yarn_version and squashes the following commits:

c390631 [witgo] restore the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
(cherry picked from commit fb05432) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
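The last commit in the squash list above, "the default value of yarn.version is equal to hadoop.version", can be mimicked in shell; this sketches the Maven property defaulting described in the commit message, with illustrative variable names.

```shell
# Sketch of the defaulting rule from the commit message: if yarn.version
# is not set explicitly, it takes the value of hadoop.version.
yarn_version=""            # not set explicitly
hadoop_version="2.3.0"
yarn_version="${yarn_version:-$hadoop_version}"
echo "yarn.version=$yarn_version"
# prints: yarn.version=2.3.0
```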
@pwendell
This still has some changes that I'm not sure are intended. commons-lang 2.5 should not be a dependency now, and I don't know that conf XML files should be ignored by git.
@srowen
Where does Spark use commons-lang, though? It uses commons-lang3. You would declare it as a dependency if it were used, or to resolve a version conflict, but is there evidence of the latter?
@srowen
Yea, are they colliding in the assembly jar, or does Maven resolve to 2.5? The latter should be fine. If they're colliding, then I agree that we may have to manage it manually for tidiness, and state why in a comment.
@srowen
Submit a new Pull Request #786
The `spark-shell` option `--log-conf` also enables the `SPARK_PRINT_LAUNCH_COMMAND`.