Improve maven plugin configuration #590

Closed
wants to merge 34 commits into from

witgo
Contributor

@witgo witgo commented Apr 29, 2014

  1. The `spark-shell` option `--log-conf` now also enables `SPARK_PRINT_LAUNCH_COMMAND`.
  2. Remove unnecessary `maven-antrun-plugin` configuration.
  3. Improve `scalatest-maven-plugin` configuration.

berngp and others added 7 commits April 15, 2014 14:03
This change adds `./yarn/stable/target/<scala-version>/classes` to
the _classpath_ when a _dependencies_ assembly is available in the
assembly directory.

Why is this change necessary?
It eases development of features and bug fixes for Spark-YARN.

[ticket: X] : NA

Author      : bernardo.gomezpalacio@gmail.com
Reviewer    : ?
Testing     : ?
…ectory.

Why is this change necessary?

While developing in Spark I found myself rebuilding either the
dependencies assembly or the full Spark assembly. I kept running into
the case of having both the dep-assembly and the full assembly in the same
directory and getting an error when I called either `spark-shell` or
`spark-submit`.

Quick fix: rename whichever assembly you are not using to a `.bkp` file,
depending on the development workflow you are following at the moment,
and make `spark-class` ignore non-jar files. Another option would be to
move the "offending" jar to a different directory, but in my opinion
keeping both in place is a bit tidier.

e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```
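The "ignore non-jar files" part of the fix can be sketched in shell. This is a minimal illustration assuming the directory layout shown above, not the actual `spark-class` code, and the helper names `list_assembly_jars` and `require_single_assembly` are hypothetical: a glob that ends in `.jar` does not match a renamed `*.jar.bkp` file, so only one assembly remains a candidate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual spark-class code): print the names of
# assembly jars in a directory, skipping backups such as *.jar.bkp.
list_assembly_jars() {
  local dir="$1"
  local f
  for f in "$dir"/spark-assembly*.jar; do
    # If nothing matches, the glob stays unexpanded; skip that literal.
    [ -e "$f" ] || continue
    basename "$f"
  done
}

# Hypothetical check: fail unless exactly one assembly jar is present,
# which is the condition that broke spark-shell / spark-submit above.
require_single_assembly() {
  local count
  count=$(list_assembly_jars "$1" | wc -l)
  if [ "$count" -ne 1 ]; then
    echo "Expected exactly one assembly jar in $1, found $count" >&2
    return 1
  fi
}
```

For the listing above, `list_assembly_jars ./assembly/target/scala-2.10` prints only `spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar`, so `require_single_assembly` succeeds.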

[ticket: X] : ?
…UNCH_COMMAND .

Why is this change necessary?
When enabling `--log-conf` through `spark-shell`, you are most likely
also interested in the full invocation of the java command, including the
_classpath_ and extended options, e.g.:

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options   -Dspark.logConf=true
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof -XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```
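A hedged sketch of the behavior this commit describes: when `spark-shell` sees `--log-conf`, it adds `-Dspark.logConf=true` to the REPL options (as in the `Spark REPL options` line above) and also exports `SPARK_PRINT_LAUNCH_COMMAND`, so the full `Spark Command:` line is printed. The function name `handle_log_conf` is hypothetical, and treating `SPARK_REPL_OPTS` as the variable that holds the REPL options is an assumption; the real script may parse its options differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of spark-shell option handling; not the actual script.
# SPARK_REPL_OPTS is assumed to hold the extra -D options passed to the REPL.
handle_log_conf() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --log-conf)
        # Log the effective SparkConf when the SparkContext starts...
        SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-} -Dspark.logConf=true"
        # ...and ask the launcher to print the full java command line.
        export SPARK_PRINT_LAUNCH_COMMAND=1
        ;;
    esac
  done
}
```

Tying both effects to one flag follows the reasoning in the commit message: anyone asking for the configuration to be logged probably wants the launch command as well.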

[ticket: X] : ?
Why is this change necessary?
Renamed the SBT "root" project to "spark" to enhance readability.

Currently the assembly name is qualified with the Hadoop version but not
with whether YARN has been enabled. This change qualifies the assembly name
so that it is easy to identify whether YARN was enabled.

e.g.

```
./make-distribution.sh --hadoop 2.3.0 --with-yarn

ls -l ./assembly/target/scala-2.10
    spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```

vs

```
./make-distribution.sh --hadoop 2.3.0

ls -l ./assembly/target/scala-2.10
    spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```
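To make the naming rule concrete, here is a small shell function that derives the two jar names shown above. It only illustrates the rule; the actual change lives in the build definitions, and the function name `assembly_jar_name` and its `with_yarn` argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the assembly naming rule: the Hadoop version is
# always part of the name; a "-yarn" qualifier is added only when YARN is enabled.
assembly_jar_name() {
  local spark_version="$1" hadoop_version="$2" with_yarn="${3:-false}"
  local name="spark-assembly-${spark_version}-hadoop${hadoop_version}"
  if [ "$with_yarn" = "true" ]; then
    name="${name}-yarn"
  fi
  printf '%s.jar\n' "$name"
}

assembly_jar_name 1.0.0-SNAPSHOT 2.3.0 true   # spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
assembly_jar_name 1.0.0-SNAPSHOT 2.3.0        # spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```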

[ticket: X] : ?
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and
removed the incorrect version for the `org.apache.hadoop:hadoop-client`
dependency in `yarn/pom.xml`.
@AmplabJenkins

Can one of the admins verify this patch?

@berngp
Contributor

berngp commented Apr 29, 2014

@witgo thanks! This was indeed a very pleasant surprise.

@pwendell
Contributor

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.


```
# Apache Hadoop 0.23.x
$ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -Dyarn.version=0.23.7 -DskipTests clean package

# Different versions of HDFS vs YARN.
$ mvn -Pyarn-alpha -Dhadoop.version=2.3.0 -Dyarn.version= 0.23.7 -DskipTests clean package
```
Contributor

This should be `-Dyarn.version=0.23.7`, without the space after `=`.
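The space matters because the shell splits unquoted words on whitespace: `-Dyarn.version= 0.23.7` reaches `mvn` as two arguments, an empty `yarn.version` property and a separate `0.23.7`, which Maven would most likely try to treat as a goal or lifecycle phase rather than as the property value. The hypothetical `count_args` helper below only makes the splitting visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments the shell actually passed.
count_args() {
  printf '%s\n' "$#"
}

count_args -Dyarn.version= 0.23.7   # prints 2: "-Dyarn.version=" and "0.23.7"
count_args -Dyarn.version=0.23.7    # prints 1
```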

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14567/

@pwendell
Contributor

pwendell commented May 1, 2014

@witgo would you mind removing your Travis changes and `.jvmopts`? I think this PR looks good, but I'd like to merge it without those changes. We'll actually probably disable the Travis build for now... I think people find it confusing and don't realize it's just experimental.

@pwendell
Contributor

pwendell commented May 1, 2014

Jenkins, test this please.

```diff
@@ -21,7 +21,6 @@
 <groupId>org.apache.spark</groupId>
 <artifactId>yarn-parent_2.10</artifactId>
 <version>1.0.0-SNAPSHOT</version>
-<relativePath>../pom.xml</relativePath>
```
Contributor

I think these might be necessary for users who link against this artifact. In general, people don't really link against the YARN module in Spark, but we do publish it, so I think it would be good to keep these.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

```diff
@@ -55,7 +55,7 @@ object SparkBuild extends Build {
 val SCALAC_JVM_VERSION = "jvm-1.6"
 val JAVAC_JVM_VERSION = "1.6"

-lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
+lazy val root = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
```
Contributor

Just wondering - what is the benefit of this change?

Contributor Author

Just to increase readability.

@pwendell
Contributor

pwendell commented May 3, 2014

@witgo, would you mind isolating (1) and (3) into a separate pull request? I think those should go in ASAP. #626 still has a bunch of other changes. I might just do this myself quickly.

If this is a sufficient fix for SPARK-1693, could you send a pull request containing this change?
witgo@0ed124d

@witgo
Contributor Author

witgo commented May 4, 2014

@pwendell
I hadn't noticed this; it has been modified.

asfgit pushed a commit that referenced this pull request May 4, 2014
This is a part of [PR 590](#590)

Author: witgo <witgo@qq.com>

Closes #626 from witgo/yarn_version and squashes the following commits:

c390631 [witgo] restore  the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
asfgit pushed a commit that referenced this pull request May 4, 2014
This is a part of [PR 590](#590)

Author: witgo <witgo@qq.com>

Closes #626 from witgo/yarn_version and squashes the following commits:

c390631 [witgo] restore  the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
(cherry picked from commit fb05432)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@witgo
Contributor Author

witgo commented May 12, 2014

@pwendell
The large changes have been removed.
This PR can now be merged into master and branch-1.0.

@srowen
Member

srowen commented May 12, 2014

This still has some changes that I'm not sure are intended. commons-lang 2.5 should not be a dependency now. And should conf XML files really be ignored by git?

@witgo
Contributor Author

witgo commented May 12, 2014

@srowen
In some cases, commons-lang is pulled in at multiple versions.
`fairscheduler.xml` and `hive-site.xml` should be ignored.

@srowen
Member

srowen commented May 12, 2014

Where does Spark use commons-lang though? It uses commons-lang3. You would declare it as a dependency if it were used, or to resolve a version conflict, but is there evidence of the latter?

@witgo
Contributor Author

witgo commented May 12, 2014

@srowen
It has been removed.

@witgo
Contributor Author

witgo commented May 12, 2014

```
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:1.0.4:compile
[INFO] |  |  \- org.apache.hadoop:hadoop-core:jar:1.0.4:compile
[INFO] |  |     +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |     +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |     +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |     |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |     |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  |     |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |     |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |     |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |     +- commons-el:commons-el:jar:1.0:compile
[INFO] |  |     +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO] |  |     \- oro:oro:jar:2.0.8:compile
[INFO] +- org.apache.hive:hive-exec:jar:0.12.0:compile
[INFO] |  +- com.google.protobuf:protobuf-java:jar:2.4.1:compile
[INFO] |  +- org.iq80.snappy:snappy:jar:0.2:compile
[INFO] |  +- org.json:json:jar:20090211:compile
[INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  +- com.googlecode.javaewah:JavaEWAH:jar:0.3.2:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-common:jar:0.23.9:compile
[INFO] |  |  |  +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  |  +- commons-el:commons-el:jar:1.0:compile
[INFO] |  |  |  +- commons-lang:commons-lang:jar:2.5:compile
[INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  |  +- oro:oro:jar:2.0.8:compile
[INFO] |  |  |  +- org.apache.hadoop:hadoop-auth:jar:0.23.9:compile
[INFO] |  |  |  \- com.googlecode.json-simple:json-simple:jar:1.1:compile
```
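To see at a glance which versions of a single artifact appear in tree output like the above, the distinct versions can be pulled out with standard text tools. A minimal sketch, assuming the output has been saved to a file; the helper name `artifact_versions` and the file name `tree.txt` are hypothetical. The maven-dependency-plugin's `dependency:tree` goal can also be narrowed up front with its `includes` filter, e.g. `-Dincludes=commons-lang:commons-lang`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `mvn dependency:tree` output on stdin and print the
# distinct versions of one artifact, given as groupId:artifactId.
artifact_versions() {
  local ga="$1"
  # Match "groupId:artifactId:jar:<version>", keep the version field, de-duplicate.
  grep -o "${ga}:jar:[^:]*" | cut -d: -f4 | sort -u
}

# Usage (tree.txt is a hypothetical file holding the output above):
#   artifact_versions commons-lang:commons-lang < tree.txt
# For the tree above this prints 2.4 and 2.5, one per line.
```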

@srowen
Member

srowen commented May 12, 2014

Yeah. Are they colliding in the assembly jar, or does Maven resolve to 2.5? The latter should be fine. If they're colliding, then I agree we may have to manage the version manually for tidiness, and state why in a comment.

@witgo
Contributor Author

witgo commented May 12, 2014

@srowen
I will submit a new pull request to solve this problem.

@witgo witgo changed the title from "Improve build configuration Ⅱ" to "Improve maven plugin configuration" May 15, 2014
@witgo
Contributor Author

witgo commented May 15, 2014

Submitted a new pull request: #786

@witgo witgo closed this May 15, 2014
@witgo witgo deleted the improved_build branch June 10, 2014 16:02
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
This is a part of [PR 590](apache#590)

Author: witgo <witgo@qq.com>

Closes apache#626 from witgo/yarn_version and squashes the following commits:

c390631 [witgo] restore  the yarn dependency declarations
f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha
2df6cf5 [witgo] review commit
a1d876a [witgo] review commit
20e7e3e [witgo] review commit
c76763b [witgo] The default value of yarn.version is equal to hadoop.version
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
SPARK-1085: Fix Jenkins pull request builder for branch-0.9 (scalastyle command not found)

Added a dummy scalastyle task to sbt.

https://spark-project.atlassian.net/browse/SPARK-1085

Author: Reynold Xin <rxin@apache.org>

Closes apache#590 and squashes the following commits:

d0889bd [Reynold Xin] SPARK-1085: Fix Jenkins pull request builder for branch-0.9 (scalastyle command not found)
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* update run.yaml

Little optimization, to make build not fail

* Update run.yaml

* Update run.yaml

* Update run.yaml
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020