[SPARK-2894][Core] Fixes spark-shell and pyspark CLI options #1864
Conversation
QA tests have started for PR 1864. This patch merges cleanly.
QA results for PR 1864:
@@ -46,11 +49,11 @@ function main(){
     # (see https://github.com/sbt/sbt/issues/562).
     stty -icanon min 1 -echo > /dev/null 2>&1
     export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
-    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main spark-shell "$@"
+    $FWDIR/bin/spark-submit --class org.apache.spark.repl.Main ${SUBMISSION_OPTS[@]} spark-shell ${APPLICATION_OPTS[@]}
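To make the intent of the new invocation concrete, here is an illustrative sketch only (the example flags are assumptions, not taken from the PR): options that spark-submit itself understands are collected in SUBMISSION_OPTS and placed before the spark-shell primary resource, while everything else ends up in APPLICATION_OPTS and is passed through to the REPL application. The array expansions are written quoted here; whether the line in the diff above needs the same quoting is exactly what the review comments below discuss.

# Illustrative only -- roughly what the new invocation is meant to achieve.
# For a hypothetical command line:
#   ./bin/spark-shell --master local[4] --name demo -i init.scala
# the options would be split along these lines:
SUBMISSION_OPTS=(--master "local[4]" --name demo)   # understood by spark-submit
APPLICATION_OPTS=(-i init.scala)                    # forwarded to the shell application
"$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main \
  "${SUBMISSION_OPTS[@]}" spark-shell "${APPLICATION_OPTS[@]}"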
Does this handle quoted strings, e.g. --name "awesome app"? You may need to put double quotes around the argument lists.
Confirmed it doesn't; we need to add logic similar to the for loop in pyspark to handle quoted arguments.
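To see the problem concretely, here is a tiny standalone sketch (not code from this PR) showing how unquoted array expansion re-splits arguments that contain spaces:

#!/usr/bin/env bash
# Unquoted expansion word-splits "awesome app" into two separate words;
# quoting the expansion preserves it as a single argument.
APPLICATION_OPTS=(--name "awesome app")
printf 'unquoted: <%s>\n' ${APPLICATION_OPTS[@]}     # prints 3 lines: <--name> <awesome> <app>
printf 'quoted:   <%s>\n' "${APPLICATION_OPTS[@]}"   # prints 2 lines: <--name> <awesome app>

So the expansions in the diff above would need to be written as "${SUBMISSION_OPTS[@]}" and "${APPLICATION_OPTS[@]}" for an option such as --name "awesome app" to reach spark-submit intact.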
Hi @liancheng, this looks pretty good. We should merge this quickly because Spark shell for master is currently broken. However, I don't see a code path for handling application args for the pyspark shell. Do you intend to add that in this PR?
@andrewor14 Sorry, I was busy attending the 1st Spark Meetup in Beijing today. Thanks to @sarutak, his most recently updated PR (#1825) fixes all the issues you've pointed out here. I'll test it locally to make sure.
As sryza reported, spark-shell doesn't accept any flags. The root cause is wrong usage of spark-submit in spark-shell, and it came to the surface with #1801.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1715, closes #1864, and closes #1861.

Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
…jobs should fail the streaming context (apache#1864)

### What changes were proposed in this pull request?

Currently Spark Streaming uses Scala's Try to catch non-fatal exceptions while generating streaming jobs in JobGenerator.generateJobs. But if any fatal error (e.g., OutOfMemoryError) happens, it won't be caught. Normally such an error should propagate up and eventually crash the Spark runtime, but Spark Streaming generates jobs in a separate thread. If the fatal error crashes the entire JVM, it can also bring Spark down. In the customer case, however, the OOM is thrown by ByteArrayOutputStream when it cannot grow to its new capacity (the new capacity overflows the Int range, see the Java [source code](https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/io/ByteArrayOutputStream.java#L123)). So the JVM is not actually running out of memory and won't go down because of it. The OOM ends up being ignored by Spark Streaming, and subsequent batches continue to execute. Such behavior causes unnoticed data loss for the streaming application. We should propagate all failures, including fatal ones.

### Why are the changes needed?

To make Spark Streaming's behavior consistent between fatal and non-fatal errors.

### Does this PR introduce _any_ user-facing change?

If the user's streaming application hits a fatal error during streaming job generation, it was previously ignored; now it will bring down the application. Note that a non-fatal error already brought down the application before this change, which is why the previous behavior of letting fatal errors pass silently was incorrect.

### How was this patch tested?

Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?

No
This PR tries to fix SPARK-2894 in a way different from #1825, namely by filtering out all spark-submit options from the user application options, since we have control over the option list of spark-submit. By extracting the filtering into a utility shell function (sketched below), fixing other shell scripts becomes much easier, and adding new spark-submit options won't require much duplicated work. In the mid term, we still need a cleaner solution such as what has been discussed in #1715.

All modes pyspark supports (Python, IPython, and the deprecated pyspark app.py style) should work. @JoshRosen @davies It would be great if you could help review the PySpark related code, thanks!

An open issue is that the Windows scripts also need to be updated.
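The utility shell function itself is not shown in this thread; the following is only a sketch of the general idea (the function name, the option list, and the variable names are illustrative assumptions, not the actual utils.sh from the merged fix). It walks the command line once and routes recognized spark-submit options into SUBMISSION_OPTS and everything else into APPLICATION_OPTS.

# Hypothetical sketch, not the PR's actual implementation.
gather_spark_submit_opts() {
  SUBMISSION_OPTS=()
  APPLICATION_OPTS=()
  while (($# > 0)); do
    case "$1" in
      --master | --deploy-mode | --name | --conf | --properties-file | --driver-memory)
        # Options that take a value (a real script would validate that "$2" exists).
        SUBMISSION_OPTS+=("$1" "$2")
        shift 2 || break
        ;;
      --verbose | -v)
        # Flags without a value.
        SUBMISSION_OPTS+=("$1")
        shift
        ;;
      *)
        # Everything else belongs to the user application.
        APPLICATION_OPTS+=("$1")
        shift
        ;;
    esac
  done
}

A launch script could then call gather_spark_submit_opts "$@" and insert "${SUBMISSION_OPTS[@]}" before the primary resource and "${APPLICATION_OPTS[@]}" after it, which is the shape of the spark-shell line shown in the diff above.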