[SPARK-2894] spark-shell doesn't accept flags #1825
Conversation
Jenkins, add to whitelist.
QA tests have started for PR 1825. This patch merges cleanly.
This will allow spark-shell to take spark-submit options, but will remove its ability to take spark-shell-specific options (currently there's only one, "file"). I'm unclear on the best way to support both of these.
does the shell actually take flags? I didn't realize this when I OK'd #1801. If there are specific flags, we should trap them and pass them after
AFAIK I don't believe spark-shell takes in any application-specific arguments, at least not documented ones. @sryza What file are you referring to?
org.apache.spark.repl.SparkRunnerSettings
ah... looks like we need some special logic to filter that one out here
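To make the idea concrete, a rough sketch of what filtering that one flag could look like, assuming the REPL-specific flag is -i <file> and using the CLI_ARGS / SUBMISSION_OPTS names that appear in the snippets reviewed further down (an illustration only, not the final patch; error checking omitted):

# Separate the REPL-specific "-i <file>" flag from everything else,
# which gets forwarded to spark-submit.
CLI_ARGS=()          # arguments meant for the Scala REPL itself
SUBMISSION_OPTS=()   # arguments meant for spark-submit
while (($#)); do
  case "$1" in
    -i)
      CLI_ARGS+=("$1"); shift   # keep the flag ...
      CLI_ARGS+=("$1"); shift   # ... and its file argument
      ;;
    *)
      SUBMISSION_OPTS+=("$1"); shift
      ;;
  esac
done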
QA results for PR 1825:
QA tests have started for PR 1825. This patch merges cleanly.
QA results for PR 1825:
It would be good to verify whether pyspark is affected too (see java_gateway.py)
There seems to be a similar PySpark issue, too (I just ran into it during testing):
args="$@"
inLoadFileFlag=0

for arg in $args;do
For consistency, could you use the same approach that we use in the spark-sql script?
O.K. I'll check it.
Maybe instead of filtering out all I'll try to write a utility script to filter out
@liancheng However that means every time we want to change a spark-submit config, we need to change it in multiple places, which might make things harder to maintain. For spark-shell, aren't we only expecting one type of argument? I'm not super familiar with this functionality of passing a settings file to spark-shell, but I don't think this file path matches
@andrewor14 Ah OK, I only had a glance at @sarutak Thanks for help fixing this. I'd also suggest what @pwendell suggested, i.e. using a similar while/case structure in
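Roughly, the while/case structure being suggested splits the command line into spark-submit options and application options against an explicit whitelist; a trimmed sketch, following the snippets reviewed below (the real list of spark-submit options is longer than shown here):

# Split the command line: whitelisted options go to spark-submit, the rest to the application.
SUBMISSION_OPTS=()
APPLICATION_OPTS=()
while (($#)); do
  case "$1" in
    --master | --deploy-mode | --class | --name | --jars | --py-files | --files)
      # These options take a value, so consume two positional parameters.
      if [[ $# -lt 2 ]]; then
        echo "Error: $1 requires an argument." 1>&2
        exit 1
      fi
      SUBMISSION_OPTS+=("$1"); shift
      SUBMISSION_OPTS+=("$1"); shift
      ;;
    *)
      APPLICATION_OPTS+=("$1"); shift
      ;;
  esac
done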
stty icanon echo > /dev/null 2>&1
else
extractLoadFileOpt "$@"
We should move this call before the if statement to cover the Cygwin branch. And please use 4-space indentation here :)
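Structurally, the suggestion amounts to something like the following (a sketch only; extractLoadFileOpt and the cygwin flag come from the spark-shell script under review and are stubbed here, and the actual launch commands are elided as comments):

# Stub so the sketch stands alone; the real function splits off the -i <file> option.
extractLoadFileOpt() { :; }
cygwin=false

# Call it once, before the if/else, so the Cygwin branch is covered as well.
extractLoadFileOpt "$@"

if $cygwin; then
    # ... Cygwin-specific stty handling, then launch the REPL via spark-submit ...
    stty icanon echo > /dev/null 2>&1
else
    # ... regular launch via spark-submit ...
    :
fi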
@JoshRosen I'm checking
@liancheng
QA tests have started for PR 1825. This patch merges cleanly.
QA results for PR 1825:
case $1 in
-i)
CLI_ARGS+=($1); shift
CLI_ARGS+=($1); shift
indentation is off
Sorry, included tabs.
QA tests have started for PR 1825. This patch merges cleanly.
QA results for PR 1825:
This failed a flaky test. Retest this please
QA tests have started for PR 1825. This patch merges cleanly.
@JoshRosen Thanks, then I think gathering all
QA tests have started for PR 1825. This patch merges cleanly.
exit 1;
fi
SUBMISSION_OPTS+=($1); shift
SUBMISSION_OPTS+=($1); shift
The following case fails because this line doesn't handle quoted strings with spaces properly:
./bin/pyspark app.py --master spark://lian-laptop.local:7077 --name "awesome name"
A possible fix is to replace this line with the trick bin/pyspark uses:
whitespace="[[:space:]]"
i=$1
if [[ $1 =~ \" ]]; then i=$(echo $1 | sed 's/\"/\\\"/g'); fi
if [[ $1 =~ $whitespace ]]; then i=\"$1\"; fi
SUBMISSION_OPTS+=($i); shift
Thanks for pointing that out. The newer revision of this PR addresses it.
I think the main reason is that $1 is not double-quoted in utils.sh.
Hmm, I'm afraid adding double quotes alone doesn't help; the case I mentioned above still fails, and the pyspark trick can be helpful. And we only need to deal with the argument part of those options that need an argument.
Actually, double quotes should be sufficient here, since they preserve whitespace by default. The only reason we needed the special handling for pyspark is that we have to pass the string literal in as an environment variable; otherwise Python won't know how to split our arguments. We don't need to do the same here because the arguments are already split by bash.
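A quick standalone check of the double-quoting point (illustrative only, not code from this PR):

# With "$1" quoted, an argument containing spaces stays a single array element.
set -- --name "awesome name" app.py   # simulate the problematic command line
OPTS=()
while (($#)); do
  OPTS+=("$1"); shift
done
echo "${#OPTS[@]}"            # prints 3
printf '<%s>\n' "${OPTS[@]}"  # <--name> <awesome name> <app.py>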
Hey @sarutak, thanks for fixing this :) Did some tests locally and the only issue I found is the quoted string case I just commented on. Otherwise LGTM.
APPLICATION_OPTS=()
while (($#)); do
  case "$1" in
    --master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
We should add a note here that says something like "All changes here must be reflected in SparkSubmitArguments.scala".
and add a similar comment in SparkSubmitArguments.scala too. It would be bad if these two went out of sync.
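For instance, the note in the shell script could read something like this (wording is illustrative, not the exact comment that was added):

# NOTE: If you add, remove, or rename a spark-submit option below, update
#       SparkSubmitArguments.scala accordingly so the two lists stay in sync.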
I have tested this locally and I was able to get both spark shell and pyspark working as before. I also tried it with quoted arguments and backslashes and the arguments are still propagated properly. I haven't tried the settings file for spark-shell, however. Pending a few comments this LGTM, and we should get this in quickly if possible.
Added comments in utils.py and SparkSubmitArguments.scala to note that both files should be modified simultaneously
QA tests have started for PR 1825. This patch merges cleanly.
Added comments in utils.py and SparkSubmitArguments.scala to note that both files should be modified simultaneously
QA tests have started for PR 1825. This patch merges cleanly.
LGTM, thanks @sarutak and @liancheng.
Okay cool - we can merge this once the tests pass.
Tests passed so I'm going to merge it
As sryza reported, spark-shell doesn't accept any flags. The root cause is the wrong usage of spark-submit in spark-shell, and it came to the surface with #1801.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1715, Closes #1864, and Closes #1861

Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
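With this merged, spark-submit style flags pass straight through the shell launchers; for example (the flag values below are only illustrations):

# spark-submit options are now recognized by spark-shell:
./bin/spark-shell --master local[4] --jars extra.jar

# and pyspark keeps application arguments separate from submit options
# (this is the case from the quoting discussion above, with a placeholder host):
./bin/pyspark app.py --master spark://localhost:7077 --name "awesome name"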
Does this work with Python? I ask because
doesn't seem to respect my ipython flags. Am I using this right?
You need to set them through
Doh, I forgot about that. I tried a bunch of other Python configurations and in every case the behavior seems to match 1.0.2, which is great. Thanks!
As @sryza reported, spark-shell doesn't accept any flags.
The root cause is the wrong usage of spark-submit in spark-shell, and it came to the surface with #1801.