-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2678][Core][SQL] A workaround for SPARK-2678 #1801
Conversation
@liancheng thanks for doing this. I don't have too much comments about it. But I have some other questions from the user perspective:
Sorry, those questions are not related to this PR. |
QA tests have started for PR 1801. This patch merges cleanly. |
QA tests have started for PR 1801. This patch merges cleanly. |
Thanks cheng. This LGTM pending tests. |
Hey @chenghao-intel, answers for your questions:
|
QA results for PR 1801: |
QA results for PR 1801: |
QA tests have started for PR 1801. This patch merges cleanly. |
Thanks Cheng! I'm gonna merge this. |
JIRA issues: - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678) - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874) Related PR: - #1715 This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1801 from liancheng/spark-2874 and squashes the following commits: 8045d7a [Cheng Lian] Make sure test suites pass 8493a9e [Cheng Lian] Using eval to retain quoted arguments aed523f [Cheng Lian] Fixed typo in bin/spark-sql f12a0b1 [Cheng Lian] Worked arount SPARK-2678 daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts (cherry picked from commit a6cd311) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@liancheng @andrewor14 @pwendell With this patch things like |
Oh, it's been reported by #1825. |
Backport SPARK-2678 fix in PR #1801 to branch-1.0 Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1831 from liancheng/spark-2678-for-1.0 and squashes the following commits: cc59929 [Cheng Lian] Backported SPARK-2678 fix
This change broke backwards compatibility with existing spark-submit commands. e.g., this used to work fine and doesn't anymore:
You have to change it to:
I don't think that's acceptable. Lot's of users will be tripped by that, especially since the error message is less than helpful. |
@vanzin The point here is that, the only valid way to call
And we decided to be only compatible with this spec we have at hand. Existing scripts that rely on this bug (i.e., putting |
I see what you're saying, but what the docs say and what the code does are different things... well, let's see how many people are tripped by this once this change goes out. A better error message, at least, would go a long way. Currently it says "Cannot load main class" and tells you to add "--verbose". But if you add "--verbose" at the end, you get the same error message... so nothing is clarified. |
@vanzin I understand what you're worrying about :) Actually in #1715 we've roughly reached a cleaner solution (see the commends) than this one, but some details haven't been quite nailed down yet, and it's already close to 1.1 release. So, this PR is just a not so elegant fix of SPARK-2678 that aims to be compatible with the spec and sacrificing some clarity in bash scripts. We'll do it again in the right way as what Matei, Patrick, Andrew and I discussed in #1715 once details are nailed down. |
Yeah, I agree that this break in backward compatibility is pretty bad. At the same time we don't want to introduce some new config (e.g. the proposals in #1715) until we settle on a reasonably intuitive one. Otherwise we may have to support a clumsy way of passing arguments forever. Fortunately the documented way is not affected. It just so happens that our existing implementation is much more flexible than we intended it to be. But yes, at the very least we should have an error message that points the users of the now deprecated way to the right direction. |
As sryza reported, spark-shell doesn't accept any flags. The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by #1801 Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1715, Closes #1864, and Closes #1861 Closes #1825 from sarutak/SPARK-2894 and squashes the following commits: 47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894 2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py 98287ed [Kousuke Saruta] Removed useless code from java_gateway.py 513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces 28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments 5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
As sryza reported, spark-shell doesn't accept any flags. The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by #1801 Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1715, Closes #1864, and Closes #1861 Closes #1825 from sarutak/SPARK-2894 and squashes the following commits: 47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894 2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py 98287ed [Kousuke Saruta] Removed useless code from java_gateway.py 513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces 28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments 5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
JIRA issues: - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678) - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874) Related PR: - apache#1715 This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes apache#1801 from liancheng/spark-2874 and squashes the following commits: 8045d7a [Cheng Lian] Make sure test suites pass 8493a9e [Cheng Lian] Using eval to retain quoted arguments aed523f [Cheng Lian] Fixed typo in bin/spark-sql f12a0b1 [Cheng Lian] Worked arount SPARK-2678 daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
As sryza reported, spark-shell doesn't accept any flags. The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by apache#1801 Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Author: Cheng Lian <lian.cs.zju@gmail.com> Closes apache#1715, Closes apache#1864, and Closes apache#1861 Closes apache#1825 from sarutak/SPARK-2894 and squashes the following commits: 47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894 2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py 98287ed [Kousuke Saruta] Removed useless code from java_gateway.py 513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces 28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments 5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
* rdar://84031512 Add stored procedure API * Try to fix scala 2.13 build * Trigger build Co-authored-by: Anton Okolnychyi <aokolnychyi@apple.com> Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
JIRA issues:
Related PR:
This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.