[SPARK-6435] spark-shell --jars option does not add all jars to classpath #5227

Closed
wants to merge 6 commits from tsudukim/feature/SPARK-6435-2

Conversation

tsudukim
Contributor

Modified to accept double-quoted args properly in spark-shell.cmd.

…path

Modified to accept multiple jars in spark-shell.cmd.
@SparkQA

SparkQA commented Mar 27, 2015

Test build #29297 has finished for PR 5227 at commit 04f4291.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 27, 2015

I tend to trust your judgment on modifying these Windows scripts, especially if you've verified that this works where it did not before. CC @nishkamravi2 and @vanzin since this will touch the scripts also being modified in #5085

@vanzin
Contributor

vanzin commented Mar 27, 2015

I think the real problem here is that `,` is a delimiter in Windows batch scripts and thus needs to be escaped:
http://ss64.com/nt/syntax-esc.html

So I think the correct fix would be to change CommandBuilderUtils.quoteForBatchScript to escape commas also. That would also avoid the temp file... which can cause problems if two jobs are being launched at the same time.
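For readers who don't know the launcher code, `CommandBuilderUtils.quoteForBatchScript` is the Java helper that prepares each argument before it is placed on the generated batch command line. The following is only a simplified sketch of the general idea Marcelo is pointing at (quote any argument containing characters that `cmd.exe` treats specially, including the comma), not the actual Spark implementation; the class and method names are made up for illustration, and the trailing-backslash wrinkle discussed further down the thread is omitted.

```java
// Simplified sketch only -- not the real CommandBuilderUtils.quoteForBatchScript.
// Idea: if an argument contains characters that cmd.exe treats as delimiters or
// special characters (whitespace, '"', '=', ',', ';'), wrap the whole argument in
// double quotes so a comma-separated --jars list survives as a single argument.
class BatchQuotingSketch {

  static String quoteForBatchScriptSketch(String arg) {
    boolean needsQuotes = false;
    for (int i = 0; i < arg.length(); i++) {
      char c = arg.charAt(i);
      if (Character.isWhitespace(c) || c == '"' || c == '=' || c == ',' || c == ';') {
        needsQuotes = true;
        break;
      }
    }
    if (!needsQuotes) {
      return arg;
    }
    StringBuilder quoted = new StringBuilder("\"");
    for (int i = 0; i < arg.length(); i++) {
      char c = arg.charAt(i);
      if (c == '"') {
        quoted.append('"');  // double an embedded quote so cmd.exe keeps it literal
      }
      quoted.append(c);
    }
    return quoted.append('"').toString();
  }
}
```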

@vanzin
Contributor

vanzin commented Mar 27, 2015

Ah, and even though jenkins doesn't run unit tests on Windows, it would be nice to update SparkLauncherSuite to add a unit test for this, since that will exercise this code path (regardless of which fix).
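A unit test along these lines could be kept quite small. The sketch below exercises the illustrative helper from the previous block rather than the real SparkLauncherSuite plumbing (both sketch classes are assumed to sit in the same package); the key assertion is that a comma-separated `--jars` value comes back as one quoted token rather than something cmd.exe will re-split.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical test against the sketch helper above, not the actual SparkLauncherSuite.
public class BatchQuotingSketchTest {

  @Test
  public void quotesArgumentContainingCommasAndSpaces() {
    String jars = "C:\\Path to\\jar1.jar,C:\\Path to\\jar2.jar";
    assertEquals("\"C:\\Path to\\jar1.jar,C:\\Path to\\jar2.jar\"",
        BatchQuotingSketch.quoteForBatchScriptSketch(jars));
  }

  @Test
  public void leavesPlainArgumentAlone() {
    assertEquals("simple.jar",
        BatchQuotingSketch.quoteForBatchScriptSketch("simple.jar"));
  }
}
```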

@srowen
Member

srowen commented Mar 28, 2015

OK @vanzin, sounds like you recommend we not make this change as-is. @tsudukim, are you comfortable trying to make the change he suggests?

@vanzin
Contributor

vanzin commented Mar 30, 2015

Hi @tsudukim , are you planning on trying my suggestion? Otherwise I can make the change and test it.

@tsudukim
Contributor Author

@vanzin I'm not sure I've understood your suggestion correctly, but as I wrote in JIRA, I don't think this is a Java-side problem.
https://issues.apache.org/jira/browse/SPARK-6435
Exactly, the `,` is the problem. By the time the Java application receives the arguments, they have already been split on the comma. If we want to escape the comma, we have to do it on the batch side.

@tsudukim
Contributor Author

By "Java application" I mean the launcher, launcher\src\main\java\org\apache\spark\launcher\Main.java.
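To see the point concretely: by the time `Main` runs, it only sees whatever `argv` the batch layer handed it. A trivial, hypothetical probe class (not part of Spark) makes that visible; invoked through a batch wrapper that re-tokenizes its arguments the way the unfixed spark-class2.cmd effectively did, a comma-separated `--jars` list shows up already split into separate entries, so nothing done purely on the Java side can restore the original grouping.

```java
// Hypothetical probe, not part of Spark: prints exactly what reaches the JVM.
// Run it behind a batch wrapper that re-expands its arguments and you can check
// whether "--jars a.jar,b.jar" arrives as one argument or as two.
public class ArgsProbe {
  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      System.out.println("args[" + i + "] = " + args[i]);
    }
  }
}
```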

tsudukim added 2 commits April 2, 2015 15:05
Conflicts:
	bin/spark-class2.cmd
- added random string to the temporary filename
- double-quotation followed by `cmd /c` did not work properly
- no need to escape `=` with `^`
- if a double-quoted string ended with `\` (like a classpath), the last `\` was parsed as the escape character and the closing `"` didn't work properly
@tsudukim
Contributor Author

tsudukim commented Apr 2, 2015

Ah @vanzin, I hadn't understood your suggestion before: CommandBuilderUtils needs to be modified to escape commas.
But I think we still need to modify spark-class2.cmd as well.

When we execute `bin\spark-shell.cmd --jars "C:\Path to\jar1.jar,C:\Path to\jar2.jar"`, the original spark-class2.cmd produces the following even if CommandBuilderUtils is fixed to escape commas:
`... --jars "C:\Path to\jar1.jar C:\Path to\jar2.jar" ...`
but with the spark-class2.cmd from my PR we get:
`... --jars "C:\Path to\jar1.jar,C:\Path to\jar2.jar" ...`

Two jobs at the same time is indeed a problem, so I added a random string to the temporary filename to reduce the probability of a collision.

I know it would be better to solve this without a temporary file if possible, but I have no idea how at the moment. If you have any suggestions, please let me know.
I'd really like to replace all the *.cmd scripts with PowerShell rather than write ugly code like this...
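As a side note on the temp-file collision concern: collision-free temporary files are easy to get on the Java side via `File.createTempFile`, but here the file is created and read inside the .cmd script itself, where a random suffix is about the best tool available. Purely for comparison, and not what the PR does:

```java
import java.io.File;
import java.io.IOException;

// Comparison only -- not what the PR does. On the JVM side the platform already
// provides collision-free temp files; the batch script has no such primitive.
public class TempFileSketch {
  public static void main(String[] args) throws IOException {
    File out = File.createTempFile("spark-class-launcher-output-", ".txt");
    out.deleteOnExit();  // clean up when the JVM exits
    System.out.println("temporary file: " + out.getAbsolutePath());
  }
}
```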

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29590 has finished for PR 5227 at commit 0d4dc41.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@tsudukim
Contributor Author

tsudukim commented Apr 2, 2015

Oops, I forgot to include the fixed test code.

default:
break;
}
quoted.appendCodePoint(cp);
}
if (arg.codePointAt(arg.length() - 1) == '\\') {
quoted.append("\\");
Member

This is indented too far. Why would a backslash need escaping only in the final position?

Contributor Author

In batch, a backslash `\` followed by a double quote `"` is parsed as an escape character.
For example, `\"` means `"` itself.
We face a problem in this case:
`java.exe -cp "C:\path\to\directory\" ...`
The closing `"` doesn't work properly, so we only need to escape the last `\`.
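A minimal sketch of the trailing-backslash handling being described (made-up class and helper names, not the actual patch): when the argument ends in `\`, one extra `\` is emitted before the closing quote so that cmd.exe does not read the final `\"` as an escaped quote.

```java
// Sketch of the idea only, with made-up names -- not the actual patch.
class TrailingBackslashSketch {

  // Append the closing quote, doubling a trailing backslash first so that cmd.exe
  // does not read the final \" (e.g. in "C:\path\to\directory\") as an escaped quote.
  static String closeQuoted(StringBuilder quoted, String arg) {
    if (!arg.isEmpty() && arg.charAt(arg.length() - 1) == '\\') {
      quoted.append('\\');
    }
    return quoted.append('"').toString();
  }
}
```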

Member

Oh, I assumed you would escape all backslashes if they are the escape character. Is that not necessary?

Contributor Author

Backslash is an escape character only when followed by `"`. The only case where that happens here is when the string ends with a backslash, since the closing `"` is appended right after it.

Member

OK, so it's not a general escape character, got it.

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29599 has finished for PR 5227 at commit 1fee420.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@@ -260,15 +260,14 @@ static String quoteForBatchScript(String arg) {
quoted.append('"');
break;

case '=':
Contributor

Did you run SparkLauncherSuite on Windows? I added this specifically to fix an issue with that path (see 92a9cfb).

Contributor Author

I've run SparkLauncherSuite on Windows and it passes.
If the double quoting is parsed properly, `=` inside double quotes does not need to be escaped.
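As a concrete illustration of that point, using the hypothetical sketch helper from earlier in this thread rather than the real CommandBuilderUtils (the method would sit in the sketch test class shown earlier): an argument containing `=` is simply wrapped in quotes, and no `^` escaping is applied to the `=` itself.

```java
// Illustration with the earlier sketch helper: '=' only triggers quoting, it is not
// individually escaped with '^'.
@Test
public void doesNotCaretEscapeEqualsInsideQuotes() {
  assertEquals("\"-Dfoo=bar\"",
      BatchQuotingSketch.quoteForBatchScriptSketch("-Dfoo=bar"));
}
```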

@vanzin
Contributor

vanzin commented Apr 3, 2015

I tried to figure out a solution that didn't involve a temp file, but failed. Guess I'll just delegate - change looks fine except for the one comment I left about testing.

@srowen
Member

srowen commented Apr 6, 2015

@tsudukim back to you regarding the test case. I'd like to get this in one way or the other since it's an important fix for Windows.

@srowen
Member

srowen commented Apr 12, 2015

@tsudukim Ping? I think this still has an outstanding question from Marcelo above that should be addressed.

@tsudukim
Contributor Author

Ah, sorry for the late reply... I haven't had time to work on it. I'll do it this week.

@srowen
Member

srowen commented Apr 22, 2015

Is this one still active? I'd like to figure out whether this is stalled or might proceed.

…/SPARK-6435-2

Conflicts:
	bin/spark-class2.cmd
@tsudukim
Contributor Author

I was checking SparkLauncherSuite on Windows per @vanzin's comment and ran into some trouble. It doesn't seem to be related to this PR, but I'm not sure yet. Please give me a little more time. Once I resolve the problem, I'll rebase this PR.

@tsudukim
Contributor Author

The problem I mentioned was that the spark-shell.cmd called by SparkLauncherSuite somehow failed to launch the test application.
It turned out to be caused by a limitation of Windows batch: a single command line must be shorter than 8192 characters. (The full classpath was long because I was working in a deeply nested folder.)

So I believe all issues are now cleared up. Sorry for my late response.

@SparkQA

SparkQA commented Apr 27, 2015

Test build #719 has finished for PR 5227 at commit ac55787.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@vanzin
Contributor

vanzin commented Apr 27, 2015

> limitation of Windows batch that a single command line must be shorter than 8192 characters

This may be fixable if it's something we really care about, but it's not related to this change.

Given that the test passes, LGTM (based on my previous comments - didn't look at the changes again).

@SparkQA

SparkQA commented Apr 27, 2015

Test build #31039 has started for PR 5227 at commit ac55787.

@asfgit asfgit closed this in 268c419 Apr 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
…path

Modified to accept double-quoted args properly in spark-shell.cmd.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes apache#5227 from tsudukim/feature/SPARK-6435-2 and squashes the following commits:

ac55787 [Masayoshi TSUZUKI] removed unnecessary argument.
60789a7 [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6435-2
1fee420 [Masayoshi TSUZUKI] fixed test code for escaping '='.
0d4dc41 [Masayoshi TSUZUKI] - escaped comma and semicolon in CommandBuilderUtils.java - added random string to the temporary filename - double-quotation followed by `cmd /c` did not work properly - no need to escape `=` with `^` - if a double-quoted string ended with `\` like classpath, the last `\` was parsed as the escape character and the closing `"` didn't work properly
2a332e5 [Masayoshi TSUZUKI] Merge branch 'master' into feature/SPARK-6435-2
04f4291 [Masayoshi TSUZUKI] [SPARK-6435] spark-shell --jars option does not add all jars to classpath
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@LK-Tmac1

The same issue happens on Mac OS: if multiple jar paths are specified, only the first one is visible. I don't know whether a separate issue was ever filed and resolved for that.
