
[SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path #5447

Closed
wants to merge 13 commits

Conversation

tsudukim
Contributor

escape spaces in the arguments.

@tsudukim
Contributor Author

This is the new version of #5347, which was accidentally merged and closed by mistake.

@SparkQA

SparkQA commented Apr 10, 2015

Test build #30012 has finished for PR 5447 at commit 2c62e3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

def formatWindowsPath(path: String): String = path.replace("\\", "/")

def formatPath(path: String, windows: Boolean): String = {
  val formatted = path.replace(" ", "%20")
Member

I think this is OK for now and solves the immediate problem. I wonder if we will need to do more complete URI escaping later -- although, it's not clear what URI scheme we assume anyway. LGTM.

Contributor

Instead of this, how about:

scala> new URI(null, "/a b c", null)
res6: java.net.URI = /a%20b%20c

Contributor Author

When the path contains #, it doesn't work.

Member

new java.net.URI(null, "/a#b", null).toString
res1: String = /a%23b

... is technically correct, if this is what you mean. # denotes the start of the fragment portion of a URI, so when it occurs in the path, it must be escaped. The URIs we create here will not have a fragment. So I think @vanzin's change sounds more robust to similar cases like this.

Contributor Author

Do you mean we should assume the path never contains a fragment?
If we execute bin/spark-shell --jars hdfs:/path/to/jar1.jar, a URI-style string is passed as the path, so I thought the path might contain a fragment. (Though I wonder whether there is any good working example of a path with a fragment...)

Member

@steveloughran I think the issue is that we don't start with a File but with a String that may not be a file: URI. @tsudukim, a fragment is the bit after # in a URI, such as in file:/foo/bar#baz. There's no fragment in your example, and I would never expect to support such a thing. In fact I was a little surprised that this constructor handled a scheme inside the scheme-specific part, but it seems to:

scala> val uri = new java.net.URI(null, "hdfs:/foo/b ar#baz", null)
uri: java.net.URI = hdfs:/foo/b%20ar%23baz

scala> uri.toString
res12: String = hdfs:/foo/b%20ar%23baz

scala> uri.getScheme
res13: String = hdfs

scala> uri.getPath
res14: String = /foo/b ar#baz

scala> uri.getFragment
res15: String = null

This still seems like the way to go as far as I can tell.
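For readers following along, the escaping behavior discussed above is plain JDK behavior, so it can be checked outside Spark. This is a small Java sketch (the project code under review is Scala; the class name here is illustrative) exercising the three-argument java.net.URI constructor:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEscapeDemo {
    public static void main(String[] args) throws URISyntaxException {
        // Spaces in the scheme-specific part are percent-encoded.
        URI spaced = new URI(null, "/a b c", null);
        System.out.println(spaced);             // /a%20b%20c

        // '#' is escaped too, so it cannot be mistaken for a fragment delimiter.
        URI hashed = new URI(null, "/a#b", null);
        System.out.println(hashed);             // /a%23b

        // The constructor assembles a string and re-parses it, which is why a
        // "scheme" hiding inside the scheme-specific part is recovered on the way out.
        URI hdfs = new URI(null, "hdfs:/foo/b ar#baz", null);
        System.out.println(hdfs);               // hdfs:/foo/b%20ar%23baz
        System.out.println(hdfs.getScheme());   // hdfs
        System.out.println(hdfs.getPath());     // /foo/b ar#baz
        System.out.println(hdfs.getFragment()); // null
    }
}
```

The key point for this PR: because the URI was built with a null fragment, # and space in the supplied string are both quoted, matching the REPL output quoted above.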

@steveloughran
Contributor

This may seem a silly question, but why not just use File.toURI? It handles Windows paths robustly.

@vanzin
Contributor

vanzin commented Apr 13, 2015

"but why not just use File.toURI"

That works for me. If doing that, though, we should make sure that all the call sites can handle URIs with a scheme, since File.toURI will return file:/blah.

(Or maybe, @steveloughran , you're suggesting the change in a more specific spot than formatPath? It's not clear since you didn't comment on a specific part of the code.)

@vanzin
Contributor

vanzin commented Apr 14, 2015

Trying to make sense of @steveloughran's suggestion, maybe rewrite resolveURI like this:

def resolveURI(path: String, testWindows: Boolean = false): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      uri
    } else {
      new URI("file", uri.getSchemeSpecificPart(), uri.getFragment())
    }
  } catch {
    case e: URISyntaxException =>
      new File(path).toURI()
  }
}

Then you don't even need that "format path" function at all.
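The proposed Scala sketch above translates almost line-for-line to plain Java. This hedged port (class and method names are illustrative, not Spark's actual API) shows the intended three-way fallback:

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class ResolveUriSketch {
    /**
     * If the string parses as a URI with a scheme, keep it as-is; if it parses
     * but has no scheme, assume "file"; if it does not parse at all (spaces,
     * backslashes, ...), fall back to File.toURI(), which escapes for us.
     */
    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                return uri;
            }
            return new URI("file", uri.getSchemeSpecificPart(), uri.getFragment());
        } catch (URISyntaxException e) {
            return new File(path).toURI();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveURI("hdfs:/path/to/jar1.jar")); // scheme kept as-is
        System.out.println(resolveURI("/plain/path"));            // file:/plain/path
        System.out.println(resolveURI("/path with spaces"));      // File.toURI() fallback
    }
}
```

Note that the space-containing input only reaches File.toURI() because new URI(...) rejects it with URISyntaxException; that exception is doing real control-flow work here.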

@srowen
Member

srowen commented Apr 17, 2015

@tsudukim what do you think about these further comments? I'd like to wrap this up since it's a good fix and I know we've been talking about it for a while now.

@tsudukim
Contributor Author

Oh, thank you for all of your comments. Changing it to use File.toURI seems better. I've updated this PR.

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30484 has finished for PR 5447 at commit 016128d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@@ -1801,7 +1793,7 @@ private[spark] object Utils extends Logging {
       Array.empty
     } else {
       paths.split(",").filter { p =>
-        val formattedPath = if (windows) formatWindowsPath(p) else p
+        val formattedPath = formatPath(p, windows)
         val uri = new URI(formattedPath)
Contributor

You should use resolveURI here; then you shouldn't need formatPath at all.

@vanzin
Contributor

vanzin commented Apr 17, 2015

Seems like the tests don't like the "fake Windows" code path with your changes.

@tsudukim
Contributor Author

I tested only on Windows, but I noticed that I get different results on Linux.
This is because...
On Windows:

scala> new File("C:\\path\\to\\file.txt").toURI
res0: java.net.URI = file:/C:/path/to/file.txt

But on Linux:

scala> new File("C:\\path\\to\\file.txt").toURI
res0: java.net.URI = file:/home/tsudukim/.../C:%5Cpath%5Cto%5Cfile.txt

So I think the tests for Windows path should be run only on Windows.
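The asymmetry described above is easy to reproduce in plain Java (a sketch; the exact Linux output depends on the current working directory, so the comment uses a <cwd> placeholder). On a POSIX JVM the backslashes are ordinary filename characters, so the "Windows path" is treated as a relative name, resolved against the working directory, and percent-encoded:

```java
import java.io.File;
import java.net.URI;

public class ToUriPlatformDemo {
    public static void main(String[] args) {
        URI uri = new File("C:\\path\\to\\file.txt").toURI();
        // On Windows:  file:/C:/path/to/file.txt
        // On Linux:    file:/<cwd>/C:%5Cpath%5Cto%5Cfile.txt   (%5C is '\')
        System.out.println(uri);
    }
}
```

This is why a single assertion string cannot pass on both platforms: the inputs are identical but the File semantics differ, which is the motivation for gating the Windows-path tests on the actual OS.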

- run tests for Windows path only on Windows
@SparkQA

SparkQA commented Apr 22, 2015

Test build #30740 has finished for PR 5447 at commit 84c33d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@srowen
Member

srowen commented Apr 22, 2015

@tsudukim but that Windows path is being interpreted correctly on Linux -- it is just that this string does not denote the same logical path at all!

assertResolves("C:\\path\\to\\file.txt", "file:/C:/path/to/file.txt", testWindows = true)
assertResolves("spark.jar#app.jar", s"file:$cwd/spark.jar%23app.jar")
assertResolves("path to/file.txt", s"file:$cwd/path%20to/file.txt")
if (SystemUtils.IS_OS_WINDOWS) {
Member

Use Utils.isWindows instead of introducing a third party lib

@tsudukim
Contributor Author

The problem is that the result differs between Windows and Linux even when the input path strings are exactly the same, so we can't use the same test code.

@srowen
Member

srowen commented Apr 23, 2015

@tsudukim that's true, and that makes this hard to test, since we don't run on Windows machines. Your code manages to detect whether the test is running on Windows and change accordingly, so that much is good. The risk is just that a change breaks the Windows tests and goes unnoticed for a while. This may be a risk we have to take. Or, you can simply not include the Windows tests since this change is also primarily about Linux now.

@vanzin
Contributor

vanzin commented Apr 23, 2015

So, the previous code tried to run this "fake windows" mode so that it could test the Windows-specific escaping functions when running tests on Linux, since (almost) nobody runs these tests on Windows.

Since now we're not using the custom escaping and instead relying on the JDK library, IMO it's ok to remove these tests and replace them with more generic ones that work on both Windows and Linux.

@@ -82,7 +82,7 @@ object PythonRunner {
         s"spark-submit is currently only supported for local files: $path")
     }
     val windows = Utils.isWindows || testWindows
-    var formattedPath = if (windows) Utils.formatWindowsPath(path) else path
+    var formattedPath = Utils.formatPath(path, windows)
Contributor

This is the only use of formatPath left, and it doesn't feel necessary. Instead, how about:

val formattedPath = Try(new URI(formattedPath).getPath()).getOrElse(formattedPath)

Then you can remove Utils.formatPath. Maybe even Utils.windowsDrive.
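The one-liner above uses Scala's Try; the same fallback written out in plain Java (a sketch with a hypothetical helper name, not Spark's API) makes the control flow explicit:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FormattedPathSketch {
    /**
     * Extract the path component if the string is a parseable URI;
     * otherwise keep the original string unchanged. The null check covers
     * opaque URIs, where getPath() succeeds but returns null (a slight
     * hardening over the getOrElse version above).
     */
    static String formattedPath(String path) {
        try {
            String p = new URI(path).getPath();
            return (p != null) ? p : path;
        } catch (URISyntaxException e) {
            return path;
        }
    }

    public static void main(String[] args) {
        System.out.println(formattedPath("file:/foo/bar")); // /foo/bar
        System.out.println(formattedPath("C:\\foo\\bar"));  // unchanged: backslashes don't parse
    }
}
```

A Windows path like C:\foo\bar falls into the catch branch because the backslash is not a legal URI character, so the original string survives untouched.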

Contributor Author

That's right. I'll try to remove them.

@srowen
Member

srowen commented May 3, 2015

@tsudukim What do you think about Marcelo's last comment?
It would be cool to get this in for 1.4 and we're now in feature freeze, but could probably still get in a legitimate bug fix.

@srowen
Member

srowen commented May 8, 2015

I'm not sure what the status of this is @tsudukim -- do you want to finish it off or should someone else? I just want to resolve this one way or the other since it's been open a long time. Great work.

tsudukim added 2 commits May 8, 2015 20:40
replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows.
removed Utils.formatPath from PythonRunner.scala.
@tsudukim
Contributor Author

tsudukim commented May 8, 2015

I'm so sorry for leaving this for such a long time. I've modified it according to your comments.

@SparkQA

SparkQA commented May 8, 2015

Test build #32227 has finished for PR 5447 at commit 1784239.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 11, 2015

Test build #32360 has finished for PR 5447 at commit ed46047.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented May 11, 2015

Looking strong, thank you. If there are no more comments, I'll merge.

@@ -18,6 +18,8 @@
package org.apache.spark.deploy

import java.net.URI
import java.io.File
Contributor

nit: ordering; scala import should be grouped with the other scala imports.

@vanzin
Contributor

vanzin commented May 12, 2015

Just a few formatting nits, but otherwise LGTM.

I think things are a little weird on Windows because if you write down a path like c:/foo/bar it looks like a URL with scheme c, so maybe that's why the extra windowsDrive checks are still needed.

@tsudukim
Contributor Author

@vanzin Thank you for your comments.
About the Windows path, you're right: someone might write a path like C:/foo/bar even though / is not the standard path separator on Windows.
About the other comments, I've fixed them.

@SparkQA

SparkQA commented May 12, 2015

Test build #32513 has finished for PR 5447 at commit 3f9a188.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented May 12, 2015

Looking good. Thanks for the perseverance @tsudukim in fixing a reasonably important issue, and cleaning up code along the way, and @vanzin for detailed review. Merging soonish ...

asfgit pushed a commit that referenced this pull request May 13, 2015
…at has space in its path

escape spaces in the arguments.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits:

3f9a188 [Masayoshi TSUZUKI] modified some errors.
ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors.
1784239 [Masayoshi TSUZUKI] removed Utils.formatPath.
e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala.
84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows
016128d [Masayoshi TSUZUKI] fixed to use File.toURI()
2c62e3b [Masayoshi TSUZUKI] Merge pull request #1 from sarutak/SPARK-6568-2
7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2
45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2
10f1c73 [Kousuke Saruta] Added a comment
93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2
649da82 [Kousuke Saruta] Fix classpath handling
c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path

(cherry picked from commit 50c7270)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit closed this in 50c7270 May 13, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
@jmdvinodjmd

I am facing the following problem with C:\spark-1.4.1-bin-hadoop2.6.
Some people said this was fixed in 1.4, but I am still getting this error on Windows 7.
Jarl Haggerty referred to this problem in https://issues.apache.org/jira/browse/SPARK-6568, and some more people discussed this exception in PR 5447.

Exception:

Exception Traceback (most recent call last)
in ()
10
11 # Initialize PySpark to predefine the SparkContext variable 'sc'
---> 12 execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

C:\spark-1.4.1-bin-hadoop2.6\python/pyspark/shell.py in ()
41 SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
42
---> 43 sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
44 atexit.register(lambda: sc.stop())
45

C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\context.pyc in init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
108 """
109 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110 SparkContext._ensure_initialized(self, gateway=gateway)
111 try:
112 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\context.pyc in _ensure_initialized(cls, instance, gateway)
227 with SparkContext._lock:
228 if not SparkContext._gateway:
--> 229 SparkContext._gateway = gateway or launch_gateway()
230 SparkContext._jvm = SparkContext._gateway.jvm
231

C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\java_gateway.pyc in launch_gateway()
87 callback_socket.close()
88 if gateway_port is None:
---> 89 raise Exception("Java gateway process exited before sending the driver its port number")
90
91 # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

I got this problem while running the following program in an IPython notebook.

Program:

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

Can somebody suggest a solution to this problem?

@steveloughran
Contributor

@jmdvinodjmd: that's something to raise on the Spark user list and then, if you don't get help, as a JIRA on issues.apache.org. These pull requests are for reviewing patches; once the code is merged, they don't get looked at again.
