-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path #5447
Conversation
…at has space in its path escape spaces in the arguments.
Fix the class path handling logic for the case that the path includes white space(%20)
This is the new version of #5347 which is accidently merged and closed by mistake. |
Test build #30012 has finished for PR 5447 at commit
|
*/ | ||
def formatWindowsPath(path: String): String = path.replace("\\", "/") | ||
def formatPath(path: String, windows: Boolean): String = { | ||
val formatted = path.replace(" ", "%20") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK for now and solves the immediate problem. I wonder if we will need to do more complete URI escaping later -- although, it's not clear what URI scheme we assume anyway. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, how about:
scala> new URI(null, "/a b c", null)
res6: java.net.URI = /a%20b%20c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When path contains #
, it doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new java.net.URI(null, "/a#b", null).toString
res1: String = /a%23b
... is technically correct, if this is what you mean. #
denotes the start of the fragment portion of a URI, so when it occurs in the path, it must be escaped. The URIs we create here will not have a fragment. So I think @vanzin's change sounds more robust to similar cases like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we should assume path
never contains fragment?
If we execute bin/spark-shell --jars hdfs:/path/to/jar1.jar
, URI-style string is passed to path
.
So I thought path
may have fragment. (though I wonder if there are any good working example of the path with fragment...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveloughran I think the issue is that we don't start with a File
but with a String
that may not be a file:
URI. @tsudukim fragment is the bit after #
in a URI, such as in file:/foo/bar#baz
There's no fragment in your example and I would never expect to support such a thing. In fact I was a little surprised that this constructor handled the scheme in a scheme-specific part, but it seems to:
scala> val uri = new java.net.URI(null, "hdfs:/foo/b ar#baz", null)
uri: java.net.URI = hdfs:/foo/b%20ar%23baz
scala> uri.toString
res12: String = hdfs:/foo/b%20ar%23baz
scala> uri.getScheme
res13: String = hdfs
scala> uri.getPath
res14: String = /foo/b ar#baz
scala> uri.getFragment
res15: String = null
This still seems like the way to go as far as I can tell.
This may seem a silly question, but why not just use |
That works for me. If doing that, though, should make sure that all the call sites can handle URIs with a scheme, since (Or maybe, @steveloughran , you're suggesting the change in a more specific spot than |
Trying to make sense of @steveloughran's suggestion, maybe rewrite
Then you don't even need that "format path" function at all. |
@tsudukim what do you think about these further comments? I'd like to wrap this up since it's a good fix and I know we've been talking about it for a while now. |
Oh, thank you for all of your comments. Fixing it to use |
Test build #30484 has finished for PR 5447 at commit
|
@@ -1801,7 +1793,7 @@ private[spark] object Utils extends Logging { | |||
Array.empty | |||
} else { | |||
paths.split(",").filter { p => | |||
val formattedPath = if (windows) formatWindowsPath(p) else p | |||
val formattedPath = formatPath(p, windows) | |||
val uri = new URI(formattedPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use resolveURI
here; then you shouldn't need formatPath
at all.
Seems like the tests don't like the "fake Windows" code path with your changes. |
I tested only on Windows, but I noticed I get different results on Linux.
But on Linux:
So I think the tests for Windows path should be run only on Windows. |
- run tests for Windows path only on Windows
Test build #30740 has finished for PR 5447 at commit
|
@tsudukim but that Windows path is being interpreted correctly on Linux -- it is just that this string does not denote the same logical path at all! |
assertResolves("C:\\path\\to\\file.txt", "file:/C:/path/to/file.txt", testWindows = true) | ||
assertResolves("spark.jar#app.jar", s"file:$cwd/spark.jar%23app.jar") | ||
assertResolves("path to/file.txt", s"file:$cwd/path%20to/file.txt") | ||
if (SystemUtils.IS_OS_WINDOWS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Utils.isWindows
instead of introducing a third party lib
The problem is that the result is different on Windows and Linux even if input path strings are exactly the same. We can't use the same test code. |
@tsudukim that's true, and that makes this hard to test, since we don't run on Windows machines. Your code manages to detect whether the test is running on Windows and change accordingly, so that much is good. The risk is just that a change breaks the Windows tests and goes unnoticed for a while. This may be a risk we have to take. Or, you can simply not include the Windows tests since this change is also primarily about Linux now. |
So, the previous code tried to run this "fake windows" mode so that it could test the Windows-specific escaping functions when running tests on Linux, since (almost) nobody runs these tests on Windows. Since now we're not using the custom escaping and instead relying on the JDK library, IMO it's ok to remove these tests and replace them with more generic ones that work on both Windows and Linux. |
@@ -82,7 +82,7 @@ object PythonRunner { | |||
s"spark-submit is currently only supported for local files: $path") | |||
} | |||
val windows = Utils.isWindows || testWindows | |||
var formattedPath = if (windows) Utils.formatWindowsPath(path) else path | |||
var formattedPath = Utils.formatPath(path, windows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only use of formatPath
leftover and it doesn't feel necessary. Instead, how about:
val formattedPath = Try(new URI(formattedPath).getPath()).getOrElse(formattedPath)
Then you can remove Utils.formatPath
. Maybe even Utils.windowsDrive
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I'll try to remove them.
@tsudukim What do you think about Marcelo's last comment? |
I'm not sure what the status of this is @tsudukim -- do you want to finish it off or should someone else? I just want to resolve this one way or the other since it's been open a long time. Great work. |
replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala.
I'm so sorry to leave it for a very long time. I modified it as your comments. |
Test build #32227 has finished for PR 5447 at commit
|
Test build #32360 has finished for PR 5447 at commit
|
Looking strong, thank you. If there are no more comments, I'll merge. |
@@ -18,6 +18,8 @@ | |||
package org.apache.spark.deploy | |||
|
|||
import java.net.URI | |||
import java.io.File |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ordering; scala import should be grouped with the other scala imports.
Just a few formatting nits, but otherwise LGTM. I think things are a little weird on Windows because if you write down a path like |
@vanzin Thank you for your comments. |
Test build #32513 has finished for PR 5447 at commit
|
…at has space in its path escape spaces in the arguments. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits: 3f9a188 [Masayoshi TSUZUKI] modified some errors. ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors. 1784239 [Masayoshi TSUZUKI] removed Utils.formatPath. e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala. 84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows 016128d [Masayoshi TSUZUKI] fixed to use File.toURI() 2c62e3b [Masayoshi TSUZUKI] Merge pull request #1 from sarutak/SPARK-6568-2 7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2 45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2 10f1c73 [Kousuke Saruta] Added a comment 93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2 649da82 [Kousuke Saruta] Fix classpath handling c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path (cherry picked from commit 50c7270) Signed-off-by: Sean Owen <sowen@cloudera.com>
…at has space in its path escape spaces in the arguments. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits: 3f9a188 [Masayoshi TSUZUKI] modified some errors. ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors. 1784239 [Masayoshi TSUZUKI] removed Utils.formatPath. e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala. 84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows 016128d [Masayoshi TSUZUKI] fixed to use File.toURI() 2c62e3b [Masayoshi TSUZUKI] Merge pull request apache#1 from sarutak/SPARK-6568-2 7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2 45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2 10f1c73 [Kousuke Saruta] Added a comment 93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2 649da82 [Kousuke Saruta] Fix classpath handling c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path
…at has space in its path escape spaces in the arguments. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits: 3f9a188 [Masayoshi TSUZUKI] modified some errors. ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors. 1784239 [Masayoshi TSUZUKI] removed Utils.formatPath. e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala. 84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows 016128d [Masayoshi TSUZUKI] fixed to use File.toURI() 2c62e3b [Masayoshi TSUZUKI] Merge pull request apache#1 from sarutak/SPARK-6568-2 7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2 45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2 10f1c73 [Kousuke Saruta] Added a comment 93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2 649da82 [Kousuke Saruta] Fix classpath handling c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path
…at has space in its path escape spaces in the arguments. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#5447 from tsudukim/feature/SPARK-6568-2 and squashes the following commits: 3f9a188 [Masayoshi TSUZUKI] modified some errors. ed46047 [Masayoshi TSUZUKI] avoid scalastyle errors. 1784239 [Masayoshi TSUZUKI] removed Utils.formatPath. e03f289 [Masayoshi TSUZUKI] removed testWindows from Utils.resolveURI and Utils.resolveURIs. replaced SystemUtils.IS_OS_WINDOWS to Utils.isWindows. removed Utils.formatPath from PythonRunner.scala. 84c33d0 [Masayoshi TSUZUKI] - use resolveURI in nonLocalPaths - run tests for Windows path only on Windows 016128d [Masayoshi TSUZUKI] fixed to use File.toURI() 2c62e3b [Masayoshi TSUZUKI] Merge pull request apache#1 from sarutak/SPARK-6568-2 7019a8a [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2 45946ee [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-6568-2 10f1c73 [Kousuke Saruta] Added a comment 93c3c40 [Kousuke Saruta] Merge branch 'classpath-handling-fix' of github.com:sarutak/spark into SPARK-6568-2 649da82 [Kousuke Saruta] Fix classpath handling c7ba6a7 [Masayoshi TSUZUKI] [SPARK-6568] spark-shell.cmd --jars option does not accept the jar that has space in its path
I am facing following problem with C:\spark-1.4.1-bin-hadoop2.6. Exception-Exception Traceback (most recent call last) C:\spark-1.4.1-bin-hadoop2.6\python/pyspark/shell.py in () C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\context.pyc in init(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls) C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\context.pyc in _ensure_initialized(cls, instance, gateway) C:\spark-1.4.1-bin-hadoop2.6/python\pyspark\java_gateway.pyc in launch_gateway() Exception: Java gateway process exited before sending the driver its port numberI got this problem while runnning following program in ipython notebook. Program-import os spark_home = os.environ.get('SPARK_HOME', None) Add the py4j to the path.You may need to change the version number to match your installsys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip')) Initialize PySpark to predefine the SparkContext variable 'sc'execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))Can somebody suggest some solution to this problem. |
@jmdvinodjmd : that's something to raise on the spark user list and then, if you don't get help, as a JIRA on issues.apache.org; these pull requests are for reviewing patches, and once into the code, don't get looked at again. |
escape spaces in the arguments.