[SPARK-11327] [MESOS] Dispatcher does not respect all args from the Submit request #10370
Conversation
ok to test
At first I was wondering whether, if we just copy all scheduler properties, they would clash with the ones we set above (i.e. --total-executor-cores), but it looks like the command-line flags take precedence in SparkSubmit, so we should be OK. How have you tested this?
Test build #47989 has finished for PR 10370 at commit
@tnachen I've tested that patch against branch-1.5 and I'm now able to specify options such as
@jayv Looks like you need to fix the line length > 100 style rule.
@@ -440,6 +444,9 @@ private[spark] class MesosClusterScheduler(
        .mkString(",")
      options ++= Seq("--py-files", formattedFiles)
    }
    desc.schedulerProperties
      .filter { case (key, _) => !replicatedOptionsBlacklist.contains(key) }
      .foreach { case (key, value) => options ++= Seq("--conf", Seq(key, "=\"", value, "\"").mkString("")) }
Are the quotes inside --conf key="value" valid? I know I tried --conf spark.mesos.executor.home="/var/spark" and that failed (the actual value of spark.mesos.executor.home contained the quotes).
One more thing: better to use string interpolation instead of Seq.mkString, i.e. s"$key=$value".
Yes, the quotes work for me.
In order to pass multiple system properties via e.g. spark.driver.extraJavaOptions, I passed the arg in quotes on the shell to spark-submit, but the value itself doesn't have quotes around it and ends up broken on the command line for the driver, unless I add escaped quotes inside the string.
So this fails when executing the driver, even though spark-submit accepts the command:
bin/spark-submit [...] --conf spark.driver.extraJavaOptions="-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true -Dorg.jboss.weld.bootstrap.concurrentDeployment=false" [...]
This doesn't look very intuitive to me, but it does work, by wrapping the value in escaped quotes:
bin/spark-submit [...] --conf spark.driver.extraJavaOptions="\"-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true -Dorg.jboss.weld.bootstrap.concurrentDeployment=false\"" [...]
The same is true for --driver-java-options as opposed to the --conf route.
String interpolation doesn't support double quotes, which is why I initially went with Seq.mkString, but I've since learned about triple-double-quoted string interpolation, which does...
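To make the two approaches concrete, here is a small stand-alone sketch (not part of the patch; the property name and value are made up for illustration) comparing Seq.mkString with triple-double-quoted interpolation:

```scala
// Hypothetical demo, not part of the PR: two ways to build a
// --conf argument of the form  key="value".
object ConfQuotingDemo {
  def main(args: Array[String]): Unit = {
    val key   = "spark.driver.extraJavaOptions" // illustrative property
    val value = "-Dfoo=a -Dbar=b"               // illustrative value

    // First revision of the patch: concatenation via Seq.mkString.
    val viaMkString = Seq(key, "=\"", value, "\"").mkString("")

    // Triple-double-quoted interpolation can contain literal double
    // quotes, which a plain s"..." string could not escape in Scala
    // versions of that era (SI-6476).
    val viaInterpolation = s"""$key="$value""""

    assert(viaMkString == viaInterpolation)
    println(viaMkString)
  }
}
```

Both forms produce the same string; the interpolated version is what the later revision of the diff uses.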
thanks @jayv, I left one note, otherwise this looks good.
Test build #48229 has finished for PR 10370 at commit
This PR might help with #8872 testing
Ok, it seems you need to filter out a few more things, particularly [1]
I opened jayv#2 with two more filters:
@dragos I don't remember seeing these errors on the 1.5.x branch I tested against, but I may have missed them since I was focusing on the missing args. This looks totally reasonable to me.
Test build #49706 has finished for PR 10370 at commit
The failed test is in Spark Streaming, no doubt it's a flaky one. LGTM. @andrewor14 can you please have a look?
@@ -440,6 +446,9 @@ private[spark] class MesosClusterScheduler(
        .mkString(",")
      options ++= Seq("--py-files", formattedFiles)
    }
    desc.schedulerProperties
      .filter { case (key, _) => !replicatedOptionsBlacklist.contains(key) }
      .foreach { case (key, value) => options ++= Seq("--conf", s"""$key="$value"""") }
Are lines 437-443 now redundant? If setting --total-executor-cores is identical to --conf spark.cores.max, then we don't need to set both.
Does this escape quotes, double quotes, whitespace, and backslashes correctly?
That's a good point: Mesos and CommandInfo use /bin/sh to launch the command. 😕
Spaces should be OK; everything else won't be correctly escaped. Skimming through the Spark properties, I think the only ones that could pose problems are spark.authenticate.secret and the other passwords (SSL, etc.). Still, this needs a solution.
Mesos should also handle the arguments correctly if you provide them as args in the CommandInfo, even when they contain spaces and other special characters.
I guess Tim refers to this line, so we'd need to set shell to false. That's probably the simplest way.
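To illustrate the distinction being discussed (a plain-Scala sketch only, not the actual Mesos CommandInfo API calls): with shell = true the driver command is a single string that /bin/sh re-parses, so embedded spaces and quotes must be escaped; with shell = false each argument is its own argv entry passed to exec() verbatim.

```scala
// Illustrative sketch only; the real code builds a Mesos CommandInfo.
object LaunchStyleDemo {
  def main(args: Array[String]): Unit = {
    val key   = "spark.driver.extraJavaOptions"
    val value = "-Dfoo=a b" // note the embedded space

    // shell = true: one string, re-parsed by /bin/sh, so the value
    // must be quoted or the space splits it into two words.
    val shellCommand = s"""bin/spark-submit --conf $key="$value""""

    // shell = false: argv entries reach exec() untouched; there is
    // no shell, so no quoting or escaping is needed at all.
    val argv = Seq("bin/spark-submit", "--conf", s"$key=$value")

    println(shellCommand)
    println(argv.mkString(" | "))
  }
}
```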
Ah OK, I can check that. I wrote a bunch of regexes and tests today to solve escaping for most Unix shells, and was about to push a PR early in the morning.
I think the Mesos way is more robust and requires less code. Hopefully :)
The meat of it is only 5 lines :) but let's see if we can make the Mesos approach work without side effects, and for my use case with args containing spaces.
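For reference, one common approach to the escaping problem is to wrap each value in single quotes, rewriting embedded single quotes as '\''. This is a minimal hypothetical helper for POSIX shells, not the code on jayv's branch:

```scala
// Hypothetical sketch of POSIX-shell escaping via single quotes.
object ShellEscapeDemo {
  // Inside single quotes, /bin/sh treats everything literally except
  // the single quote itself, which we close, escape, and reopen.
  def shellEscape(s: String): String =
    "'" + s.replace("'", "'\\''") + "'"

  def main(args: Array[String]): Unit = {
    println(shellEscape("""-Dfoo=a b -Dbar="x""""))
  }
}
```

The shell = false route discussed above would avoid needing any of this, since no shell ever re-parses the arguments.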
retest this please
Test build #49806 has finished for PR 10370 at commit
@jayv do you have time to work on this, or should I?
@dragos Today I will test the driver issue you mentioned (we also seem to have a rogue driver spinning up on workers) and work on better escaping. Isn't there already code for this in spark-submit or somewhere?
@jayv the closest thing I can think of is
@dragos is the escaping the only thing that's left before this patch can be merged?
Got an implementation that escapes all the things under Linux shells, but I had to suspend work on this, so I was unable to test Mesos' native escaping. I'll try to get back to it this week.
As far as I'm concerned, that's the only thing (I'll have to test again on a real Mesos cluster).
@jayv did you have a chance to look at this again?
I pushed this commit with the general idea, but I didn't get to test it much. I'll come back to it tomorrow.
@dragos Unfortunately not, I'll come back to this issue when I've done some more testing.
My commit is not working, BTW, I'll come back to this (the executable needs to be a complete path to
@jayv please let me know if you will look into this one. If you're busy, I'm happy to take over, and will happily start with your escaping algorithm ;-)
Sorry for the delay @dragos, very busy indeed; maybe this can help: jayv@8023da3
@jayv are you planning to update this PR with that commit?
I can pick this up tomorrow.
It works, but I found a couple of corner cases. I guess you need to escape
Nice, will you pick this up @dragos, or are you waiting for me? I can't commit any time this week.
Yeap, I just opened a PR on your repo. If you merge it, it should be reflected here.
Test build #53977 has finished for PR 10370 at commit
@jayv can you please rebase? @andrewor14 the last issue (escaping characters for the shell command) has been fixed. Please take a look.
- The launched driver should be in client mode, but at submission time this setting is set to 'cluster'.
- The spark.master setting is set to the dispatcher address (usually on port 7077). Passing it has no effect, since we also pass --master, which takes precedence. But this is confusing, so we should filter it out.
(cherry picked from commit 8023da3)
11b070d to 82cd03d
@dragos rebased!
Test build #54541 has finished for PR 10370 at commit
retest this please
Latest changes look pretty reasonable. Funny how Mesos cluster mode was basically unusable before. :)
Test build #54556 has finished for PR 10370 at commit
LGTM! @andrewor14 please have a look
Merged into master. Thanks for your patience and hard work everyone.
Ah thanks, I forgot I merged that.
Backport for apache#10370 andrewor14 Author: Jo Voordeckers <jo.voordeckers@gmail.com> Closes apache#12101 from jayv/mesos_cluster_params_backport. (cherry picked from commit 91530b0)
Supersedes #9752