SPARK-1706: Allow multiple executors per worker in Standalone mode #731
Conversation
- worker.memoryFree >= app.desc.memoryPerSlave && !worker.hasExecutor(app)
+ private def canUse(app: ApplicationInfo, worker: WorkerInfo): Boolean = {
+   worker.memoryFree >= app.desc.memoryPerExecutor && !worker.hasExecutor(app) &&
+     worker.coresFree > 0
I am not sure about this, but does the above mean that an application can be scheduled only once to a worker at a given point in time?
So even if there are multiple cores, different partitions can't be executed in parallel for an app on that worker?
Yes, but this function is only called when we want to schedule a single executor on a certain worker.
So what happens if the worker is already running one executor for the app: we can't schedule another executor on that worker until the previous one is done? (in this or subsequent schedule attempts)
I think so... and we can still only assign one executor for the app to that worker in subsequent schedule() calls.
This logic has been here for a long while (at least since 0.8.x); the scheduling mode proposed in this PR is just to relax this constraint.
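For readers skimming the thread, here is a minimal, self-contained sketch of the constraint being discussed and of the relaxation this PR argues for. The simplified types are stand-ins for the real ones in org.apache.spark.deploy.master, and the code is illustrative, not the patch itself.

```scala
// Simplified stand-ins for ApplicationInfo / WorkerInfo (illustrative only).
case class AppDesc(memoryPerExecutor: Int)
case class AppInfo(desc: AppDesc)
case class Worker(memoryFree: Int, coresFree: Int, executorsForApp: Int) {
  def hasExecutor(app: AppInfo): Boolean = executorsForApp > 0
}

// Existing behaviour: a worker is usable only if it has enough memory,
// has a free core, and is NOT already running an executor for this app.
def canUseStrict(app: AppInfo, w: Worker): Boolean =
  w.memoryFree >= app.desc.memoryPerExecutor && !w.hasExecutor(app) && w.coresFree > 0

// Relaxed behaviour discussed in this PR: drop the one-executor-per-worker
// check, so a worker with spare memory and cores can host more executors.
def canUseRelaxed(app: AppInfo, w: Worker): Boolean =
  w.memoryFree >= app.desc.memoryPerExecutor && w.coresFree > 0
```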
Earlier, with a single executor, it meant something else.
Now there is a difference... I don't think this is what we would want.
Though I would defer to others on this. @matrix, any thoughts?
Maybe there is another scenario to consider: it schedules a single executor on a worker for every application, and the executor's memory depends on the number of cores assigned on that worker rather than on the config 'spark.executor.memory', because the master may assign different numbers of cores to an application's executors, yet those executors all get the same memory.
@mridulm thanks for the comments, I addressed them and redefined the confusing maxCoreLeft variable.
@@ -20,7 +20,7 @@ package org.apache.spark.deploy
  private[spark] class ApplicationDescription(
    val name: String,
    val maxCores: Option[Int],
-   val memoryPerSlave: Int,
+   val memoryPerExecutor: Int, // in Mb
Maybe just call this memoryPerExecutorMB.
good point, fixed
ping....
QA tests have started for PR 731. This patch merges cleanly.
QA results for PR 731:
Would it be possible to reuse existing parameters, spark.executor.instances and spark.executor.cores, instead of introducing new ones?
@nishkamravi2 hmm... it's OK to reuse the parameters, I'm just not sure which option is more convenient for the user.....
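For illustration, a hedged sketch of what reusing the existing parameters could look like from the application side; the app name, master URL, and values below are made up, and the exact semantics would depend on the final patch.

```scala
import org.apache.spark.SparkConf

// Illustrative configuration only (values are invented):
val conf = new SparkConf()
  .setAppName("multi-executor-demo")      // hypothetical app name
  .setMaster("spark://master:7077")       // hypothetical standalone master URL
  .set("spark.executor.cores", "2")       // cores per executor
  .set("spark.executor.memory", "4g")     // memory per executor
  .set("spark.cores.max", "8")            // total cores for the application
```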
ping
I think it would be better to reuse the same parameters to minimize discrepancy across different scheduling modes at the interface level. Also, once this PR gets merged, do we have a compelling use case for starting multiple workers per node, or can we retire params like SPARK_WORKER_INSTANCES? Minor comment: the comment could be modified to: // allow the user to run multiple executor processes on a worker node (managed by a single worker daemon/JVM). Has this PR been tested beyond automated unit tests?
You mean start multiple executors per worker?
If this PR gets merged, I think we can retire params like SPARK_WORKER_INSTANCES. This PR also allows the user to run multiple executors on a node, so multiple workers on a node are unnecessary.
I haven't looked at the code change, but it might not be a good idea to remove support for multiple workers on a machine. When you have really large-memory machines it is sometimes better to run multiple JVMs with smaller heaps rather than one big JVM. This is mostly to avoid GC pauses, which grow with larger heap sizes.
Correct me if I'm wrong, but I think this PR intends to facilitate that by allowing multiple executor JVMs to run on the same node (per worker), as opposed to launching multiple worker daemons with one executor per worker. Multiple worker daemons can be considered redundant.
@nishkamravi2 I got your point now... yes, this patch is to enable the user to run multiple executors with a single worker instead of running multiple workers. As @shivaram said, this is mainly for a better way of utilizing the memory space. In my opinion, we should still keep supporting the multiple-workers-per-server mechanism... before 1.0, when I used Spark, one of the big headaches for me was that the worker could die due to an overloaded executor..... causing other executors to die together.... since 1.0, it seems to be much better (maybe because we kill the executor process in its own thread now, but I'm not sure).... multiple workers can prevent this case.
QA tests have started for PR 731. This patch DID NOT merge cleanly!
QA results for PR 731:
@CodingCat Do we need to ping anyone specific to look at this PR? It's been many months since the last update.
I will rebase this one and send an email to Kay....
Force-pushed from 61af2e4 to b4a8a68
@nchammas, sorry, I thought it was SPARK-1143... I think it is supposed to be reviewed by @andrewor14, @pwendell and @mridulm?
Test build #25251 has finished for PR 731 at commit
Force-pushed from b4a8a68 to 782dfcb
Test build #25266 has finished for PR 731 at commit
Force-pushed from 782dfcb to 2608d11
Test build #25323 has finished for PR 731 at commit
Force-pushed from 2608d11 to d2f413a
Test build #29994 has finished for PR 731 at commit
@andrewor14, I just found that, if we configure an exact core number per executor, the current strategy will be incorrect (or, say, a bit user-unfriendly). E.g. I have 8 cores, 2 cores per machine, and an application would like to use all of them; in spread mode, we will get an allocation array in which, we can say, 3 is not a smart choice. But in this case the user has to understand how the allocation works. Shall we go forward with the configuration of the exact number but change the allocation algorithm (bringing a much larger patch, like before), or do we go back to the max core number configuration?
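To make the concern concrete, here is a hedged, self-contained sketch of a naive spread-out allocation. The worker count and free-core numbers are invented (not the exact scenario above) and this is not the scheduler's actual code, but it shows how a worker can end up with a core count that does not divide evenly into fixed-size executors.

```scala
// Naive spread-out allocation: hand out cores one at a time, round-robin,
// to any worker that still has free cores.
def spreadOut(coresWanted: Int, freeCores: Array[Int]): Array[Int] = {
  val assigned = Array.fill(freeCores.length)(0)
  var left = coresWanted
  var pos = 0
  while (left > 0 && assigned.zip(freeCores).exists { case (a, f) => a < f }) {
    if (assigned(pos) < freeCores(pos)) {
      assigned(pos) += 1
      left -= 1
    }
    pos = (pos + 1) % freeCores.length
  }
  assigned
}

// Hypothetical cluster: 4 workers with (4, 4, 1, 1) free cores, 8 cores requested.
// spreadOut(8, Array(4, 4, 1, 1)) == Array(3, 3, 1, 1)
// With exactly 2 cores per executor, the workers holding 3 and 1 assigned cores
// cannot turn all of those cores into whole executors.
```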
Test build #30016 has finished for PR 731 at commit
IGNORE THIS: after rethinking about this patch, I think it seems to be fine to allocate zero cores in the case I mentioned above, and we just need to filter out those workers whose freeCore is less than spark.executor.cores (if it is defined).
Ignore my last comment, I still hold the position that,
Test build #30098 has finished for PR 731 at commit
Test build #30097 has finished for PR 731 at commit
- * Can an app use the given worker? True if the worker has enough memory and we haven't already
- * launched an executor for the app on it (right now the standalone backend doesn't like having
- * two executors on the same worker).
+ * Schedule executors to be launched on the workers. There are two modes of launching executors.
Can you break "There are two modes of..." into a new paragraph?
@CodingCat We shouldn't have to worry about the case when the user asks for more resources per executor than are available on each worker. If each worker only has 2 cores, the user shouldn't ask for 3 per executor. This holds regardless of whether spread-out mode is used, since an executor cannot be "split" across machines. The existing approach is fine.
@CodingCat I'm merging this into master. Thanks for keeping this patch open for a long time and patiently iterating on the reviews. I think the final solution we have here is much simpler than the one we began with.
Hey @andrewor14, my pleasure, many thanks for your patient review.
Test build #30274 has finished for PR 731 at commit
## What changes were proposed in this pull request?
This patch contains the functionality to balance the load of the cluster-mode drivers among workers. This patch restores the changes in #1106 which were erased due to the merging of #731.
## How was this patch tested?
test with existing test cases
Author: CodingCat <zhunansjtu@gmail.com>
Closes #11702 from CodingCat/SPARK-13803.
(cherry picked from commit bd5365b)
Signed-off-by: Sean Owen <sowen@cloudera.com>
resubmit of #636 for a totally different algorithm
https://issues.apache.org/jira/browse/SPARK-1706
In the current implementation, the user has to start multiple workers on a server in order to start multiple executors on that server, which introduces additional overhead due to the extra JVM processes...
In this patch, I changed the scheduling logic in the master to enable the user to start multiple executor processes under a single worker process.
min(min(maxCoreNumPerExecutor, worker.freeCore), maxLeftCoreToAssign)
where maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor
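A hedged, self-contained sketch of an assignment computation matching the formula above (one possible reading; the parameter names mirror the description and this is not the literal code of the patch):

```scala
// Cores granted to one worker in a scheduling round, following the formula above.
def coresForThisWorker(
    maxCoreNumPerExecutor: Int,  // cores each executor should get
    workerFreeCores: Int,        // worker.freeCore
    maxExecutorCanAssign: Int    // executors the app may still launch
  ): Int = {
  val maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor
  math.min(math.min(maxCoreNumPerExecutor, workerFreeCores), maxLeftCoreToAssign)
}

// e.g. 2 cores per executor, a worker with 5 free cores, 3 executors still allowed:
// coresForThisWorker(2, 5, 3) == 2
```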
Other small changes include:
change memoryPerSlave in ApplicationDescription to memoryPerExecutor, as "Slave" is overloaded to represent both worker and executor in the documentation... (we had some discussion on this before?)