Skip to content

Commit

Permalink
[SPARK-42157][CORE] spark.scheduler.mode=FAIR should provide FAIR s…
Browse files Browse the repository at this point in the history
…cheduler

### What changes were proposed in this pull request?

Like our documentation, `spark.sheduler.mode=FAIR` should provide a `FAIR Scheduling Within an Application`.

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

![Screenshot 2023-01-22 at 2 59 22 PM](https://user-images.githubusercontent.com/9700541/213944956-931e3a3c-d094-4455-8990-233c7966194b.png)

This bug is hidden in our CI because we have `fairscheduler.xml` always as one of test resources.
- https://github.com/apache/spark/blob/master/core/src/test/resources/fairscheduler.xml

### Why are the changes needed?

Currently, when `spark.scheduler.mode=FAIR` is given without scheduler allocation file, Spark creates `Fair Scheduler Pools` with `FIFO` scheduler which is wrong. We need to switch the mode to `FAIR` from `FIFO`.

**BEFORE**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:47:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/22 14:47:38 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 38 PM](https://user-images.githubusercontent.com/9700541/213944555-6e367a33-ca58-4daf-9ba4-b0319fbe4516.png)

**AFTER**
```
$ bin/spark-shell -c spark.scheduler.mode=FAIR
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/22 14:48:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
```
![Screenshot 2023-01-22 at 2 50 14 PM](https://user-images.githubusercontent.com/9700541/213944551-660aa298-638b-450c-ad61-db9e42a624b0.png)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug fix to match with Apache Spark official documentation.

### How was this patch tested?

Pass the CIs.

Closes #39703 from dongjoon-hyun/SPARK-42157.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4d51bfa)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
  • Loading branch information
dongjoon-hyun committed Jan 24, 2023
1 parent b7ababf commit 41e6875
Showing 1 changed file with 7 additions and 4 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ import org.apache.hadoop.fs.Path

import org.apache.spark.SparkContext
import org.apache.spark.internal.Logging
import org.apache.spark.internal.config.SCHEDULER_ALLOCATION_FILE
import org.apache.spark.internal.config.{SCHEDULER_ALLOCATION_FILE, SCHEDULER_MODE}
import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
import org.apache.spark.util.Utils

Expand Down Expand Up @@ -86,9 +86,12 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, sc: SparkContext
logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE")
Some((is, DEFAULT_SCHEDULER_FILE))
} else {
logWarning("Fair Scheduler configuration file not found so jobs will be scheduled in " +
s"FIFO order. To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE or " +
s"set ${SCHEDULER_ALLOCATION_FILE.key} to a file that contains the configuration.")
val schedulingMode = SchedulingMode.withName(sc.conf.get(SCHEDULER_MODE))
rootPool.addSchedulable(new Pool(
DEFAULT_POOL_NAME, schedulingMode, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
logInfo("Fair scheduler configuration not found, created default pool: " +
"%s, schedulingMode: %s, minShare: %d, weight: %d".format(
DEFAULT_POOL_NAME, schedulingMode, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
None
}
}
Expand Down

0 comments on commit 41e6875

Please sign in to comment.