This repository contains an example of starting multiple Spark jobs in parallel. The code was used for a demo in a webinar, which can be viewed at https://www.youtube.com/watch?v=e7qbrpvmOgY
To build the project you will need sbt, which you can get at https://www.scala-sbt.org/.
Then you can build with:
sbt compile
sbt package
To start the Spark application you will need to install Spark. This repo uses Spark 2.4.5; if you have an earlier version, you can change the sparkVersion variable in build.sbt.
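For reference, the relevant part of build.sbt could look roughly like this. This is a minimal sketch: the project name, Scala version, and dependency list here are assumptions, not the repo's exact file.

// build.sbt -- minimal sketch, not the repo's exact file
val sparkVersion = "2.4.5" // change this to match your Spark installation

name := "parallel-spark-jobs" // hypothetical project name
scalaVersion := "2.11.12"     // Spark 2.4.x supports Scala 2.11 and 2.12

// "provided" because spark-submit supplies Spark at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"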
The application contains two runnable classes: SimpleJob and ParallelJob. You can run either of them with:
spark-submit \
--master local \
<other configs> \
--class <SimpleJob or ParallelJob> \
<path to jar> \
<path to data>
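For example, assuming ParallelJob sits in the default package and the packaged jar lands under target/scala-2.11 (both hypothetical; check sbt's output for the actual jar path and the source for the full class name), a local run could look like:

spark-submit \
--master local[4] \
--class ParallelJob \
target/scala-2.11/parallel-spark-jobs_2.11-0.1.jar \
data/train.csv

Using local[4] instead of plain local gives the driver several cores, so the parallel jobs can actually overlap rather than run one task at a time.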
The data comes from the training set of this Kaggle competition: https://www.kaggle.com/c/nlp-getting-started/data
A very useful example that this code also borrows from: https://www.waitingforcode.com/apache-spark/fair-jobs-scheduling-apache-spark/read
Official documentation on job scheduling: https://spark.apache.org/docs/latest/job-scheduling.html
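To illustrate the idea behind ParallelJob, here is a hedged sketch (not the repo's actual code) of how multiple Spark jobs can be started in parallel: each action is submitted from its own thread via a Future, and the FAIR scheduler lets the resulting jobs share executors instead of running strictly one after another. The class, pool, and column names are assumptions for illustration.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.sql.SparkSession

object ParallelJobsSketch { // hypothetical class, not the repo's ParallelJob
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallel-jobs-sketch")
      .config("spark.scheduler.mode", "FAIR") // use the FAIR scheduler
      .getOrCreate()

    // args(0) is the path to the data, as in the spark-submit command above
    val data = spark.read.option("header", "true").csv(args(0)).cache()

    // Each Future runs on its own thread, so each Spark action below
    // becomes a separate, concurrently running job.
    val job1 = Future {
      // the scheduler pool is a thread-local property, so set it inside the Future
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1")
      data.count()
    }
    val job2 = Future {
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool2")
      data.distinct().count()
    }

    // Block until both jobs finish
    val results = Await.result(Future.sequence(Seq(job1, job2)), Duration.Inf)
    println(s"row count: ${results(0)}, distinct rows: ${results(1)}")

    spark.stop()
  }
}

Without FAIR mode the two jobs would still be submitted concurrently, but the default FIFO scheduler would give the first job's stages priority over the cluster's resources; see the official documentation linked above for details.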