SparkML-DAG

Implementation of a DAG Pipeline for SparkML.

Motivation

This library extends SparkML to support Pipelines structured as a directed acyclic graph (DAG). That is, multiple input datasets can be manipulated within a single pipeline to create complex models. One such example can be seen in the test.

Development

Clone this repository and run mvn clean test

To build for a custom version of Spark/Scala, run

mvn clean package \
-Dscala.major.version=<SCALA_MAJOR> \
-Dscala.minor.version=<SCALA_MINOR> \
-Dspark.version=<SPARK_VERSION>

For example:

mvn clean package \
-Dscala.major.version=2.11 \
-Dscala.minor.version=2.11.8 \
-Dspark.version=2.3.0
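
In the example above, the Scala major version is the full Scala version with its last component dropped (2.11.8 gives 2.11). The sketch below assumes that relationship always holds and derives the major version with shell parameter expansion. It prints the Maven command rather than running it. The function name `mvn_build_cmd` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the Maven command
# for a given Spark version and full Scala version.
mvn_build_cmd() {
  spark_version="$1"
  scala_minor="$2"
  # Assumption: the major version is the full version minus its last
  # dot-separated component (2.11.8 -> 2.11).
  scala_major="${scala_minor%.*}"
  printf 'mvn clean package -Dscala.major.version=%s -Dscala.minor.version=%s -Dspark.version=%s\n' \
    "$scala_major" "$scala_minor" "$spark_version"
}

# Print the command for Spark 2.3.0 and Scala 2.11.8.
mvn_build_cmd 2.3.0 2.11.8
```

To run the build instead of printing it, the output can be piped to a shell, for instance `mvn_build_cmd 2.3.0 2.11.8 | sh`.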

Build profiles

Alternatively, one can build against a limited number of pre-defined profiles. See pom.xml for the list of profiles.

Example build with profiles:

mvn clean package -Pspark_2.3,scala_2.11

mvn clean package -Pspark_2.0,scala_2.10

Support

The following table lists the supported build version combinations:

Apache Spark	Scala
2.0.x	2.10
2.0.x	2.11
2.1.x	2.10
2.1.x	2.11
2.2.x	2.10
2.2.x	2.11
2.3.x	2.11
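
Before starting a build, it can be handy to check a version pair against the table above. The following is a minimal sketch: the function name `is_supported` is hypothetical and not part of this repository, and it assumes a three-part Spark version (such as 2.3.0) and a two-part Scala version (such as 2.11).

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeed only if the
# Spark/Scala pair appears in the support table.
is_supported() {
  spark_series="${1%.*}"   # drop the patch component: 2.3.0 -> 2.3
  scala_major="$2"
  case "$spark_series/$scala_major" in
    2.0/2.10|2.0/2.11|2.1/2.10|2.1/2.11|2.2/2.10|2.2/2.11|2.3/2.11) return 0 ;;
    *) return 1 ;;
  esac
}

# Spark 2.3.x is listed with Scala 2.11 only, so this pair is rejected.
if is_supported 2.3.0 2.10; then
  echo "supported"
else
  echo "not supported"
fi
```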

License

See the LICENSE file for license information.
