Executing SAMOA with Apache S4
In this tutorial we will describe how to execute SAMOA on top of Apache S4.
The following dependencies are needed to run SAMOA smoothly on Apache S4: Gradle and Apache S4 itself.
Gradle is a build automation tool and is used to build Apache S4. The full installation guide is available on the Gradle website; the following instructions are a simplified version.
- Download the Gradle binaries from the Gradle downloads page, or from the console type
wget http://services.gradle.org/distributions/gradle-1.6-bin.zip
- Unzip the file
unzip gradle-1.6-bin.zip
- Set the Gradle environment variable:
export GRADLE_HOME=/foo/bar/gradle-1.6
- Add it to the system path:
export PATH=$PATH:$GRADLE_HOME/bin
- Install Gradle by running
gradle
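The export commands above only affect the current shell session. As a minimal sketch, assuming a POSIX shell and a startup file such as ~/.bashrc, the helper below appends a line to a file only when an identical line is not already there, so it can be re-run safely. The name append_once and the choice of startup file are illustrative assumptions, not something prescribed by Gradle.

```shell
#!/bin/sh
# append_once is a hypothetical helper (not part of Gradle or S4):
# append LINE to FILE unless an identical line is already present.
append_once() {
    file=$1
    line=$2
    # Create the file if it does not exist yet.
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (left commented out so that sourcing this file changes nothing).
# Assumption: bash reads ~/.bashrc; use your own shell's startup file instead.
# append_once "$HOME/.bashrc" 'export GRADLE_HOME=/foo/bar/gradle-1.6'
# append_once "$HOME/.bashrc" 'export PATH=$PATH:$GRADLE_HOME/bin'
```

The single quotes in the usage lines keep $PATH and $GRADLE_HOME unexpanded, so they are evaluated each time the startup file is read rather than once when the line is written.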
Now you are all set to install Apache S4.
S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. The installation process is as follows:
- Download the latest Apache S4 release (Apache S4 0.6.0) from the Apache S4 website, or from the command line
wget http://www.apache.org/dist/incubator/s4/s4-0.6.0-incubating/apache-s4-0.6.0-incubating-src.zip
or clone from git:
git clone https://git-wip-us.apache.org/repos/asf/incubator-s4.git
- Unzip the file
unzip apache-s4-0.6.0-incubating-src.zip
or go into the cloned directory.
- Set the Apache S4 environment variable
export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src
- Add S4_HOME to the system PATH:
export PATH=$PATH:$S4_HOME
- Once the previous steps are done we can proceed to build and install Apache S4.
- You can have a look at the available build tasks by typing
gradle tasks
- There are some dependency issues, so you should run the wrapper task first by typing
gradle wrapper
- Build Apache S4 by running
gradle
in the S4_HOME directory.
- Install the S4 tools:
gradle s4-tools:installApp
Done. Now you can configure and run your Apache S4 cluster.
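Before configuring the cluster, it may help to confirm that the earlier steps took effect. The following is a rough sketch, not part of Gradle or Apache S4: have_cmd, is_dir and check_setup are hypothetical helper names, and the list of tools checked is an assumption based on the commands used in this tutorial.

```shell
#!/bin/sh
# Sanity checks for the setup steps above. All function names are
# hypothetical helpers, not part of Gradle or Apache S4.

# Succeeds if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the given value is non-empty and names an existing directory.
is_dir() {
    [ -n "$1" ] && [ -d "$1" ]
}

# Prints one line per check and returns non-zero if any check failed.
check_setup() {
    rc=0
    for tool in gradle unzip; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    if is_dir "${S4_HOME:-}"; then
        echo "ok: S4_HOME"
    else
        echo "missing: S4_HOME"
        rc=1
    fi
    return "$rc"
}
```

Running check_setup in the same shell session that exported GRADLE_HOME and S4_HOME reports which prerequisites, if any, are still missing.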
- The SAMOA package can be downloaded from http://samoa-project.net/ or cloned from git
git clone https://github.com/yahoo/samoa.git
If you clone over SSH, remember to register your public key.
- Unzip the SAMOA distribution package
unzip SAMOA-0.0.1-SNAPSHOT-dist.zip
Inside the SAMOA directory you will find the following files:
samoa-api-0.1.jar
samoa-s4.properties
samoa-storm.properties
samoa
SAMOA-S4-0.0.1.jar
SAMOA-Storm-0.0.1.jar
- samoa: the execution script for the SAMOA framework.
- samoa-api-<version>.jar: the library containing the developer API for implementing new algorithms and topologies.
- SAMOA-S4-<version>.jar: the Apache S4 platform-specific adapter, which enables SAMOA to run on top of Apache S4.
- samoa-s4.properties: the configuration file for S4-specific properties.
- SAMOA-Storm-<version>.jar: the Storm platform-specific adapter, which enables SAMOA to run on top of Storm.
- samoa-storm.properties: the configuration file for Storm-specific properties.
When using a cloned repository, build the packages with the s4 Maven profile:
mvn package -Ps4
The SAMOA-S4-0.0.1.jar file will be generated in the target directory.
This section goes through the samoa-s4.properties file and how to configure it.
For SAMOA to run correctly in a distributed environment, some variables need to be defined. Since Apache S4 uses ZooKeeper for cluster management, we need to define where ZooKeeper is running.
# Zookeeper Server
zookeeper.server=localhost
zookeeper.port=2181
Apache S4 also distributes the application via HTTP, so the address and port of the server that hosts the S4 application must be provided.
# Simple HTTP Server providing the packaged S4 jar
http.server.ip=localhost
http.server.port=8000
Apache S4 uses the concept of logical clusters to define a group of machines, which are identified by an ID and start serving on a specific port.
# Name of the S4 cluster
cluster.name=cluster
cluster.port=12000
SAMOA can be deployed on a single machine using only one resource, or in a cluster environment. The following property defines whether to deploy as a local application or on a cluster.
# Deployment strategy
samoa.deploy.mode=local
To deploy SAMOA in a distributed environment you MUST configure the samoa-s4.properties file correctly. If you are running locally, modifying the properties file is optional.
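When scripting a deployment it can be handy to read these settings back from the file, for example to check which ZooKeeper address will be used. The sketch below is a simplified reader, not SAMOA's own parser: get_prop is a hypothetical helper, and it assumes plain key=value lines with no spaces around the equals sign, as in the snippets above. It does not handle the full Java properties syntax (escapes, continuation lines, ':' separators).

```shell
#!/bin/sh
# get_prop is a hypothetical helper (not part of SAMOA or S4):
# print the value of KEY from a key=value properties FILE.
# Returns non-zero if the key is not present.
get_prop() {
    key=$1
    file=$2
    value=
    found=1
    # IFS='=' splits each line at the first '='; anything after it, including
    # further '=' characters, ends up in $rest. Comment lines such as
    # "# Zookeeper Server" never equal a key, so they are skipped.
    # The "|| [ -n "$name" ]" part handles a last line without a newline.
    while IFS='=' read -r name rest || [ -n "$name" ]; do
        if [ "$name" = "$key" ]; then
            value=$rest   # a later duplicate overrides an earlier one
            found=0
        fi
    done < "$file"
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
}

# Example usage (commented out; assumes the file is in the current directory):
# zk="$(get_prop zookeeper.server samoa-s4.properties):$(get_prop zookeeper.port samoa-s4.properties)"
# echo "ZooKeeper expected at $zk"
```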
Deployment is done by running the SAMOA execution script samoa with some additional parameters.
The execution syntax is as follows:
./samoa <platform> <jar-location> <task & options>
Example:
./samoa S4 ../../../target/SAMOA-S4-0.0.1.jar "ClusteringTask -q 1 -P 5 -L 100 -G 5 -i 500000 -s (RandomRBFGeneratorEvents -K 5 -N 0.0 -V 12000 -a 2)"
The <platform> can be s4 or storm.
The <jar-location> must be the absolute path to the platform-specific jar file.
The <task & options> should be the name of a known task and the options belonging to that task.
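Because the jar location is expected to be an absolute path, a small wrapper can resolve a relative path before calling the script. This is only a sketch under stated assumptions: abs_path and samoa_cmd are hypothetical helpers, the parent directory of the jar is assumed to exist, and the command is printed rather than executed so it can be inspected first.

```shell
#!/bin/sh
# abs_path is a hypothetical helper: print the absolute form of a path
# whose parent directory exists (the file itself need not exist yet).
abs_path() {
    # CDPATH= prevents cd from printing or searching other directories.
    dir=$(CDPATH= cd "$(dirname "$1")" && pwd) || return 1
    # Avoid a doubled slash when the parent directory is the root.
    case $dir in
        /) dir= ;;
    esac
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# samoa_cmd is a hypothetical helper: print the samoa invocation described
# above, with the jar path made absolute. It does not run anything.
# Arguments: <platform> <jar-location> <task & options>
samoa_cmd() {
    jar=$(abs_path "$2") || return 1
    printf './samoa %s %s "%s"\n' "$1" "$jar" "$3"
}

# Example usage (commented out):
# samoa_cmd s4 target/SAMOA-S4-0.0.1.jar "ClusteringTask -q 1 -P 5"
```

Printing the command first keeps the sketch side-effect free; once the output looks right, it can be pasted into the terminal from the directory that contains the samoa script.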