streaming-app requires:
Scala 2.10.3
HBase 0.98.1
kafka_2.10-0.8.1
conf/spark-env.sh
export HADOOP_CONF_DIR=/home/ocdc/hadoop-2.3.0-cdh5.0.0-och3.1.0/etc/hadoop
export SPARK_YARN_APP_JAR=/home/ocdc/spark_dev/target/spark-dev-V00B01C00-SNAPSHOT-jar-with-dependencies.jar
export SPARK_JAR=/home/ocdc/spark-assembly-0.9.1-hadoop2.3.0-cdh5.0.0.jar
conf/Sample.xml
Sample.xml defines the processing flow: the filtering rules, the data-judgment conditions, and the dynamic accumulation factors.
The sample flow chains four step implementations (the surrounding XML is elided here, marked "..." in the original):

- com.asiainfo.ocdc.streaming.impl.KafkaSource (input): the ZooKeeper connection string in the form hostname:port, where hostname and port are the host and port for a node in your ZooKeeper cluster; to allow connecting through other ZooKeeper nodes when that host is down, you can also specify multiple hosts in the form hostname1:port1,hostname2:port2,hostname3:port3. Other values: topic cmbb3, topicProducername, consumer group test-consumer-group, 3, output fields a,b,c,d,e,f,count,fee.
- com.asiainfo.ocdc.streaming.impl.StreamFilter (filtering): table t1 keyed by cell with fields lac,cell; output fields b,c,t1.cell,count,fee; filter condition t1.cell!=null.
- com.asiainfo.ocdc.streaming.impl.DynamicOperate (dynamic accumulation): table t2 keyed by b with accumulators Count,Fee updated as t2.Count+count,t2.Fee+fee; output fields b,c,t1.cell,count,fee.
- com.asiainfo.ocdc.streaming.impl.KafkaOut (output): topicName topic, ConsumerName, broker dev001:9092 (the port the socket server listens on; multiple brokers can be given as hostname1:port1,hostname2:port2,hostname3:port3); output fields b,c.
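The snippet below is a minimal illustrative sketch of such a flow. The element and attribute names (step, param, name, value, and the param names such as zookeeper.connect, group.id, threads, brokers) are assumptions for readability, not the project's actual schema; only the class names and values come from the fragments above.

<!-- Illustrative sketch only: element/attribute names are assumed, not the real schema. -->
<steps>
  <step class="com.asiainfo.ocdc.streaming.impl.KafkaSource">
    <!-- ZooKeeper connection string: hostname:port, or hostname1:port1,hostname2:port2,... -->
    <param name="zookeeper.connect" value="hostname1:port1,hostname2:port2,hostname3:port3"/>
    <param name="topic" value="cmbb3"/>
    <param name="group.id" value="test-consumer-group"/>
    <param name="threads" value="3"/>
    <param name="output" value="a,b,c,d,e,f,count,fee"/>
  </step>
  <step class="com.asiainfo.ocdc.streaming.impl.StreamFilter">
    <!-- Look up table t1 (keyed by cell, fields lac,cell); keep only rows where t1.cell!=null. -->
    <param name="table" value="t1"/>
    <param name="key" value="cell"/>
    <param name="fields" value="lac,cell"/>
    <param name="output" value="b,c,t1.cell,count,fee"/>
    <param name="filter" value="t1.cell!=null"/>
  </step>
  <step class="com.asiainfo.ocdc.streaming.impl.DynamicOperate">
    <!-- Accumulate count and fee per key b into state table t2. -->
    <param name="table" value="t2"/>
    <param name="key" value="b"/>
    <param name="fields" value="Count,Fee"/>
    <param name="operate" value="t2.Count+count,t2.Fee+fee"/>
    <param name="output" value="b,c,t1.cell,count,fee"/>
  </step>
  <step class="com.asiainfo.ocdc.streaming.impl.KafkaOut">
    <!-- Broker list: the port the socket server listens on, e.g. hostname1:port1,hostname2:port2 -->
    <param name="topic" value="topicName"/>
    <param name="brokers" value="dev001:9092"/>
    <param name="output" value="b,c"/>
  </step>
</steps>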
mvn package
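As a sketch, assuming the Maven assembly configuration in the project's pom.xml, the build produces the jar referenced by SPARK_YARN_APP_JAR above:

mvn package
# expected output: target/spark-dev-V00B01C00-SNAPSHOT-jar-with-dependencies.jar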
To start the streaming-app application, the command format is as follows:
./bin/start-streaming-app.sh <streaming-app-name> <time> <file>
Parameter 1: the application name; each configuration XML file should have its own corresponding streaming-app-name
Parameter 2: the flow refresh interval, in seconds
Parameter 3: the configuration file (e.g. conf/Sample.xml)
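For example, an invocation sketch (the app name "Sample" and the 5-second interval are illustrative values, not defaults from the project):

./bin/start-streaming-app.sh Sample 5 conf/Sample.xml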
For test documentation, see the streaming-app project wiki: https://github.com/asiainfo/streaming-app/wiki