
CapitalOne-AU-Hackathon/au-hackathon-streaming-app

 
 


There are two ways to listen to the data in Kafka:

  1. Using the Kafka console echo consumer
  2. Using this Spark project

Using the Kafka console echo consumer

Install Kafka

Step 1

Download Kafka v0.11.0.1 for Scala 2.11 (kafka_2.11-0.11.0.1.tgz) from an Apache mirror into ~/Downloads
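The mirror link from the original page was not preserved. As a sketch, assuming the Apache archive still serves old releases under its usual `dist/kafka/<version>/kafka_<scala>-<version>.tgz` layout, the download can be scripted in bash. The function names are hypothetical helpers, not part of this project:

```shell
# Hypothetical helpers (bash). Assumes the Apache archive layout
# https://archive.apache.org/dist/kafka/<version>/kafka_<scala>-<version>.tgz

# Print the archive file name, e.g. kafka_2.11-0.11.0.1.tgz
kafka_archive_name() {
  local scala_version="$1" kafka_version="$2"
  printf 'kafka_%s-%s.tgz\n' "$scala_version" "$kafka_version"
}

# Print the full download URL for that archive
kafka_archive_url() {
  local scala_version="$1" kafka_version="$2"
  printf 'https://archive.apache.org/dist/kafka/%s/%s\n' \
    "$kafka_version" "$(kafka_archive_name "$scala_version" "$kafka_version")"
}

# Download the archive into ~/Downloads (needs curl and network access);
# -f fails on HTTP errors, -L follows redirects
download_kafka() {
  local scala_version="$1" kafka_version="$2"
  mkdir -p "$HOME/Downloads" || return 1
  curl -fL -o "$HOME/Downloads/$(kafka_archive_name "$scala_version" "$kafka_version")" \
    "$(kafka_archive_url "$scala_version" "$kafka_version")"
}

# Usage: download_kafka 2.11 0.11.0.1
```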

Step 2

Move the archive to the desired location

cd ~/Downloads
mkdir -p /usr/local/Cellar/kafka/0.11.0.1
cp kafka_2.11-0.11.0.1.tgz /usr/local/Cellar/kafka/0.11.0.1/

Step 3

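Extract the archive inside the installation directory (--strip-components=1 drops the top-level kafka_2.11-0.11.0.1 folder, so bin/ ends up directly under the path used for KAFKA_HOME)

cd /usr/local/Cellar/kafka/0.11.0.1
tar --strip-components=1 -zxvf kafka_2.11-0.11.0.1.tgz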

Step 4

Set the KAFKA_HOME environment variable to the installation path

export KAFKA_HOME=/usr/local/Cellar/kafka/0.11.0.1
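Steps 2 to 4 can be combined into one bash function. This is a sketch under two assumptions: the archive contains a single top-level `kafka_2.11-0.11.0.1/` folder, which is stripped so that `bin/` lands directly in the install directory used for KAFKA_HOME, and `install_kafka` is a hypothetical helper name:

```shell
# Hypothetical helper: unpack a Kafka archive into DEST and point KAFKA_HOME at it.
install_kafka() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # Drop the archive's top-level folder so bin/ sits directly under $dest
  tar --strip-components=1 -C "$dest" -zxf "$archive" || return 1
  # Fail early if the console consumer script is not where Step 5 expects it
  if [ ! -f "$dest/bin/kafka-console-consumer.sh" ]; then
    echo "kafka-console-consumer.sh not found under $dest/bin" >&2
    return 1
  fi
  export KAFKA_HOME="$dest"
}

# Usage:
#   install_kafka ~/Downloads/kafka_2.11-0.11.0.1.tgz /usr/local/Cellar/kafka/0.11.0.1
#   # persist the variable for new shells
#   echo 'export KAFKA_HOME=/usr/local/Cellar/kafka/0.11.0.1' >> ~/.bash_profile
```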

Step 5

Run the console consumer script against the hackathon broker and topic

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server kafkastreaming.capitalonehackathon.com:9092 --topic au_hackathon
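A hedged variant of the command above: `--from-beginning` replays messages already in the topic instead of only new ones, and `--max-messages` makes the consumer exit after N messages, which is handy as a quick connectivity check. `kafka_listen`, `HACKATHON_BROKER`, `HACKATHON_TOPIC` and the `DRY_RUN` switch are hypothetical names for this sketch; the defaults are the broker and topic from Step 5:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical wrapper around kafka-console-consumer.sh (bash)
kafka_listen() {
  local max_messages="${1:-}"
  local broker="${HACKATHON_BROKER:-kafkastreaming.capitalonehackathon.com:9092}"
  local topic="${HACKATHON_TOPIC:-au_hackathon}"
  local args=(--bootstrap-server "$broker" --topic "$topic" --from-beginning)
  # Only limit the message count when a number was given
  if [ -n "$max_messages" ]; then
    args+=(--max-messages "$max_messages")
  fi
  run "$KAFKA_HOME/bin/kafka-console-consumer.sh" "${args[@]}"
}

# Usage: kafka_listen 10   # print the first 10 messages in the topic, then exit
```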

Using this project

Compiling the code

Step 1

Clone this repository locally

git clone https://github.com/badrishdavey/au-hackathon-streaming-app.git

cd into the directory

cd au-hackathon-streaming-app

Compile the code into a jar

mvn clean package
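To confirm the build produced something to upload, a small bash sketch can look for the jar. It assumes Maven's default `target/` output directory; if the project uses the shade plugin (not confirmed here), the unshaded `original-*.jar` it leaves behind is skipped. `find_built_jar` is a hypothetical helper:

```shell
# Hypothetical helper: print the first jar in target/ that is not an original-*.jar
find_built_jar() {
  local target_dir="${1:-target}" jar
  for jar in "$target_dir"/*.jar; do
    [ -e "$jar" ] || continue            # the glob matched nothing
    case "$(basename "$jar")" in
      original-*) continue ;;           # unshaded copy left by maven-shade-plugin
    esac
    printf '%s\n' "$jar"
    return 0
  done
  echo "no jar found in $target_dir; did mvn clean package succeed?" >&2
  return 1
}

# Usage: JAR=$(find_built_jar) && echo "$JAR"
```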

Step 2

Upload the Spark jar (built under target/) to your team's directory in S3
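If you prefer the AWS CLI to the S3 console, the upload is a single `aws s3 cp`. This sketch assumes the CLI is configured with hackathon credentials and that team directories sit under `s3://auhackathon/<team>/`, as in the `s3://auhackathon/omar/` path used when deploying; `upload_jar` and `DRY_RUN` are hypothetical:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical helper: copy a local jar to s3://auhackathon/<team>/<jar name>
upload_jar() {
  local jar="$1" team="$2"
  run aws s3 cp "$jar" "s3://auhackathon/$team/$(basename "$jar")"
}

# Usage: upload_jar target/au-hackathon-streaming-0.1.jar my-team
```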

Deploying onto AWS EMR

Step 1

Create the EMR cluster

Name the cluster after your team

Select Spark under Software configuration

Select the Hackathon pem file as the EC2 key pair
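The console settings above roughly correspond to an `aws emr create-cluster` call. The release label (`emr-5.29.0`, picked because EMR 5.x ships Spark built for Scala 2.11), the instance type and count, and the key pair name `AU_Hackathon` (guessed from the `AU_Hackathon.pem` file) are all assumptions; `--use-default-roles` also assumes the default EMR roles already exist in the account:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical helper: create a Spark cluster named after the team and print its ID
create_team_cluster() {
  local team="$1"
  run aws emr create-cluster \
    --name "$team" \
    --release-label "${EMR_RELEASE_LABEL:-emr-5.29.0}" \
    --applications Name=Spark \
    --ec2-attributes "KeyName=${EMR_KEY_NAME:-AU_Hackathon}" \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --query ClusterId \
    --output text
}

# Usage: CLUSTER_ID=$(create_team_cluster my-team)
```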

Step 2

Wait for the EMR cluster to finish initializing
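Instead of watching the console, the CLI can block until the cluster is up and then print the master node's public DNS name, which is the host the SSH command in the next step needs. A sketch; the cluster ID is a placeholder and the function names are hypothetical:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Block until the cluster reaches a running state (fails if it terminates instead)
wait_for_cluster() {
  run aws emr wait cluster-running --cluster-id "$1"
}

# Print the master node's public DNS name
master_dns() {
  run aws emr describe-cluster --cluster-id "$1" \
    --query Cluster.MasterPublicDnsName --output text
}

# Usage: wait_for_cluster j-XXXXXXXXXXXXX && MASTER=$(master_dns j-XXXXXXXXXXXXX)
```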

Step 3

Download the Spark jar from S3 onto the EMR master node (use your own cluster's master public DNS and your team's S3 path)

ssh -i ~/AU_Hackathon.pem hadoop@ec2-54-159-186-164.compute-1.amazonaws.com
aws s3 cp s3://auhackathon/omar/au-hackathon-streaming-0.1.jar .
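The two commands above can also be run from your laptop in one go by passing the copy command to `ssh`. A sketch with a hypothetical function name; the key path, host and S3 URI are the same placeholders as above and must be replaced with your own:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical helper: copy a jar from S3 into /home/hadoop on the master node
fetch_jar_on_master() {
  local key="$1" master="$2" s3_uri="$3"
  run ssh -i "$key" "hadoop@$master" "aws s3 cp $s3_uri /home/hadoop/"
}

# Usage:
#   fetch_jar_on_master ~/AU_Hackathon.pem <master-public-dns> s3://auhackathon/<team>/au-hackathon-streaming-0.1.jar
```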

Step 4

Open the Steps tab for the EMR cluster, click Add step, and enter the following

Step type: Spark application
Deploy mode: Client
Spark-submit options: --master yarn --class com.test.App
Application location: /home/hadoop/au-hackathon-streaming-0.1.jar
Arguments: ec2-54-174-211-86.compute-1.amazonaws.com:9092 au_hackathon 5
Action on failure: Continue
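For reference, the step above should be equivalent to running `spark-submit` directly on the master (Deploy mode: Client) or to adding the step with `aws emr add-steps`. In this sketch the three application arguments are passed through unchanged, since their meaning is defined by `com.test.App`; `APP_JAR`, the step name `au-hackathon-streaming` and the function names are assumptions:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Assumed jar location on the master, matching "Application location" above
APP_JAR="${APP_JAR:-/home/hadoop/au-hackathon-streaming-0.1.jar}"

# Option A: run on the master node (after ssh) with the same options as the step
submit_on_master() {
  run spark-submit --master yarn --deploy-mode client --class com.test.App "$APP_JAR" "$@"
}

# Option B: add the step from your laptop; EMR's Args list is comma-separated
add_spark_step() {
  local cluster_id="$1"; shift
  local app_args
  app_args="$(IFS=,; printf '%s' "$*")"   # join the application arguments with commas
  run aws emr add-steps --cluster-id "$cluster_id" --steps \
    "Type=Spark,Name=au-hackathon-streaming,ActionOnFailure=CONTINUE,Args=[--deploy-mode,client,--master,yarn,--class,com.test.App,$APP_JAR,$app_args]"
}

# Usage:
#   submit_on_master ec2-54-174-211-86.compute-1.amazonaws.com:9092 au_hackathon 5
#   add_spark_step j-XXXXXXXXXXXXX ec2-54-174-211-86.compute-1.amazonaws.com:9092 au_hackathon 5
```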


Step 5

Wait for the Spark job to start (approximately 5 minutes)

Click View logs

Click stdout; you should see the DataFrame output printed every few seconds
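The step's progress and output can also be checked from a terminal. This sketch assumes EMR's default local step log directory on the master, `/mnt/var/log/hadoop/steps/<step-id>/`; because the job runs in client mode, the driver's DataFrame printout should appear in that step's stdout. The IDs are placeholders and the function names are hypothetical:

```shell
# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Print the step's state (PENDING, RUNNING, COMPLETED, FAILED, ...)
step_state() {
  run aws emr describe-step --cluster-id "$1" --step-id "$2" \
    --query Step.Status.State --output text
}

# Follow the step's stdout on the master node (run after ssh-ing in);
# assumes the default local step log directory
follow_step_stdout() {
  run tail -f "/mnt/var/log/hadoop/steps/$1/stdout"
}

# Usage: step_state j-XXXXXXXXXXXXX s-XXXXXXXXXXXXX
```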


Languages

  • Java 83.7%
  • Scala 16.3%