Home
This wiki shows how to get a sample Apache Spark Streaming application working. A Twitter feed is used to send messages into a Kafka topic, and a Spark Streaming application then reads and processes those messages.
To follow the guide, work through the numbered steps in the pages list on the right of this page.
I hope this guide gives aspiring Big Data developers and hobbyists a kickstart and a boost of confidence in setting up a simple end-to-end streaming application by themselves (and helps them avoid the many pitfalls I encountered in the process).
Cloud platform used:
Technologies used:
This work is not solely my own. Various websites, tutorials, and individuals were helpful throughout the process:
- Mining Twitter Data with Python (Part 1: Collecting data) by Marco Bonzanini
- Getting Started with Spark Streaming, Python, and Kafka by Robin Moffatt
- Run Jupyter Notebook and JupyterHub on Amazon EMR by Tom Zeng
- Using Python 3.4 on EMR Spark Applications by Bruno Faria
- Professor Andrew Koh
Other referenced work may appear in the pages of the guide.
For any feedback or questions, you may contact me at this email address:
chia.yongjian [at] gmail.com
Alternatively, you can raise an issue and I will look into it as soon as possible.