
Streaming App with Amazon Kinesis Data Analytics

This repository contains a real-time pipeline that ingests data from a stream of events and deploys a streaming application that aggregates those events in real time.

Workflow

The web application feeds event data into Kinesis Data Streams, Amazon Kinesis Data Firehose delivers the raw data to an S3 bucket, and the aggregated data is written to a sub-folder on Amazon S3 (see the architecture diagram in the repository).


Installation and Setup

  1. Clone this repository.
  2. Set up an AWS account and create an access key and secret key in AWS IAM.
  3. Configure the AWS CLI in your terminal using aws configure.
  4. Type in your credentials when prompted.
  5. Navigate to the IAC directory in your terminal and run terraform init followed by terraform apply to set up the AWS infrastructure (see the full command sequence below).
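For reference, the terminal commands are, assuming the Terraform configuration lives in the IAC directory at the repository root:

```sh
aws configure        # enter your access key, secret key, and default region
cd IAC
terraform init       # download the required providers
terraform apply      # provision the stream, bucket, and other AWS resources
```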

Setting up the streaming application

Step 1: Navigate to Amazon Kinesis Data Streams, select raw_data_stream, and click Process data in real time


Step 2: Create an Apache Flink - Studio Notebook


Step 3: Select a database (this was already created by the Terraform setup)


Step 4: Run the notebook and click Open in Apache Zeppelin


Step 5: In the notebook, run the CREATE TABLE raw_data_table statement from the sql_queries file in the repo
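The exact DDL is in sql_queries; as a rough illustration, a source table over the Kinesis stream might look like the sketch below (the column names, region, and watermark are assumptions, not the repo's actual schema):

```sql
%flink.ssql

-- Hypothetical schema; the real column list is in the repo's sql_queries file.
CREATE TABLE raw_data_table (
    event_id    STRING,
    event_type  STRING,
    event_value DOUBLE,
    event_time  TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kinesis',            -- read from Kinesis Data Streams
    'stream' = 'raw_data_stream',       -- the stream created by Terraform
    'aws.region' = 'us-east-1',         -- assumed region
    'scan.stream.initpos' = 'LATEST',
    'format' = 'json'                   -- assumes JSON-encoded events
);
```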

Step 6: Run the producer.py file to produce data
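The repo's producer.py defines the actual event shape; a minimal sketch of such a producer, assuming JSON events and the raw_data_stream created by Terraform, could look like this:

```python
import datetime
import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

while True:
    # Hypothetical event fields; producer.py defines the real ones.
    event = {
        "event_id": str(random.randint(1, 1_000_000)),
        "event_type": random.choice(["click", "view", "purchase"]),
        "event_value": round(random.uniform(1, 100), 2),
        "event_time": datetime.datetime.utcnow().isoformat(),
    }
    kinesis.put_record(
        StreamName="raw_data_stream",
        Data=json.dumps(event),
        PartitionKey=event["event_id"],
    )
    time.sleep(1)  # throttle so the stream is easy to follow in the notebook
```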


Step 7: Run the aggregation query from the sql_queries file in the repo
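The real query lives in sql_queries; a representative example, assuming the hypothetical columns above and one-minute tumbling windows, would be:

```sql
%flink.ssql

-- Count events per type over one-minute tumbling event-time windows.
SELECT
    event_type,
    TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*) AS event_count
FROM raw_data_table
GROUP BY
    event_type,
    TUMBLE(event_time, INTERVAL '1' MINUTE);
```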


Step 8: Run the CREATE TABLE aggregated_data_table and INSERT INTO aggregated_data_table statements from the sql_queries file
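Again, the actual statements are in sql_queries; a sketch of an S3 sink table and the matching insert, assuming the columns from the aggregation above and a placeholder bucket name, might be:

```sql
%flink.ssql

-- Hypothetical S3 sink for the aggregated results.
CREATE TABLE aggregated_data_table (
    event_type   STRING,
    window_start TIMESTAMP(3),
    event_count  BIGINT
) WITH (
    'connector' = 'filesystem',
    'path' = 's3://<your-bucket>/aggregated_data/',  -- sub-folder for aggregated output
    'format' = 'json'
);

INSERT INTO aggregated_data_table
SELECT
    event_type,
    TUMBLE_START(event_time, INTERVAL '1' MINUTE),
    COUNT(*)
FROM raw_data_table
GROUP BY
    event_type,
    TUMBLE(event_time, INTERVAL '1' MINUTE);
```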

Step 9: Create a Firehose delivery stream to connect the stream to S3, specifying the source and the directory structure
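The screenshots show this being done in the console; the same delivery stream can also be created programmatically. A rough boto3 sketch, with placeholder ARNs, bucket name, and prefix that you would replace with your own values:

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")  # assumed region

firehose.create_delivery_stream(
    DeliveryStreamName="raw-data-to-s3",  # hypothetical name
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:<account-id>:stream/raw_data_stream",
        "RoleARN": "arn:aws:iam::<account-id>:role/<firehose-role>",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::<account-id>:role/<firehose-role>",
        "BucketARN": "arn:aws:s3:::<your-bucket>",
        "Prefix": "raw_data/",  # directory structure for the raw records
    },
)
```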


Once the producer and the Firehose delivery stream are running, the raw data appears in the S3 bucket.


Finally, deploy the notebook as a streaming application so the aggregation runs continuously.

