
Microservice-based Data Pipeline with Kafka, Spark, and Metabase

Overview

This repository contains a microservice-based data pipeline that uses Apache Kafka, Apache Spark, and Metabase for data ingestion, processing, and visualization. The pipeline is designed to efficiently handle streaming data, process it using Spark, and create interactive dashboards for data visualization using Metabase.

Architecture

[Architecture diagram: producer → Kafka → Spark consumer → PostgreSQL → Metabase]

Setup and Installation

Prerequisites

  • minikube / kubectl
  • Docker

Instructions

  1. Clone the repository:

    git clone https://github.com/phaneesh707/microservice-based-datapipeline
    cd microservice-based-datapipeline
    
  2. Run all the script files in each of the folders:

     ./script_file_name.sh
    

Usage

  1. Create a Kafka topic and update the topic name in the producer and consumer files:

    kubectl exec POD_NAME -- kafka-topics.sh --create --topic TOPIC-NAME \
    --bootstrap-server kafka-svc:9092 --partitions 1 --replication-factor 1
    
    • To list all the topics created:
    kubectl exec POD_NAME -- kafka-topics.sh --list --bootstrap-server kafka-svc:9092
    
  2. Enter the Postgres pod, create a database and a table, and update the table name in consumer.py:

    psql -U USERNAME
    CREATE DATABASE DB_NAME;
    -- create your table; for example (adjust the columns to match your data):
    CREATE TABLE events (id SERIAL PRIMARY KEY, value TEXT);
    
  3. Enter the producer pod and run the producer file (a minimal sketch of such a producer appears after this list):

    python producer.py
    
  4. Enter the consumer pod and run the following command (a matching consumer sketch also appears after this list):

    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 consumer.py
    
    • You should now see the data being processed and written to the database.
  5. Run

    minikube tunnel
    
    • This exposes LoadBalancer services, making the dashboard reachable from localhost.
  6. Get the IP of the Metabase service:

    kubectl get svc
    
    • Copy the external IP of 'metabase-service' and open it in a browser.
  7. Ta-da! You can now access your dashboard and explore the analytics of your data.
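
Example producer and consumer

The repository's producer.py and consumer.py carry the actual logic; the sketches below are only illustrative. They assume JSON messages with "id" and "value" fields, the kafka-python package in the producer pod, the Kafka service at kafka-svc:9092, a Postgres service named postgres-svc, and a table named events — adjust all of these to your own setup. Writing to Postgres over JDBC additionally assumes the Postgres JDBC driver is on Spark's classpath (e.g. by adding org.postgresql:postgresql to --packages in step 4).

    # producer.py — a minimal sketch, not the repository's actual code
    import json
    import time
    from kafka import KafkaProducer  # assumes the kafka-python package

    producer = KafkaProducer(
        bootstrap_servers="kafka-svc:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for i in range(100):
        # Replace with your real event source
        producer.send("TOPIC-NAME", {"id": i, "value": f"event-{i}"})
        time.sleep(1)

    producer.flush()

The consumer reads the same topic as a stream, parses the JSON payload, and appends each micro-batch to Postgres:

    # consumer.py — a minimal sketch, not the repository's actual code
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    spark = SparkSession.builder.appName("kafka-postgres-consumer").getOrCreate()

    # Schema of the JSON messages produced above (an assumption)
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("value", StringType()),
    ])

    # Read the stream from the Kafka topic created in step 1
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "kafka-svc:9092")
           .option("subscribe", "TOPIC-NAME")
           .load())

    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("data"))
              .select("data.*"))

    def write_batch(df, epoch_id):
        # Append each micro-batch to Postgres over JDBC
        (df.write.format("jdbc")
         .option("url", "jdbc:postgresql://postgres-svc:5432/DB_NAME")
         .option("dbtable", "events")
         .option("user", "USERNAME")
         .option("password", "PASSWORD")
         .mode("append")
         .save())

    query = parsed.writeStream.foreachBatch(write_batch).start()
    query.awaitTermination()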

A few useful commands

  1. To enter a pod:

    kubectl exec -it POD-NAME -- /bin/bash
    
  2. In case of errors, to check a pod's logs:

    kubectl logs POD-NAME
    
  3. To describe a pod:

    kubectl describe pod POD-NAME
    
  4. To get all the pods/services/deployments

    kubectl get pods
    kubectl get services
    kubectl get deployments
    
