This repository collects my data engineering projects, where I explore various data engineering tools and techniques.
-
- A data pipeline implemented using Apache Airflow on Amazon Web Services (AWS) for processing OpenWeather data.
- The pipeline involves extracting weather data from the OpenWeather API, transforming it, and loading it into a data warehouse for analysis and visualization.
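
As a rough illustration of the transform step, the sketch below flattens one OpenWeather current-weather response into a single warehouse-style row. It is a minimal sketch, not the pipeline's actual code: the function names, the output column names, and the sample payload values are all assumptions made up for this example. It relies only on the API's documented `name`, `main`, `weather`, and `dt` fields, with temperature in Kelvin (the API default).

```python
from datetime import datetime, timezone


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert Kelvin (the OpenWeather default unit) to Celsius."""
    return round(kelvin - 273.15, 2)


def transform_weather(payload: dict) -> dict:
    """Flatten a nested current-weather response into one flat row.

    The output column names are hypothetical, chosen for this sketch.
    """
    return {
        "city": payload["name"],
        "description": payload["weather"][0]["description"],
        "temp_c": kelvin_to_celsius(payload["main"]["temp"]),
        "humidity_pct": payload["main"]["humidity"],
        # "dt" is a Unix timestamp in UTC
        "recorded_at": datetime.fromtimestamp(
            payload["dt"], tz=timezone.utc
        ).isoformat(),
    }


# Made-up sample values, not real measurements
sample_payload = {
    "name": "London",
    "dt": 1700000000,
    "main": {"temp": 280.15, "humidity": 81},
    "weather": [{"description": "light rain"}],
}

row = transform_weather(sample_payload)
print(row)
```

In an Airflow DAG, a function like this would typically sit between the extract task and the load task.
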
-
- I built a data pipeline with Airflow running on Docker that downloads podcast episodes.
- The results are stored in a Postgres database that is easy to query.
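
The general shape of such a pipeline (parse episode metadata, then store it idempotently) might look like the sketch below. Several things here are assumptions rather than the project's actual code: the episodes are assumed to come from an RSS feed, the sample feed and table layout are invented, and the standard-library `sqlite3` module stands in for Postgres so the example runs without a database server.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real run would fetch this over HTTP
SAMPLE_FEED = """<rss><channel>
  <item>
    <title>Episode 1</title>
    <link>https://example.com/ep1</link>
    <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>
  </item>
  <item>
    <title>Episode 2</title>
    <link>https://example.com/ep2</link>
    <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/>
  </item>
</channel></rss>"""


def parse_episodes(feed_xml: str) -> list:
    """Return (link, title, audio_url) tuples for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        episodes.append((item.findtext("link"), item.findtext("title"), audio_url))
    return episodes


def store_episodes(conn: sqlite3.Connection, episodes: list) -> int:
    """Insert episodes, skipping links already stored; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes "
        "(link TEXT PRIMARY KEY, title TEXT, audio_url TEXT)"
    )
    # SQLite syntax; the Postgres equivalent is INSERT ... ON CONFLICT DO NOTHING
    conn.executemany("INSERT OR IGNORE INTO episodes VALUES (?, ?, ?)", episodes)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]


conn = sqlite3.connect(":memory:")
episodes = parse_episodes(SAMPLE_FEED)
store_episodes(conn, episodes)
total = store_episodes(conn, episodes)  # a second run adds no duplicates
print(total)
```

Keying the table on the episode link keeps reruns of the scheduled pipeline from inserting the same episode twice.
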
-
- A data pipeline implemented on Amazon Web Services (AWS) for processing Spotify data.
- The pipeline loads CSV files containing information about artists, tracks, and albums into an S3 bucket, performs ETL (Extract, Transform, Load) with AWS Glue, and stores the processed data as Parquet files.
- The data is then queried with Amazon Athena and visualized in Power BI.
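
To give a feel for the kind of transformation an ETL job might apply to these files, the sketch below joins tracks to their albums and artists using only the Python standard library. This is not AWS Glue code, and the column names and sample rows are hypothetical; it only illustrates the join logic on small in-memory CSVs.

```python
import csv
import io

# Hypothetical columns and rows, standing in for the CSV files in S3
ARTISTS_CSV = "artist_id,artist_name\na1,Artist One\n"
ALBUMS_CSV = "album_id,album_name,artist_id\nb1,Album One,a1\n"
TRACKS_CSV = "track_id,track_name,album_id\nt1,Track One,b1\nt2,Track Two,b1\n"


def read_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tracks(tracks: list, albums: list, artists: list) -> list:
    """Inner-join tracks -> albums -> artists into one denormalized table."""
    albums_by_id = {album["album_id"]: album for album in albums}
    artists_by_id = {artist["artist_id"]: artist for artist in artists}
    joined = []
    for track in tracks:
        album = albums_by_id.get(track["album_id"])
        if album is None:
            continue  # drop tracks whose album is missing
        artist = artists_by_id.get(album["artist_id"])
        if artist is None:
            continue  # drop tracks whose artist is missing
        joined.append(
            {
                "track_name": track["track_name"],
                "album_name": album["album_name"],
                "artist_name": artist["artist_name"],
            }
        )
    return joined


joined = join_tracks(
    read_rows(TRACKS_CSV), read_rows(ALBUMS_CSV), read_rows(ARTISTS_CSV)
)
for row in joined:
    print(row)
```

A denormalized table like this is the sort of output that would then be written as Parquet and queried downstream.
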
-
- A data engineering project that simulates data generation with Python, sends the data to Apache Kafka, processes it with Apache Spark, and stores it in Amazon S3.
- All services are orchestrated and run in Docker containers.
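
The data-simulation step could be sketched roughly as follows. The event fields, sensor names, and function names are assumptions invented for this example, and no Kafka client is used: the `send` argument is a plain callable, so the sketch runs on its own, and a real producer's send call could be passed in its place.

```python
import json
import random
from datetime import datetime, timezone


def generate_event(rng: random.Random) -> dict:
    """Build one simulated event; all field names are hypothetical."""
    return {
        "event_id": rng.getrandbits(64),
        "sensor": rng.choice(["sensor-a", "sensor-b", "sensor-c"]),
        "reading": round(rng.uniform(0.0, 100.0), 2),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def serialize(event: dict) -> bytes:
    """Encode an event as UTF-8 JSON bytes, a common Kafka message format."""
    return json.dumps(event).encode("utf-8")


def produce(count: int, send, seed: int = 42) -> None:
    """Generate `count` events and hand each serialized message to `send`."""
    rng = random.Random(seed)  # seeded so the random fields are reproducible
    for _ in range(count):
        send(serialize(generate_event(rng)))


sent = []
produce(5, sent.append)  # collect messages in a list instead of a Kafka topic
print(len(sent), "messages ready to publish")
```

Passing the sender in as a callable keeps the generator testable without a running broker.
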
If you have any questions, feel free to reach out to me at riteshojha2002@gmail.com or open an issue in this repository.