This repository contains four Data Engineering projects created by Ting Lu, which use AWS to build ETL/ELT pipelines and perform big-data analysis for a range of business requirements:
- Data Modeling with Apache Cassandra: Build an ETL pipeline with the Python driver that loads a directory of CSV files into an Apache Cassandra NoSQL database, improving the efficiency of queries over user activity data.
- Cloud Data Warehouse & ELT Pipeline: Build an ELT pipeline that extracts JSON logs and metadata from S3, loads them into AWS Redshift staging tables, and transforms the data into a star-schema database with dimension tables, enabling the marketing and analytics teams to query song-play insights.
- STEDI Human Balance Analytics (Data Lakehouse solution): Construct a lakehouse with landing, trusted, and curated data lake zones in AWS, using Spark, Python, Glue Studio, S3, and Athena to meet the STEDI data scientists' requirements.
- Automatic Data Pipeline with Apache Airflow: Design, automate, and monitor ETL pipelines in Apache Airflow that process JSON logs and metadata from AWS S3 into a Redshift data warehouse, using custom operators for staging, data loading, and data quality checks to create versatile pipelines with monitoring and backfill capabilities.
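The data quality checks mentioned in the Airflow project can be sketched as a plain Python function (a minimal illustration with hypothetical names; in the actual project this logic lives inside a custom Airflow operator and the records come from a Redshift hook):

```python
def check_has_rows(table: str, records: list) -> None:
    """Fail the pipeline if a table has no rows.

    `records` is assumed to be the result of running
    `SELECT COUNT(*) FROM <table>` through a database hook,
    e.g. [(42,)].
    """
    # No result set at all means the query itself failed to return rows.
    if not records or not records[0]:
        raise ValueError(f"Data quality check failed: no results for {table}")
    # A zero count means the load step produced an empty table.
    if records[0][0] < 1:
        raise ValueError(f"Data quality check failed: {table} is empty")
```

Raising an exception is what marks the Airflow task as failed, so downstream tasks are skipped and the run can be retried or backfilled.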
Apache Airflow, Apache Spark, Python, PostgreSQL, Apache Cassandra, NoSQL, Data Warehouse, Data Lakehouse, AWS S3, Redshift, Athena, Glue Studio, Database, Schema, ETL & ELT Pipeline, Data Modeling, Big Data