Cloud Data Warehouse Implementation in AWS

Udacity Data Engineering Nanodegree

Submitted by: Miriam Farrington

Introduction

Nanodegree Program Overview (see Udacity)

In this project, I was tasked with building an ETL pipeline that extracts song record files from S3, stages them in Redshift, and transforms the data into a set of dimensional tables, so that the analytics team can continue finding insights into what songs their users are listening to.
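The two phases described above (stage from S3, then transform into dimensional tables) can be sketched roughly as follows. This is a minimal illustration, not the project's actual SQL: the table names `staging_songs` and `songs`, the column names, the bucket path, and the role ARN are all placeholder assumptions.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str,
                         region: str, json_option: str = "auto") -> str:
    """Build a Redshift COPY statement that loads JSON files from S3 into a staging table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )


# Hypothetical transform step: fill a dimension table from the staging table.
# Table and column names are assumptions made for illustration only.
SONG_DIMENSION_INSERT = """
INSERT INTO songs (song_id, title, artist_id, year, duration)
SELECT DISTINCT song_id, title, artist_id, year, duration
FROM staging_songs
WHERE song_id IS NOT NULL;
"""

if __name__ == "__main__":
    # Placeholder bucket, role ARN, and region; substitute real values.
    print(build_copy_statement(
        table="staging_songs",
        s3_path="s3://example-bucket/song_data",
        iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
        region="us-west-2",
    ))
```

Staging the raw files first keeps the load step simple, and the reshaping into dimensional tables then happens inside Redshift with plain SQL.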

Instructions

Instantiate a new Redshift cluster, ensuring the proper IAM role and VPC security groups are attached.
Copy the cluster endpoint, IAM role ARN, and database credentials into dwh.cfg.
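As a rough guide to what dwh.cfg might contain and how it could be read, here is a small sketch using Python's standard `configparser`. The section names (`CLUSTER`, `IAM_ROLE`) and key names are assumptions, as are all the sample values; check them against the dwh.cfg in the repository before relying on them.

```python
import configparser
from typing import Tuple

# Sample configuration text. Section names, keys, and values are
# illustrative assumptions, not the repository's actual dwh.cfg.
SAMPLE_CFG = """
[CLUSTER]
HOST = example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dev
DB_USER = awsuser
DB_PASSWORD = change-me
DB_PORT = 5439

[IAM_ROLE]
ARN = arn:aws:iam::123456789012:role/example-redshift-role
"""


def load_settings(cfg_text: str) -> Tuple[str, str]:
    """Return (connection string, IAM role ARN) parsed from config text."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    cluster = config["CLUSTER"]
    conn_string = "host={} dbname={} user={} password={} port={}".format(
        cluster["HOST"],
        cluster["DB_NAME"],
        cluster["DB_USER"],
        cluster["DB_PASSWORD"],
        cluster["DB_PORT"],
    )
    return conn_string, config["IAM_ROLE"]["ARN"]


if __name__ == "__main__":
    conn_string, role_arn = load_settings(SAMPLE_CFG)
    print(conn_string)
    print(role_arn)
```

To read the real file instead of the sample text, `config.read("dwh.cfg")` can replace `read_string`. The resulting connection string is in the keyword/value form accepted by PostgreSQL client libraries.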

Open a Python console in the project directory and import the two modules:

  import create_tables as ct
  import etl

Run the main() function of each module in order: first to set up the tables, then to run the ETL operations:
```
  ct.main()
  etl.main()
```
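The order matters because the ETL step loads into tables that the setup step creates. The sketch below illustrates that sequencing with a stand-in cursor, so it runs without a cluster. It is not the code in create_tables.py or etl.py: the query lists, `RecordingCursor`, `run_queries`, and `run_all` are hypothetical names introduced for illustration.

```python
from typing import Iterable, List


class RecordingCursor:
    """Stand-in for a database cursor: records statements instead of executing them."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, sql: str) -> None:
        self.executed.append(sql)


def run_queries(cursor, queries: Iterable[str]) -> int:
    """Execute each query in order and return how many were run."""
    count = 0
    for query in queries:
        cursor.execute(query)
        count += 1
    return count


# Hypothetical query lists; the real statements would live in sql_queries.py.
DROP_QUERIES = ["DROP TABLE IF EXISTS staging_songs;"]
CREATE_QUERIES = ["CREATE TABLE staging_songs (song_id VARCHAR, title VARCHAR);"]
COPY_QUERIES = ["COPY staging_songs FROM 's3://example-bucket/song_data' ..."]
INSERT_QUERIES = ["INSERT INTO songs SELECT DISTINCT song_id, title FROM staging_songs;"]


def run_all(cursor) -> None:
    # Table setup: drop any old tables, then create fresh ones.
    run_queries(cursor, DROP_QUERIES + CREATE_QUERIES)
    # ETL: load the staging tables, then fill the dimensional tables.
    run_queries(cursor, COPY_QUERIES + INSERT_QUERIES)


if __name__ == "__main__":
    cursor = RecordingCursor()
    run_all(cursor)
    for statement in cursor.executed:
        print(statement)
```

With a real connection, the cursor from a PostgreSQL client library would take the place of `RecordingCursor`, and each phase would be followed by a commit.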

Data Visualization

See Level Analysis Dashboard.png, which tracks songplay metrics by user tier (free vs. paid).

Changelog

7-Jul-2020 initial commit
