Nanodegree Program Overview (see Udacity)
In this project, I was tasked with building an ETL pipeline that extracts song record files from S3, stages them in Redshift, and transforms the data into a set of dimensional tables so the analytics team can continue finding insights into what songs their users are listening to.
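As a rough illustration of the staging step, the sketch below builds a Redshift COPY statement that loads JSON files from S3 into a staging table. The helper name `build_copy_sql`, the table name, the bucket path, the role ARN, and the region are all placeholders chosen for this example and are not taken from the project code.

```python
def build_copy_sql(table, s3_path, iam_role_arn, region="us-west-2"):
    """Return a Redshift COPY statement for loading JSON files from S3.

    All arguments are illustrative; substitute the values used by your
    own cluster and bucket.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON 'auto';"
    )


# Hypothetical values, for demonstration only.
copy_sql = build_copy_sql(
    table="staging_songs",
    s3_path="s3://example-bucket/song_data",
    iam_role_arn="arn:aws:iam::123456789012:role/exampleRedshiftRole",
)
print(copy_sql)
```

The resulting string would be executed against the cluster through a database cursor. The transform step then typically uses INSERT ... SELECT statements to move data from the staging tables into the dimensional tables.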
- Instantiate a new cluster in Redshift, ensuring the proper IAM role and VPC security groups are attached.
- Copy the cluster endpoint, IAM role ARN, and database credentials into dwh.cfg.
- Open a Python console and import the project modules:

      import create_tables as ct
      import etl

- Run the main() function in each module, in order, to set up the tables and then run the ETL operations:

      ct.main()
      etl.main()
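For reference, here is a minimal sketch of how the values stored in dwh.cfg might be read and turned into a connection string. The section and key names (`CLUSTER`, `HOST`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_PORT`, `IAM_ROLE`, `ARN`) and all sample values are assumptions made for this example; match them to whatever your own dwh.cfg actually contains.

```python
import configparser

# Assumed layout of dwh.cfg; every value here is a placeholder.
SAMPLE_CFG = """
[CLUSTER]
HOST = example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME = dev
DB_USER = awsuser
DB_PASSWORD = change-me
DB_PORT = 5439

[IAM_ROLE]
ARN = arn:aws:iam::123456789012:role/exampleRedshiftRole
"""


def load_settings(cfg_text):
    """Parse config text and return (connection string, IAM role ARN)."""
    config = configparser.ConfigParser()
    config.read_string(cfg_text)
    cluster = config["CLUSTER"]
    # Keyword/value connection string in the form accepted by PostgreSQL drivers.
    dsn = "host={} dbname={} user={} password={} port={}".format(
        cluster["HOST"],
        cluster["DB_NAME"],
        cluster["DB_USER"],
        cluster["DB_PASSWORD"],
        cluster["DB_PORT"],
    )
    return dsn, config["IAM_ROLE"]["ARN"]


dsn, role_arn = load_settings(SAMPLE_CFG)
print(role_arn)
```

In a real run you would call `config.read("dwh.cfg")` instead of parsing an inline string, and pass the connection string to your database driver.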
See the Level Analysis Dashboard, which tracks songplay metrics by user tier (free vs. paid).
- 7-Jul-2020 initial commit