Data ETL and Schema using Spark to Conduct Song Play Analysis

Background / Purpose

This project performs ETL on data generated by a mock startup called Sparkify, which provides a music streaming application. Event logs of user song plays and metadata about the songs are stored as JSON files in AWS S3. The data is extracted from S3, transformed using Spark, and then written back to S3 in Parquet format.
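The extract, transform, and load flow described above can be sketched in PySpark roughly as follows. This is a minimal illustration, not the contents of etl.py: the function name write_songs_table, its path arguments, and the assumption that the raw song JSON already carries fields named like the songs table columns are all hypothetical.

```python
# Hypothetical sketch of one extract-transform-load step; not the project's etl.py.

# Assumed to match the field names in the raw song metadata JSON.
SONG_COLUMNS = ["song_id", "title", "artist_id", "year", "duration"]


def write_songs_table(spark, song_data_path, output_path):
    """Read song metadata JSON, keep the songs columns, and write Parquet.

    `spark` is an existing pyspark.sql.SparkSession; both paths may be
    S3 URIs or local directories.
    """
    # Extract: load the raw JSON files into a DataFrame.
    df = spark.read.json(song_data_path)

    # Transform: keep the dimension columns and drop repeated songs.
    songs = df.select(*SONG_COLUMNS).dropDuplicates(["song_id"])

    # Load: write the table back out in Parquet format.
    target = output_path.rstrip("/") + "/songs"
    songs.write.mode("overwrite").parquet(target)
    return target
```

With PySpark installed, a session created with SparkSession.builder.getOrCreate() can be passed in as spark. Reading from and writing to S3 would also require Spark to have an S3 connector available, which this sketch does not configure.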

Rationale

Spark distributes the transformation work across a cluster, which lets the pipeline process large amounts of data.

The output tables follow a star schema: one central fact table of song plays that references four dimension tables, structured as shown below. Compared with a single flat table, a star schema reduces data redundancy while keeping song play queries simple and fast.

Fact Table

song_plays: start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

Dimension Tables

songs: song_id, title, artist_id, year, duration
artists: artist_id, name, location, latitude, longitude
users: user_id, firstname, lastname, gender, level
times: start_time, hour, day, week, month, year, weekday
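As an illustration of how the columns of the times table relate to start_time, the sketch below derives them in plain Python. It assumes the event logs record the play time as an epoch timestamp in milliseconds and that weekday is numbered from Monday = 0; neither detail is stated here, and the helper name time_row is hypothetical. The actual ETL would do the equivalent with Spark column functions.

```python
from datetime import datetime, timezone


def time_row(ts_ms):
    """Build one `times` row from an epoch timestamp in milliseconds (assumed)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return {
        "start_time": dt,
        "hour": dt.hour,
        "day": dt.day,
        "week": dt.isocalendar()[1],  # ISO week number
        "month": dt.month,
        "year": dt.year,
        "weekday": dt.weekday(),      # Monday=0 ... Sunday=6 (assumed convention)
    }


# Example timestamp chosen for illustration only.
row = time_row(1541903636796)
print(row["year"], row["month"], row["day"], row["hour"], row["weekday"])
```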

File Description & Steps to Run ETL

dl.cfg holds the parameters required to access S3 resources.
etl.py contains the script that runs the ETL.
s3-etl.ipynb is a notebook used to test the ETL script.

To run the ETL, paste your AWS key and secret into the dl.cfg file, then run etl.py.
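One way a script could pick up the credentials from dl.cfg is with Python's configparser, as sketched below. The [AWS] section name, the two key names, and the helper load_aws_credentials are assumptions for illustration; check dl.cfg and etl.py for the names the project actually uses.

```python
import configparser
import os

# Assumed layout of dl.cfg (section and key names are hypothetical):
#
#   [AWS]
#   AWS_ACCESS_KEY_ID=<your key>
#   AWS_SECRET_ACCESS_KEY=<your secret>


def load_aws_credentials(path="dl.cfg"):
    """Copy AWS credentials from the config file into environment variables."""
    config = configparser.ConfigParser()
    # read() returns the list of files it managed to parse; empty means failure.
    if not config.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    os.environ["AWS_ACCESS_KEY_ID"] = config["AWS"]["AWS_ACCESS_KEY_ID"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = config["AWS"]["AWS_SECRET_ACCESS_KEY"]
```

Keeping the credentials in a separate config file, rather than in etl.py itself, makes it easier to avoid committing them to the repository.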
