Personal Spotify Wrapped using Airflow, Django and Docker

AndresNavarrete/spotify-wrapped

Spotify wrapped

The goal of this repository is to show my personal Spotify data with some trends on my most listened artists and songs.

Data is fetched from the Spotify API and stored in a Postgres database on a daily basis. The stored data is my daily short-term ranking of most listened songs and artists. This process is orchestrated with Apache Airflow and runs in Docker containers, and the data is accessible through a Django REST API.

Project diagram

Top artists

Artists

Top songs

Songs

Documentation

Database & API setup

The Postgres database runs in a Docker container. It holds some Django-related tables, the historic data of artists and songs, and the most recent ranking data.

A Django API is also provided in a Docker container to fetch data from the database and expose it. Currently the API supports the historic trend of most listened artists and songs.

To build the images and run the containers, run:

bash bash/docker_init.sh

This command takes care of starting the database and generating all tables and views needed for this project. It also initializes the REST API that fetches data from the database.

For this to work you need the following environment variables:

# Postgres
POSTGRES_DB=""
POSTGRES_PASSWORD=""
POSTGRES_USER=""
POSTGRES_PORT=""
POSTGRES_HOST=""

# Django
DJANGO_ANON_USER=""
DJANGO_ANON_PASSWORD=""
DJANGO_SECRET_KEY=""
DJANGO_PORT=""
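
As an illustration of how the Postgres variables might be consumed, the sketch below builds a Django-style `DATABASES` entry from the environment. The helper name `build_database_settings` is hypothetical and is not taken from this repository; only the variable names come from the list above.

```python
import os

REQUIRED_VARS = (
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
)


def build_database_settings(env=None):
    """Return a Django-style DATABASES dict built from environment variables.

    Hypothetical helper: raises KeyError naming every missing or empty variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            "HOST": env["POSTGRES_HOST"],
            "PORT": env["POSTGRES_PORT"],
        }
    }
```

Failing early with the full list of missing names is usually easier to debug than a connection error raised later by the database driver.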

Spotify API setup

To use the Spotify API we must configure an authorization token. For the purpose of this project the most suitable option is the authorization code flow. Follow its steps to get access to your personal Spotify account data.

Ultimately, you will need these environment variables to make it work:

# Spotify
CLIENT_ID=""
CLIENT_SECRET=""
REFRESH_TOKEN=""
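
To make the role of these three variables concrete, here is a minimal sketch of the two HTTP requests involved: exchanging the refresh token for an access token, and asking for the short-term top artists. The function names are hypothetical and this is not the repository's own code; the requests are only built, not sent.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
TOP_ARTISTS_URL = "https://api.spotify.com/v1/me/top/artists"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (without sending) the POST request that refreshes an access token."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "refresh_token", "refresh_token": refresh_token}
    ).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            # Client credentials travel as HTTP Basic auth.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_top_artists_request(access_token, limit=20):
    """Build the GET request for the short-term top artists ranking."""
    query = urllib.parse.urlencode({"time_range": "short_term", "limit": limit})
    return urllib.request.Request(
        f"{TOP_ARTISTS_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Passing the first request to `urllib.request.urlopen` returns a JSON body whose `access_token` field is the value expected by the second request.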

ETL setup

The data pipeline is modeled as an Extract-Transform-Load process. The recommendation is to use Airflow as the orchestrator, but a simple cronjob will do the trick if Airflow is too much for your server.
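
The three steps can be sketched as plain functions, independent of whichever scheduler runs them. The row layout (`ranking_date`, `position`, `artist_id`, `artist_name`) and the function names below are assumptions made for illustration, not the repository's actual schema; the `items`, `id` and `name` fields follow the shape of Spotify's top-artists response.

```python
from datetime import date


def transform_top_artists(payload, ranking_date):
    """Turn a Spotify top-artists response into one ranking row per artist.

    The row layout is an assumption, not the repository's schema.
    """
    rows = []
    for position, item in enumerate(payload.get("items", []), start=1):
        rows.append(
            {
                "ranking_date": ranking_date.isoformat(),
                "position": position,
                "artist_id": item["id"],
                "artist_name": item["name"],
            }
        )
    return rows


def run_etl(extract, load, ranking_date=None):
    """Chain extract, transform and load; extract and load are injected callables."""
    ranking_date = ranking_date or date.today()
    rows = transform_top_artists(extract(), ranking_date)
    load(rows)
    return len(rows)
```

Because `extract` and `load` are passed in, the same `run_etl` could be called from an Airflow task or from a script started by cron.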

Airflow setup: Run locally

Follow the Airflow quickstart guide to install Airflow locally. In addition, install the Postgres Airflow plugin.

Airflow setup: Run on container (recommended)

Build the image with docker-compose build in the airflow directory and run the container with docker-compose up. For more detailed instructions, read Running Airflow in Docker.

Also, add the postgres_spotify_app connection to Airflow's connections, using the same values as in the Database & API setup section.
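
Besides the web UI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` (upper-cased) that holds a connection URI. The sketch below builds such a URI from the Postgres variables; the helper name `build_airflow_conn_uri` is hypothetical, and whether this repository's compose file forwards that variable to the Airflow container is an assumption you would need to check.

```python
import os
from urllib.parse import quote


def build_airflow_conn_uri(env=None):
    """Build a postgres:// connection URI from the POSTGRES_* variables.

    Hypothetical helper; user and password are percent-encoded so that
    characters such as '@' or '/' do not break the URI.
    """
    env = os.environ if env is None else env
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return (
        f"postgres://{user}:{password}@{env['POSTGRES_HOST']}"
        f":{env['POSTGRES_PORT']}/{env['POSTGRES_DB']}"
    )
```

The resulting string would be exported as `AIRFLOW_CONN_POSTGRES_SPOTIFY_APP` in the environment of the Airflow scheduler and workers.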

Simple Crontab Setup

If you need a simpler version of the ETL without Airflow, you can set up a cronjob that runs it every day at midnight with the following command:

crontab -l | { cat; echo "0 0 * * * (date; cd <Repository absolute path> && bash bash/spotify_daily_etl.sh) >> logs/spotify_logs.log 2>&1 "; } | crontab -