An exploration of my personal Spotify listening habits
Description
Installation
Usage
Roadmap
License
Exploring my personal Spotify listening habits. The project has a data pipeline that extracts data from the Genius and Spotify APIs for further analysis. Raw and intermediate data are stored in an S3 data lake, then processed and loaded into a MySQL database.
- Clone this repository.
git clone https://github.com/vatdaell/spotify-analysis.git
- Install the Python packages.
pip install -r src/ETL/requirements.txt -r src/ReportGenerator/requirements.txt
- Set up an AWS account with an S3 bucket.
- Create a Spotify developer account and create an application.
- Create a Genius account and generate a client access token.
- Set up a MySQL database.
- Create a .env file in the project directory and fill in the details:
S3_BUCKET=bucket_name
SPOTIPY_CLIENT_ID=clientid
SPOTIPY_CLIENT_SECRET=secret
SPOTIPY_REDIRECT_URI=redirect_uri
TABLE_NAME=recent_plays_table_name
GENIUS_ACCESS_TOKEN=genius_access_token
MYSQL_HOST=mysql_host
MYSQL_PORT=port
MYSQL_USER=user
MYSQL_PASS=password
MYSQL_DB=dbname
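Before running the pipelines, it can help to confirm that every variable from the template above is filled in. The following is a minimal sketch that uses only the standard library. The helper names `parse_env`, `missing_keys`, and `check_env_file` are hypothetical and not part of this repository, and the project itself may load the file differently (for example with python-dotenv).

```python
from pathlib import Path

# Variable names copied from the .env template above.
REQUIRED_KEYS = [
    "S3_BUCKET",
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "SPOTIPY_REDIRECT_URI",
    "TABLE_NAME",
    "GENIUS_ACCESS_TOKEN",
    "MYSQL_HOST",
    "MYSQL_PORT",
    "MYSQL_USER",
    "MYSQL_PASS",
    "MYSQL_DB",
]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in template order."""
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Read a .env file and report which required variables still need values."""
    return missing_keys(parse_env(Path(path).read_text()))
```

Calling `check_env_file()` from the project directory returns an empty list once every variable has a value, and otherwise lists the names that still need to be filled in.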
To load the songs you have listened to into the S3 bucket, load the song data into the MySQL database, and load recently played data into the MySQL database, run:
python src/etl/songs_pipeline.py
python src/etl/recently_played_pipeline.py
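To give a feel for the kind of transformation that sits between the Spotify API and MySQL, here is a sketch that flattens a response from Spotify's recently played endpoint into one row per play. The field names (`items`, `played_at`, `track`, `artists`) follow the Spotify Web API response. The function name `flatten_recently_played` and the output column names are assumptions for illustration and may not match what `src/etl/recently_played_pipeline.py` actually does.

```python
def flatten_recently_played(payload):
    """Turn a recently played API response into one flat dict per play."""
    rows = []
    for item in payload.get("items", []):
        track = item["track"]
        rows.append(
            {
                "played_at": item["played_at"],
                "track_id": track["id"],
                "track_name": track["name"],
                # A track can have several artists; join them into one column.
                "artists": ", ".join(a["name"] for a in track["artists"]),
            }
        )
    return rows


# Hand-written sample shaped like the API response (not real listening data).
sample = {
    "items": [
        {
            "played_at": "2021-01-01T12:00:00.000Z",
            "track": {
                "id": "abc123",
                "name": "Example Song",
                "artists": [{"name": "Artist A"}, {"name": "Artist B"}],
            },
        }
    ]
}

rows = flatten_recently_played(sample)
```

With spotipy, a payload of this shape would typically come from `current_user_recently_played`, and rows like these could then be inserted into the table named by `TABLE_NAME`.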
Some interesting features I want to implement or analyze in the future:
- Use a task scheduler to automate ETL tasks
- Extract lyrics of recently listened songs for sentiment analysis
- Recommend similar songs based on listening history
- Link to merch store for top bands
- Analyze the genres of music listened to
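As a possible starting point for the task scheduler item, the two pipeline scripts could be wrapped in a single entry point that a scheduler calls. This is only a sketch under assumptions: the function name `run_pipelines` is hypothetical, the script paths are copied from the usage section, and nothing like this exists in the repository yet.

```python
import subprocess
import sys

# Script paths copied from the usage section.
PIPELINES = [
    "src/etl/songs_pipeline.py",
    "src/etl/recently_played_pipeline.py",
]


def run_pipelines(scripts=PIPELINES, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    Stops at the first failure, because check=True makes subprocess.run
    raise CalledProcessError on a non-zero exit code.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)
```

A scheduler such as cron could then invoke a small script that calls `run_pipelines()` at a fixed interval. The `runner` parameter exists so the function can be exercised without actually launching the pipelines.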