GitHub - chronoB/SpotifyDataAnalyzer: Analyzer of User Data saved by Spotify

Spotify User Data Analyzer

About

This repository is built to analyze the user data that Spotify collects. The downloadable personal user data can be examined with regard to different features. Right now it only analyzes the streaming history of one or multiple users. Other analyses aren't planned at the moment.

Dependencies

Python >= 3.7

How to get the spotify user data

Disclaimer: This was only checked with a German Spotify account. I don't know whether it works for every account or whether it is a country-specific feature due to GDPR laws.

v2

Go to: https://www.spotify.com/account/privacy/ There you can download the complete extended Spotify streaming history. This data has a different format than the dataset from v1. You can find the data format in the README that comes with the data, or in the documentation of ./src/extendedAnalyzer.py.

v1

Go to: https://www.spotify.com/account/privacy/ Follow the instructions under "Deine Daten herunterladen" ("Download your data"). The process should take 1-7 days. You will receive an email when your data is ready to be downloaded.

Afterwards copy the extracted folder into this directory.

User data structure (v1)

Detailed explanation of the data : https://support.spotify.com/de/account_payment_help/privacy/understanding-my-data/

The following structure is based on the folder I received. Some files may be missing, or there may be additional files (the website references a voice input file and a conclusions file).

./my-spotify-data

CarThing.json // Information about a (possibly existing) CarThing-device
FamilyPlan.json // Information about family subscription
Follow.json // Details about followers, or accounts/artists you follow
Payments.json // Details about your payment method (if existing)
Playlist.json // Information about created and saved playlists, including all saved songs
ReadMeFirst.pdf // General information about the data
SearchQueries.json // List with information about your search queries
StreamingHistory.json // List of items you listened to in the last year
Userdata.json // Various user data
YourLibrary.json // Information about saved songs in your library
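
As a rough sketch of how the streaming history in this folder could be read with the standard library: the function name `load_streaming_history` is hypothetical and not part of this repository, and the glob pattern assumes the history may be split into several `StreamingHistory*.json` files that each contain a JSON list.

```python
import json
from pathlib import Path


def load_streaming_history(folder):
    """Collect the entries of all StreamingHistory*.json files in `folder`."""
    entries = []
    # Sorting keeps the order stable if the export is split into several files.
    for path in sorted(Path(folder).glob("StreamingHistory*.json")):
        with path.open(encoding="utf-8") as handle:
            # Assumption: every file holds one JSON list of play entries.
            entries.extend(json.load(handle))
    return entries
```

Calling `load_streaming_history("./my-spotify-data")` would then return one combined list of entries. The fields inside each entry are not described here, so check them against your own export.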

SearchSpecifics

The SearchSpecifics define the search query. There are two main kinds of query: a specific time or a time period. The keys of the two kinds cannot be combined (e.g. year and startYear). The additional parameters described below can be added to either of them.

Search for a specific time (a specific day, a year, a specific hour)

# Search for every song that was played on the 2nd of September 2019 between 10:00 and 11:00.
payload = {
  "year": 2019,
  "month": 9,
  "day": 2,
  "hour": 10
}

# Search for every song that was played on the 2nd of any month.
payload = {
  "day": 2
}

# Search for every song that was played in February 2020.
payload = {
  "year": 2020,
  "month": 2
}
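
To illustrate how such a payload could be matched against a single play, here is a minimal sketch. It is not the analyzer's actual implementation: the helper `matches_specific_time` is hypothetical, and the timestamp format `"YYYY-MM-DD HH:MM"` is an assumption about the exported data.

```python
from datetime import datetime

# Keys of a specific-time payload, as used in the examples above.
TIME_KEYS = ("year", "month", "day", "hour")


def matches_specific_time(end_time, payload):
    """Return True if `end_time` agrees with every time key set in `payload`."""
    # Assumption: timestamps look like "2019-09-02 10:15".
    played = datetime.strptime(end_time, "%Y-%m-%d %H:%M")
    # Keys missing from the payload act as wildcards.
    return all(
        getattr(played, key) == payload[key]
        for key in TIME_KEYS
        if key in payload
    )
```

With this reading, `{"day": 2}` matches the 2nd of every month in every year, and `"hour": 10` matches any play stamped from 10:00 to 10:59.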

Search for timeperiod (from 2019-05-03 until 2019-07-22)

IMPORTANT: If any of the following parameters is used, all of them have to be used. They come as a bundle.

payload = {
  "startYear": 2019,
  "startMonth": 5,
  "startDay": 3,
  "endYear": 2019,
  "endMonth": 7,
  "endDay": 22
}
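
The bundle rule and the rule that period keys cannot be mixed with specific-time keys can be sketched as a small validation step. The helpers `period_bounds` and `in_period` are hypothetical and only illustrate the idea; treating the end date as inclusive is an assumption.

```python
from datetime import date

PERIOD_KEYS = ("startYear", "startMonth", "startDay", "endYear", "endMonth", "endDay")
TIME_KEYS = ("year", "month", "day", "hour")


def period_bounds(payload):
    """Validate a time-period payload and return its (start, end) dates."""
    missing = [key for key in PERIOD_KEYS if key not in payload]
    if missing:
        raise ValueError(f"missing period keys: {missing}")
    if any(key in payload for key in TIME_KEYS):
        raise ValueError("specific-time keys cannot be combined with period keys")
    start = date(payload["startYear"], payload["startMonth"], payload["startDay"])
    end = date(payload["endYear"], payload["endMonth"], payload["endDay"])
    if start > end:
        raise ValueError("start date lies after end date")
    return start, end


def in_period(day, payload):
    """Return True if `day` lies within the period (end date inclusive)."""
    start, end = period_bounds(payload)
    return start <= day <= end
```

Failing early with a `ValueError` makes an incomplete bundle visible right away, instead of quietly returning an empty result.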

Additional search parameters

count specifies how many items should be returned (default=5).
media determines whether the search result should include podcasts, music, or both (default=all). The podcasts are defined in data/podcastFile.txt. Right now this is a collection of the podcasts found in the example files. Future plans involve fetching a list of podcasts from a podcatcher (feel free to contribute).
ratingCrit determines whether items are sorted by playtime or by clicks, which is possible because Spotify saves the milliseconds played per song click (default=clicks).

payload = {
        ...
        "count": 3, #how many items should be returned
        "media": "podcast", #"podcast", "music", "all"
        "ratingCrit": "time", #"time", "clicks"
    }
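
How these three parameters could interact is sketched below. This is not the repository's implementation: the function `top_items` is hypothetical, the entry fields `artistName` and `msPlayed` are assumptions about the exported data, and podcasts are assumed to be recognizable by a name that appears in a set loaded from data/podcastFile.txt.

```python
from collections import Counter


def top_items(entries, count=5, media="all", rating_crit="clicks", podcasts=frozenset()):
    """Rank names by number of plays ("clicks") or total milliseconds ("time")."""
    if media not in ("all", "music", "podcast"):
        raise ValueError(f"unknown media: {media!r}")
    if rating_crit not in ("clicks", "time"):
        raise ValueError(f"unknown ratingCrit: {rating_crit!r}")
    scores = Counter()
    for entry in entries:
        name = entry["artistName"]  # assumed field name
        is_podcast = name in podcasts
        if media == "podcast" and not is_podcast:
            continue
        if media == "music" and is_podcast:
            continue
        # "time" sums the played milliseconds, "clicks" counts each play once.
        scores[name] += entry["msPlayed"] if rating_crit == "time" else 1
    return scores.most_common(count)
```

The two criteria can give different rankings: a single long podcast episode can outrank an artist with many short plays under "time", while the artist wins under "clicks".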

Plans

Add the possibility to sort by genre. This needs an external library; I don't know if it is possible.
Add the Spotify audio analysis API for fancy stuff.
Add multiple-year support (if this has not already happened).
Add more analyzers (search history analyzer, general information analyzer). This may require restructuring the code.

Contributions are welcome!

If you want to contribute to this, please read the Contribution guidelines

