
Spotify History

A simple Spotify scrobbler. Gets your listening history from Spotify, saves it to a database and creates a weekly backup in Google Drive.


Features

  • Free of charge* - uses the AWS free tier and the Google Drive API, which is free to use
  • Utilities to get refresh tokens from both Google and Spotify
  • Easily customizable
  • Listening history export to Google Drive

Motivation

Spotify's API only exposes the last 50 songs you've listened to.

This project seeks to provide an easy and free solution to saving your Spotify listening history in an accessible place (Google Drive) where you can retrieve and analyze it quickly.

Other than that, you can of course use everything here as a starting point/guideline to create something else with Spotify, AWS, Google Drive and Serverless.

Before you start

  • This project makes use of two AWS Lambda functions, one for getting your history from Spotify and one for creating backups in Google Drive.

  • Unlike Last.FM, Spotify apparently counts a song as listened to when you listen to it for "over 30 seconds". The exact behaviour of how Spotify counts a song as listened to is not clear to me, but 30 seconds seems to be the minimum.

  • By default, the history Lambda (scrobbler) is scheduled to fetch the history from Spotify at an hourly interval. With this interval, most "regular" users who listen through their songs will have their full listening history captured. Assuming a very low average song duration of ~2 minutes, one could listen to at most 30 songs per hour. Since Spotify keeps track of the last 50 songs you've listened to, this interval covers the entire hour. However, you may change the schedule.

  • By default, the backup Lambda is scheduled to run weekly at the start of the week (Monday at 12:30 a.m.). A week is defined according to the ISO 8601 standard and thus starts on Monday.

  • By default, items in the database expire after 1 month, since by then they have already been backed up and are no longer needed.

  • You might want to adjust provider.region in serverless.yml if you don't live near Frankfurt (the default is eu-central-1). Available regions.
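For example, to deploy to Northern Virginia instead, the relevant part of serverless.yml would look roughly like this (a sketch; only the region line needs to change):

```yaml
provider:
  # Pick the region closest to you; eu-central-1 (Frankfurt) is the default
  region: us-east-1
```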

You can customize the backup, schedules, item expiration and much more. Customization guide.

Requirements

  • An AWS account
  • A Spotify account
  • serverless >= 3
  • node >= v14.17.4
  • Docker (optional)

Getting started

  1. Fork and/or clone this repository and install dependencies:
git clone git@github.com:eegli/spotify-history.git

cd spotify-history

yarn
  2. Spotify setup - Create a Spotify application - app status "development" is fine - and set the redirect URL to http://localhost:3000.
  3. In the root directory, create a folder named .secrets (notice the dot!)
  4. Create a file named credentials_spotify.json and copy the template below. Insert your client id and client secret. Your Spotify secrets file should look like this:
{
  "clientId": "<your-client-id>",
  "clientSecret": "<your-client-secret>"
}
  5. Google Drive setup - Follow the quickstart guide to create a Google Cloud project and enable the Drive API. When asked to configure the consent screen, your publishing status should be "testing". You will need to manually add the Google account whose Drive you want to use under "Test users". At the end, you should be prompted to download your OAuth client credentials for your newly created desktop client as a JSON file.
  6. Download the credentials file, rename it to credentials_google.json and put it in the .secrets folder. It should look like this:
{
  "installed": {
    "client_id": "blablabla",
    "project_id": "spotify-history-32as4",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "blablabla",
    "redirect_uris": ["urn:ietf:wg:oauth:2.0:oob", "http://localhost"]
  }
}

Almost done!

  7. Run the following command and follow the steps. This will create a token_spotify.json file in the .secrets folder containing your long-lived Spotify refresh token. KEEP THIS FILE SECURE!
npm run token:spotify
  8. Run the following command and follow the steps. This will create a token_google.json file in the .secrets folder containing your long-lived Google Drive refresh token. KEEP THIS FILE SECURE!
npm run token:google
  9. Done!

Deploying the environments

This project includes both a staging and production environment. By default, the schedules are only enabled in production in order to save quota. The staging version is meant to be deployed but invoked manually only. If you wish to enable the schedule on staging as well, change serverless.yml:

custom:
  scheduleEnabled:
    prod: true
    stg: true # Schedule enabled on staging

Keep in mind that this will double the calls made to Lambda and DynamoDB!

In order to deploy the production version, run:

npm run prod:deploy

You can deploy the staging version as well:

# Deploy everything
npm run stg:deploy

# Deploy functions only
npm run stg:deploy:history
npm run stg:deploy:backup

Again, the staging functions are NOT scheduled by default as they are meant to be invoked manually:

# Get history from Spotify and save to DynamoDB
npm run stg:invoke:history

# Create backup in Google Drive
npm run stg:invoke:backup

Logging

To check the logs of your Lambda functions, either go to the AWS CloudWatch dashboard or retrieve them in your console.

Example: Getting the logs for production in the last 24h

sls logs -f spotify-history -s prod --startTime 1d
sls logs -f spotify-history-backup -s prod --startTime 1d

More info about logging with Serverless.

Customization

Changing history properties

By default, these are the song properties that are saved to the database (and backup):

interface DynamoHistoryElement {
  name: string;
  id: string;
  playedAt: string;
}

If you want to save other song properties, simply change this interface in src/config/types.ts and TypeScript will show you where you'll need to make adjustments. Obviously, it makes sense to at least store the timestamp of when the song was played (playedAt) and its id (id).
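For example, to additionally store the artist, the interface could be extended like this (the artist property and the sample values below are illustrative; you would map the new field from Spotify's recently-played response yourself):

```typescript
// Illustrative extension of the interface in src/config/types.ts.
// The `artist` property is an assumption; populate it wherever the
// history items are built from the Spotify API response.
interface DynamoHistoryElement {
  name: string;
  id: string;
  playedAt: string;
  artist: string; // added property
}

// Example item as it would be written to DynamoDB (values are made up)
const example: DynamoHistoryElement = {
  name: "Paranoid Android",
  id: "6LgJvl0Xdtc73RJ1mmpotq",
  playedAt: "2021-08-14T09:30:00.000Z",
  artist: "Radiohead",
};
```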

Changing item expiration in the database

By default, items in DynamoDB are set to expire after 1 month. If you wish to disable this, set the TTL specification in serverless.yml to false (or remove the implementation altogether for a cleaner codebase).

TimeToLiveSpecification:
  AttributeName: 'expire_at'
  Enabled: false

If you want to specify a different TTL, change the dynamoExpireAfter default in src/config/defaults.ts.

Changing the backup schedule

If you want to change the backup schedule, e.g. to run it daily or monthly, you'll need to adjust the cron expression in serverless.yml. Here are some resources regarding cron jobs.
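For instance, a monthly backup on the first day of each month at 00:30 a.m. could look roughly like this (AWS uses a six-field cron syntax; the exact function and event layout in serverless.yml may differ from this sketch):

```yaml
functions:
  spotify-history-backup:
    events:
      # AWS cron fields: minutes hours day-of-month month day-of-week year
      - schedule: cron(30 0 1 * ? *) # 1st of every month, 00:30 UTC
```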

⚠️ Keep in mind that the backup schedule, item expiration and time range to retrieve the items for the backup are logically connected! ⚠️

If you change the backup schedule, you'll also need to change the time range of the backup and, most likely, the item TTL as shown above.

// Example: Include history from last month

const defaults: Readonly<Defaults> = {
  dynamoExpireAfter: [2, 'months'], // Extend the expiration date
  backupRange: [1, 'month'], // Extend the backup range
  ...
};

Changing the backup folder

Update the stage and production folder names in src/config/defaults.ts.

Note that, for security reasons, the backup handler only has access to folders and files it has created itself (see OAuth 2.0 scopes and scripts/google.ts). For simplicity, the backup folder is created at the root of your Google Drive.

Development and Testing

Running DynamoDB locally

For local development and for testing the DB integration, AWS's official DynamoDB Docker image can be run along with another image that provides a nice GUI for inspecting the tables and items.

  1. Start the containers (DynamoDB and GUI):
npm run dynamo:start
  2. Migrate the table and seed:
npm run dynamo:migrate
  3. If you want to check whether everything has been set up correctly, visit http://localhost:8001/

  4. Invoke locally:

    Note that npm run local:backup will, despite its name, still hit the Google Drive API, but it saves the content in a folder separate from stg and prod (local).

# Gets the history and saves it to local DynamoDB
npm run local:history

# Backs up the history to Google Drive
npm run local:backup

Good to know

The core of this project uses AWS DynamoDB Data Mapper. Unfortunately, this package does not seem to be actively maintained and is only compatible with the AWS SDK v2. By default, the AWS SDK v2 is included in the Lambda runtime environment, but the modular version 3 is not. For these reasons, it is currently not possible to upgrade this project to the modular AWS SDK.

Resources

About billing

* Serverless uses S3 to store the code of the deployed functions. Technically, S3 is not free: it costs a fraction of a dollar per GB. However, a deployment takes up so little space that you most likely won't be billed. A full month of testing "cost" me $0.01, and I was not billed. Be aware that if you change the schedules, this project may no longer be "free"!