Jack Phan and Alex Kyllo
UW Bothell CSS 586
Spring 2021
A project for generative music modeling with deep learning on symbolic (MIDI) music data.
This is a Python 3 project using TensorFlow 2.
Miniconda with Python 3.8 or 3.9 is recommended because conda will install the correct version of CUDA for TensorFlow to utilize the GPU.
Dependencies are specified in the conda environment file `environment.yml`. To install the dependencies:

```
conda env create -f environment.yml
```

To activate the conda virtual environment:

```
conda activate musiclearn
```
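Optionally, once the environment is activated you can check that TensorFlow detects your GPU (a quick sanity check, not a required step):

```
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```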
This project uses the MusicNet reference MIDI files, which can be downloaded here: musicnet_midis.tar.gz
Data file paths and other constants should be specified in a `.env` file in the project root (this directory). The `python-dotenv` package is used to read these into shell environment variables.

To run the code, you will need to create your own `.env` file (or directly set the `MUSICNET_MIDI_DIR` environment variable to the absolute path of your `musicnet_midis` directory in your shell session). Download and unzip the `musicnet_midis.tar.gz` file, then add `MUSICNET_MIDI_DIR=<absolute path to musicnet_midis directory>` to the `.env` file. Below are the contents of my `.env` file; yours will have a different directory path depending on where you placed the downloaded music data:

```
MUSICNET_MIDI_DIR=/media/hdd1/data/school/css586/musicnet_midis
```
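As a rough sketch of how this variable is consumed (this follows the usual python-dotenv pattern; the exact contents of `musiclearn/config.py` may differ), the `.env` file can be loaded in Python like so:

```
# Sketch only: load .env from the project root and read MUSICNET_MIDI_DIR,
# the variable that musiclearn/config.py is described as exposing.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ
midi_dir = os.getenv("MUSICNET_MIDI_DIR")
print(midi_dir)  # e.g. /media/hdd1/data/school/css586/musicnet_midis
```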
Here is a guide to the project structure:
```
├── data                          <- Stores processed training data
├── experiments                   <- Experiment history logs, checkpoints, and saved models
│   └── mtvae                     <- Saved results for best MTVAE interpolation model experiment
├── musiclearn                    <- The Python package containing models and training code
│   ├── config.py                 <- Global configuration variables such as MUSICNET_MIDI_DIR
│   ├── plotting.py               <- Plotting code for visualizing metrics such as training loss curves
│   ├── processing.py             <- MIDI data preprocessing code (for polyphonic music)
│   ├── sequential_models.py      <- Sequential note prediction models (Jack's models)
│   ├── single_note_processing.py <- Sequential note processing code (for monophonic music)
│   ├── training.py               <- Model training and hyperparameter tuning code
│   └── vae_models.py             <- Multi Track Variational Autoencoder (MTVAE) code (Alex's models)
├── notebooks                     <- Jupyter notebook examples for model training, inference, and evaluation
├── papers                        <- LaTeX source code for the papers
├── scripts                       <- Python scripts for string quartet model inference and evaluation
├── .env                          <- Environment variables (create your own)
├── .gitignore                    <- Ignore files in Git
├── README.md                     <- This file
├── environment.yml               <- Conda environment definition file (Python dependencies)
└── musiclearn_cli.py             <- Command line interface for model training and inference
```
The `musiclearn_cli.py` file provides a CLI for working with the models. Typing `python musiclearn_cli.py` will show a list of commands:

```
Usage: musiclearn_cli.py [OPTIONS] COMMAND [ARGS]...

  Command line interface for the musiclearn project

Options:
  --help  Show this message and exit.

Commands:
  fit-mtvae          Run MultiTrackVAE experiment named EXP_NAME with...
  fit-sequential     Fit a sequential model of choice on the specified...
  generate-music     Generate a short piece of music with a fixed number...
  interpolate-mtvae  Use MODEL_PATH to interpolate n points between...
  plot-losses        Plot model training & validation loss curves from...
```
To get the list of options for a command, type `python musiclearn_cli.py [COMMAND] --help`. For example, the model fitting commands provide all tunable hyperparameters as options:
```
$ python musiclearn_cli.py fit-mtvae --help
Usage: musiclearn_cli.py fit-mtvae [OPTIONS] EXP_NAME

  Run MultiTrackVAE experiment named EXP_NAME with hyperparameter options.
  Author: Alex Kyllo

Options:
  --ticks-per-beat INTEGER        Time steps per quarter note.
  --beats-per-phrase INTEGER      Quarter notes per phrase.
  --epochs INTEGER                The number of training epochs.
  --batch-size INTEGER            The training batch size.
  --learning-rate FLOAT           The optimizer learning rate.
  --lstm-units INTEGER            Number of LSTM units per layer.
  --latent-dim INTEGER            The latent vector dimension.
  --embedding-dim INTEGER         The note embedding dimension.
  --dropout-rate FLOAT            The dropout rate between LSTM layers.
  --gru / --lstm                  Use GRU layer instead of LSTM.
  --bidirectional / --unidirectional
                                  Use bidirectional LSTM layer in encoder.
  --augment / --no-augment        Augment the training set with random pitch
                                  shifts.
  --patience INTEGER              The early stopping patience.
  --help                          Show this message and exit.
```
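For illustration only (the hyperparameter values below are arbitrary examples, not settings recommended by the authors, and `my_experiment` is a placeholder EXP_NAME), a training run might be launched like this:

```
python musiclearn_cli.py fit-mtvae my_experiment \
    --epochs 100 \
    --batch-size 32 \
    --learning-rate 0.001 \
    --lstm-units 256 \
    --latent-dim 64 \
    --bidirectional \
    --augment
```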
A collection of samples of the generated MIDI files, converted to mp3 format, is presented at: mp3samples (Google Drive Link). Within this directory, there are two subdirectories:

- Four output samples, one from each model, are provided.
- Each of the 18 directories contains five interpolations between two of the string quartet pieces in the MusicNet corpus, numbered 0-4.
The interpolations are between the following pairs of MusicNet MIDI files:
- `Haydn/2104_op64n5_1.mid` and `Ravel/2178_gr_rqtf2.mid`
- `Haydn/2105_op64n5_2.mid` and `Ravel/2179_gr_rqtf3.mid`
- `Haydn/2106_op64n5_3.mid` and `Ravel/2177_gr_rqtf1.mid`
- `Beethoven/2497_qt11_4.mid` and `Mozart/1822_kv_421_1.mid`
- `Beethoven/2433_qt16_3.mid` and `Mozart/1859_kv_464_2.mid`
- `Beethoven/2368_qt12_4.mid` and `Mozart/1807_kv_387_3.mid`
- `Beethoven/2314_qt15_2.mid` and `Mozart/1791_kv_465_4.mid`
- `Beethoven/2480_qt05_1.mid` and `Mozart/1792_kv_465_1.mid`
- `Beethoven/2481_qt05_2.mid` and `Mozart/1835_kv_590_3.mid`
- `Beethoven/2379_qt08_4.mid` and `Mozart/1805_kv_387_1.mid`
- `Beethoven/2365_qt12_1.mid` and `Mozart/1793_kv_465_2.mid`
- `Beethoven/2562_qt02_4.mid` and `Mozart/1790_kv_465_3.mid`
- `Beethoven/2494_qt11_1.mid` and `Mozart/1789_kv_465_2.mid`
- `Beethoven/2403_qt01_4.mid` and `Mozart/1788_kv_465_1.mid`
- `Beethoven/2376_qt08_1.mid` and `Dvorak/1916_dvq10m1.mid`
- `Beethoven/2384_qt13_4.mid` and `Bach/2242_vs1_2.mid`
- `Beethoven/2560_qt02_2.mid` and `Beethoven/2621_qt07_1.mid`
- `Beethoven/2377_qt08_2.mid` and `Beethoven/2381_qt13_1.mid`
A saved model for string quartet interpolation (the best-fit model) is provided in this repository for inference use: `experiments/mtvae/2021-06-06T09:32:32/saved_model`
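This path can be passed as the MODEL_PATH argument to the `interpolate-mtvae` command; its remaining arguments and options (not reproduced in this README) can be listed with:

```
python musiclearn_cli.py interpolate-mtvae --help
```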