Walk through insanely commented code for an advanced recurrent model in TensorFlow

brijow/tacotron-2-explained

Tacotron 2 Explained

This repository is meant to teach the intricacies of writing advanced recurrent neural networks in TensorFlow. The code serves as a guide in weekly Deep Learning meetings at Ohio State University for teaching:

  1. How to read a paper
  2. How to implement it in TensorFlow

I chose Tacotron 2 because:

  1. Encoder-decoder architectures are more complex than standard DNNs. Implementing one helps you master concepts you would otherwise overlook.
  2. Tacotron 2 was released less than a year ago (as of 2018) and is a relatively simple model (compared to something like GNMT). The associated paper explains the architecture well.
  3. Other public implementations offer a benchmark to compare results against.
  4. Public datasets are available for achieving state-of-the-art results.
  5. Training requires ~10 days given access to a GPU (comparable to a GTX 1080).

Note: This code has no affiliation with the companies I have worked at. I used none of the proprietary knowledge of any of those companies to write it. This was purely an exercise in self-study.

The paper followed in this repository is "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". The repository implements only the text-to-mel-spectrogram part (called Tacotron 2). It does not include the vocoder used to synthesize audio.

This is production-grade code that can be used as a state-of-the-art TTS frontend. The blog post [TODO] shows some audio samples synthesized with a Griffin-Lim vocoder. The code is heavily commented to aid novice TensorFlow users, which more experienced readers may find a hindrance. To read the code, start from train.py.

The repository also uses TensorFlow's tf.data API for pre-processing and [TODO] Estimator API for modularity.
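
As a rough illustration of the kind of pre-processing such a pipeline handles, the plain-Python sketch below pads variable-length character sequences to a common length within each batch, which is the job tf.data's padded_batch does for tensors. The encode function and its character-to-integer mapping are assumptions made for illustration; they are not taken from the repository's input_fn.py.

```python
def encode(text):
    """Hypothetical text encoding: map each character to its Unicode code point."""
    return [ord(ch) for ch in text]


def padded_batches(sequences, batch_size, pad_value=0):
    """Yield batches in which every sequence is right-padded to the batch maximum."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        max_len = max(len(seq) for seq in batch)
        yield [list(seq) + [pad_value] * (max_len - len(seq)) for seq in batch]


texts = ["hi", "hello", "hey"]
batches = list(padded_batches([encode(t) for t in texts], batch_size=2))
```

Padding per batch rather than to a global maximum keeps short batches short, which matters when sentence lengths vary widely.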

Directory Structure

The directory structure follows the one specified in Stanford's CS230 Notes on TensorFlow, modified slightly to suit our needs.

data/ (Contains all data)
model/ (Contains model architecture)
    input_fn.py (Input data pipeline)
    model_fn.py (Main model)
    utils.py (Utility functions)
    loss.py (Model loss)
    wrappers.py (Wrappers for RNN cells)
    helpers.py (Decoder helpers)
    external/ (Code adapted from other repositories)
        attention.py (Location sensitive attention)
        zoneout_wrapper.py (Zoneout)
train.py (Run training)
config.json (Hyperparameters)
synthesize_results.py (Generate Mels from text)
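
One way to read a hyperparameter file like config.json is to expose its keys as attributes, sketched below. The Params class and the batch_size and learning_rate keys are hypothetical; the actual keys and loading code in this repository may differ.

```python
import json
import os
import tempfile


class Params:
    """Expose the top-level keys of a JSON file as attributes."""

    def __init__(self, json_path):
        with open(json_path) as f:
            self.__dict__.update(json.load(f))


# Hypothetical hyperparameters; the real keys in config.json may differ.
example = {"batch_size": 32, "learning_rate": 0.001}

# Write the example to a temporary file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as f:
        json.dump(example, f)
    params = Params(path)
```

After loading, values are read as params.batch_size or params.learning_rate, which keeps hyperparameters out of the model code.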

Requirements

The repository uses TensorFlow 1.8.0. Some code, specifically the location-sensitive attention wrapper, may be incompatible with older versions of TensorFlow.
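
A small guard for the lower bound mentioned above could look like the sketch below. The helper names are hypothetical and it checks only that a version string is at least 1.8.0; it makes no claim about compatibility with later releases.

```python
import re


def parse_version(version):
    """Turn a version string such as '1.8.0' into a tuple of ints, here (1, 8, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_supported(installed, required="1.8.0"):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)
```

With TensorFlow installed, the check would be called as is_supported(tf.__version__). Comparing integer tuples rather than raw strings is what makes "1.10.0" sort after "1.8.0".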

Setup

  1. Set up a Python 3 virtual environment. If you don't have virtualenv, install it with
pip install virtualenv
  2. Then create the environment with
virtualenv -p $(which python3) env
  3. Activate the environment
source env/bin/activate
  4. Install TensorFlow
pip install tensorflow==1.8.0
  5. Clone the repository
git clone https://gitlab.com/codetendolkar/tacotron-2-explained.git
  6. Run the training script
cd tacotron-2-explained
python train.py

Generate Mels from Text

Synthesize Audio from Mels

Credits and References

  1. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu. arXiv:1712.05884
  2. Location Sensitive Attention adapted from the Tacotron 2 implementation by Keith Ito - GitHub link
  3. Zoneout Wrapper for RNNCell adapted from TensorFlow's official repository for MaskGAN. Code contributed by A Dai - GitHub link
  4. And obviously, all the contributors to TensorFlow
  5. Internet
