
Expressive Tacotron (implementation with PyTorch)

Introduction

This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN, and Non-attentive Tacotron.

The framework also includes various deep learning architectures for building the prosody encoder, such as Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors. A minimal GST sketch is given after the notes below.

  • Only the core model files are provided; data preparation, training, and synthesis scripts are not included
  • You can refer to ExpressiveTacotron for training scripts
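
To illustrate the prosody-encoder idea, here is a minimal GST-style sketch in PyTorch: a reference encoder compresses a mel spectrogram into a fixed-size embedding, which then attends over a bank of learnable style tokens. All layer sizes and hyperparameters (`n_mels`, `ref_dim`, `num_tokens`, `token_dim`) are illustrative assumptions, not this repository's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReferenceEncoder(nn.Module):
    """Encodes a reference mel spectrogram into a fixed-size vector."""
    def __init__(self, n_mels=80, ref_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Two stride-2 convs shrink the mel axis by a factor of 4.
        self.gru = nn.GRU(32 * (n_mels // 4), ref_dim, batch_first=True)

    def forward(self, mel):                       # mel: (B, T, n_mels)
        x = self.conv(mel.unsqueeze(1))           # (B, 32, T', n_mels/4)
        x = x.transpose(1, 2).flatten(2)          # (B, T', 32 * n_mels/4)
        _, h = self.gru(x)                        # h: (1, B, ref_dim)
        return h.squeeze(0)                       # (B, ref_dim)

class GST(nn.Module):
    """Attends over learnable style tokens with the reference embedding."""
    def __init__(self, num_tokens=10, token_dim=256, ref_dim=128):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query = nn.Linear(ref_dim, token_dim)

    def forward(self, ref_embedding):             # (B, ref_dim)
        q = self.query(ref_embedding)             # (B, token_dim)
        scores = q @ torch.tanh(self.tokens).T    # (B, num_tokens)
        weights = F.softmax(scores, dim=-1)
        # Weighted sum of tokens = the style embedding.
        return weights @ torch.tanh(self.tokens)  # (B, token_dim)
```

In use, `GST()(ReferenceEncoder()(mel))` yields a style embedding that is typically broadcast and concatenated with (or added to) the text encoder outputs; VAE, GMVAE, or X-vector encoders can be swapped in at the same point in the pipeline.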

Available recipes

Expressive Mode

Framework Mode

Differences

  • Non-attentive Tacotron: the outputs of the duration predictor's stacked convolution layers are concatenated with the encoder outputs, as shown in the sketch below
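
A minimal sketch of this variant, assuming the conv stack doubles as the duration predictor's feature extractor; the dimensions (`enc_dim`, `conv_dim`) are illustrative, not the repository's configuration.

```python
import torch
import torch.nn as nn

class DurationConvStack(nn.Module):
    """Predicts per-token durations and returns conv features
    concatenated with the encoder outputs."""
    def __init__(self, enc_dim=512, conv_dim=256, kernel_size=3):
        super().__init__()
        padding = (kernel_size - 1) // 2
        self.convs = nn.Sequential(
            nn.Conv1d(enc_dim, conv_dim, kernel_size, padding=padding),
            nn.ReLU(),
            nn.Conv1d(conv_dim, conv_dim, kernel_size, padding=padding),
            nn.ReLU(),
        )
        self.duration_proj = nn.Linear(conv_dim, 1)

    def forward(self, encoder_outputs):  # (B, T, enc_dim)
        h = self.convs(encoder_outputs.transpose(1, 2)).transpose(1, 2)  # (B, T, conv_dim)
        durations = self.duration_proj(h).squeeze(-1)                    # (B, T)
        # Concatenate conv features with encoder outputs before upsampling.
        decoder_inputs = torch.cat([encoder_outputs, h], dim=-1)         # (B, T, enc_dim + conv_dim)
        return decoder_inputs, durations
```

At synthesis time the predicted durations would drive the frame-level upsampling of `decoder_inputs`, as in Non-attentive Tacotron's Gaussian upsampling.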

Acknowledgements

This implementation uses code from the following repos: NVIDIA, ESPNet, ERISHA, ForwardAttention
