FastPitch (arXiv) trained on Thorsten Müller's Thorsten-2022.10 and Thorsten-21.06-emotional datasets.
You can listen to some audio samples here.
Required packages:
torch torchaudio pyyaml phonemizer
Please refer to here to install phonemizer and the espeak-ng backend.
~ for training: librosa matplotlib tensorboard
~ for the demo app: fastapi "uvicorn[standard]"
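To confirm the environment before running inference, a quick check along these lines may help. It is only a sketch: the import names (notably `yaml` for pyyaml) and the `espeak-ng` executable name are assumptions about a typical install, and `check_environment` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import shutil

# Import names of the required packages; pyyaml is imported as "yaml".
REQUIRED_MODULES = ["torch", "torchaudio", "yaml", "phonemizer"]


def check_environment(modules=REQUIRED_MODULES, binary="espeak-ng"):
    """Return (modules that cannot be found, whether the binary is on PATH)."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    has_binary = shutil.which(binary) is not None
    return missing, has_binary


if __name__ == "__main__":
    missing, has_binary = check_environment()
    print("missing packages:", missing or "none")
    print("espeak-ng found:", has_binary)
```

An empty list and `True` suggest that the inference requirements are in place; the training and demo packages can be checked the same way by passing a different `modules` list.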
Download the pretrained weights for the FastPitch model (link).
Download the HiFi-GAN vocoder weights (link). Either put them into pretrained/hifigan-thor-v1 or edit the following lines in configs/basic.yaml:
# vocoder
vocoder_state_path: pretrained/hifigan-thor-v1/hifigan-thor.pth
vocoder_config_path: pretrained/hifigan-thor-v1/config.json
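A small check can catch a misplaced download before the model is loaded. The sketch below assumes the two vocoder keys sit at the top level of the config, as the excerpt above suggests; `missing_vocoder_files` and `load_config` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def missing_vocoder_files(config, root="."):
    """Return the configured vocoder paths that do not exist under root."""
    keys = ("vocoder_state_path", "vocoder_config_path")
    return [config[k] for k in keys if not (Path(root) / config[k]).is_file()]


def load_config(path="configs/basic.yaml"):
    """Read the YAML config into a dict; requires pyyaml."""
    import yaml  # imported here so the path check above works without pyyaml

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)
```

Run from the repository root, `missing_vocoder_files(load_config())` should return an empty list once both files are in place.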
The FastPitch class from models.fastpitch is a wrapper that simplifies text-to-mel inference. The FastPitch2Wave model includes the HiFi-GAN vocoder for direct text-to-speech inference.
from models.fastpitch import FastPitch

# Load the text-to-mel model and move it to the GPU
model = FastPitch('pretrained/fastpitch_de.pth')
model = model.cuda()

# Synthesize a mel spectrogram from text
mel_spec = model.ttmel("Hallo Welt!")
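from models.fastpitch import FastPitch2Wave

# Load FastPitch together with the HiFi-GAN vocoder
model = FastPitch2Wave('pretrained/fastpitch_de.pth')
model = model.cuda()

# Synthesize a single utterance, then a list of utterances
wave = model.tts("Hallo Welt!")
wave_list = model.tts(["null", "eins", "zwei", "drei", "vier", "fünf"])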
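To listen to the result, the waveform can be written to a WAV file. The sketch below uses only the standard library and assumes mono samples given as floats in [-1, 1]. The 22050 Hz default sample rate is an assumption and should be checked against the vocoder config. `save_wav` is a hypothetical helper, not part of this repository.

```python
import struct
import wave as wavfile  # aliased so it does not clash with a variable named `wave`


def save_wav(path, samples, sample_rate=22050):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wavfile.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Assuming `model.tts` returns a 1-D torch tensor, a call such as `save_wav("hallo.wav", wave.cpu().tolist())` would store the utterance. If torchaudio is preferred, `torchaudio.save` serves the same purpose.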
The web app uses the FastAPI library. To run the app you need the following packages:
fastapi: for the backend API
uvicorn: for serving the app
Install with: pip install fastapi "uvicorn[standard]"
Run with: python app.py
Thanks to Thorsten Müller for the high-quality datasets.
The FastPitch files stem from NVIDIA's DeepLearningExamples.