Fine-tune SpeechT5 for non-English text-to-speech tasks, implemented in PyTorch.
This repository contains code and resources for fine-tuning (or training) a SpeechT5 model on a non-English language for a text-to-speech task. The project uses Hugging Face's transformers library and speechbrain to load the necessary models and tools. The rest of the code, such as the data preprocessing and the training and evaluation functions, is implemented in plain PyTorch, so feel free to make any changes you need to train your model efficiently.
The main objective of this project is to fine-tune the SpeechT5 model for text-to-speech on a non-English language. The steps include:
- Setting up the environment.
- Loading necessary tools (tokenizer and feature extractor) and models (SpeechT5 itself, a model to generate X-vector speaker embeddings, and the vocoder).
- Most importantly: adding the unique characters of the target language to the tokenizer and resizing the model's input embedding matrix accordingly.
- Loading and preprocessing your data.
- Training and evaluating the model.
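The vocabulary-extension step is the heart of the project, so here is a minimal sketch of its logic. `find_missing_characters` is a hypothetical helper, not part of this repo; in the actual code, once the missing characters are known, they are added with the transformers calls `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
def find_missing_characters(corpus, vocab):
    """Return the characters that appear in the corpus but are absent
    from the tokenizer vocabulary, sorted for reproducibility."""
    seen = set()
    for sentence in corpus:
        seen.update(sentence)
    return sorted(ch for ch in seen if ch not in vocab)

# Toy example: an English-only vocabulary confronted with Persian text.
vocab = set("abcdefghijklmnopqrstuvwxyz .,")
corpus = ["hello", "سلام دنیا"]
missing = find_missing_characters(corpus, vocab)
print(missing)  # the Persian characters that must be added to the tokenizer
```

Collecting the characters from the full training corpus (rather than a sample) guarantees the tokenizer never maps an input character to `<unk>` during training.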
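The text side of the preprocessing step can be illustrated with a small sketch. `clean_text` and its rules (lowercasing, whitespace collapsing, dropping unsupported characters) are hypothetical; the normalization your language needs may differ.

```python
import re

def clean_text(text, supported_chars):
    """Normalize a transcript before tokenization: lowercase it, replace
    characters outside the tokenizer's alphabet with spaces, then collapse
    runs of whitespace."""
    text = text.lower()
    text = "".join(ch if ch in supported_chars else " " for ch in text)
    return re.sub(r"\s+", " ", text).strip()

supported = set("abcdefghijklmnopqrstuvwxyz ")
print(clean_text("Hello,   WORLD!", supported))  # -> "hello world"
```

Replacing unknown characters with spaces (instead of deleting them) keeps word boundaries intact when punctuation sits between two words.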
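The training and evaluation functions follow the standard PyTorch loop. Below is a minimal sketch of that pattern with a tiny stand-in model; the real code trains SpeechT5 with a spectrogram loss, so the model, data, and loss function here are placeholders chosen only to keep the loop readable and runnable.

```python
import torch
from torch import nn

def train_one_epoch(model, loader, optimizer, loss_fn):
    """Run one pass over the data, updating weights; return the mean loss."""
    model.train()
    total = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

@torch.no_grad()
def evaluate(model, loader, loss_fn):
    """Compute the mean loss without gradient tracking or weight updates."""
    model.eval()
    total = 0.0
    for inputs, targets in loader:
        total += loss_fn(model(inputs), targets).item()
    return total / len(loader)

# Toy data: learn y = 2x with a one-parameter linear model.
torch.manual_seed(0)
x = torch.randn(64, 1)
loader = [(x, 2 * x)]
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

first = train_one_epoch(model, loader, optimizer, loss_fn)
for _ in range(50):
    train_one_epoch(model, loader, optimizer, loss_fn)
final = evaluate(model, loader, loss_fn)
```

Keeping `train_one_epoch` and `evaluate` as free functions that accept the model, data loader, and loss makes it easy to swap in your own dataset or scheduler without touching the loop itself.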
Here are some samples generated by a model I trained on the Persian subset of the Common Voice dataset.
Sample 1
1.mp4
Sample 2
2.mp4
Sample 3
3.mp4
Sample 4
4.mp4
Sample 5
5.mp4
This code draws on the Hugging Face Audio Course chapter on fine-tuning SpeechT5:
https://huggingface.co/learn/audio-course/en/chapter6/fine-tuning