Capacitron #977

Merged
merged 81 commits into from
May 20, 2022
Contributor

@a-froghyar a-froghyar commented Nov 29, 2021

This PR adds a new model to 🐸 TTS, based on Google's Capacitron. It is a partial implementation of the models described in the paper: hierarchical latent embeddings are not implemented yet and remain a TODO. If you'd like to get an idea of what the model does and how it works, here's a post I wrote a few months ago.

I implemented this model as part of my Master's thesis at TU Berlin. The thesis is a detailed report on the implementation and subjective evaluation of this model. You can read my thesis and listen to some samples here. I'm also building a website with audio samples from my pretrained models and the uploaded thesis; this is WIP.

I originally implemented this model in an earlier version (March 2020) of 🐸 TTS, so this new "re-implementation" still needs to be tested. I'm working on that right now, but I wanted to open this PR already to discuss how the Trainer API needs to be adjusted to accommodate the model.

TODOs:

  • Fix tests
  • Create T1 Capacitron Test
  • Create T2 Capacitron Test
  • Tokenizer update, delete explicit espeak flag
  • Pull out gradient clipping
  • Open PR in the trainer
    |-----> Implement 2 model specific methods Trainer#26
  • Test Tacotron 1 Model
  • Test Tacotron 2 Model (thanks to help from @Edresson, the Capacitron VAE module is modular, so we could test it in T2 as well, however the original authors did not explore this for some reason)
  • Test different attention algos [in my experiments, only graves worked]
    - original doesn't work with Capacitron, and dynamic_convolution only works with Tacotron 2
    - Use Graves with T1
    - Use DCA with T2
  • Fix beta initialisation when continuing a training run
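
The attention pairings from the list above can be written down as a small lookup. This is only a sketch of the findings reported here, not code from this PR: the dictionary, the helper name `pick_attention`, and the base-model keys `"tacotron"` / `"tacotron2"` are hypothetical and chosen for illustration.

```python
# Attention types that worked in the experiments described above.
# The keys and the helper name are illustrative assumptions, not part of the PR.
RECOMMENDED_ATTENTION = {
    "tacotron": "graves",                # T1: use Graves attention
    "tacotron2": "dynamic_convolution",  # T2: use DCA
}


def pick_attention(base_model: str) -> str:
    """Return the attention type reported to work for the given base model."""
    try:
        return RECOMMENDED_ATTENTION[base_model]
    except KeyError:
        raise ValueError(f"No reported attention pairing for {base_model!r}") from None


if __name__ == "__main__":
    for name in ("tacotron", "tacotron2"):
        print(name, "->", pick_attention(name))
```

Note that `original` is deliberately absent from the lookup, since it did not work with Capacitron in these experiments.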

@erogol @Edresson @WeberJulian I'd appreciate it if you could review the changes and discuss some of the specifics in the code.

@a-froghyar
Contributor Author

Update: I've just run the first training and the results are slightly off. I suspect there's an error in the loss calculation caused by the reorganisation @Edresson did a few months back. I'm investigating that today and will push new commits.

@a-froghyar
Contributor Author

Update: I managed to do a first training run that gave some promising results. I'm now adding a step-wise gradual LR scheduler (unlike the Noam scheduler, it takes hardcoded step thresholds and learning rates), which proved necessary in my previous implementation. More updates to follow. :)
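
To illustrate the idea of a scheduler driven by hardcoded step thresholds, here is a minimal sketch. It is not the scheduler from this PR: it assumes a piecewise-constant schedule (the rate stays fixed until the next threshold is reached), and the function name `stepwise_lr` as well as the thresholds and rates in `EXAMPLE_SCHEDULE` are made up for the example.

```python
from bisect import bisect_right


def stepwise_lr(step, schedule, initial_lr):
    """Return the learning rate to use at a given global step.

    schedule: list of (step_threshold, lr) pairs, sorted by threshold.
    The rate of the last threshold that has been reached applies;
    before the first threshold, initial_lr is used.
    """
    thresholds = [threshold for threshold, _ in schedule]
    # Number of thresholds that are <= step.
    reached = bisect_right(thresholds, step)
    if reached == 0:
        return initial_lr
    return schedule[reached - 1][1]


# Hypothetical values, for illustration only.
EXAMPLE_SCHEDULE = [(1000, 5e-4), (5000, 3e-4), (20000, 1e-4)]

if __name__ == "__main__":
    for step in (0, 999, 1000, 5000, 50000):
        print(step, stepwise_lr(step, EXAMPLE_SCHEDULE, initial_lr=1e-3))
```

In contrast to the Noam scheduler, which computes the rate from a formula of the step count, every value here is written out explicitly, which makes the schedule easy to tune by hand.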

@@ -0,0 +1,64 @@
'''
This will be deleted later, only for dev, to see how to infer the capacitron model
Contributor Author

This file will be deleted before the merge. It's only a script showing how to run inference with the model, for others who are experimenting with this work.

- added reference wav and text args for posterior inference
- some formatting
@a-froghyar
Contributor Author

@erogol from my side this is ready to go. 😊 Big thanks to @WeberJulian for all the help!

@WeberJulian
Contributor

Just waiting for my LJSpeech T2 Capacitron run to converge. If it does, I'll merge both this PR and coqui-ai/Trainer#26.

@a-froghyar
Contributor Author

a-froghyar commented Apr 14, 2022

Training runs have not been converging since the reorganisation over the previous 2 weeks. I'll report back here soon.

Edit: commits below fixed all issues

@WeberJulian WeberJulian merged commit 8be21ec into coqui-ai:dev May 20, 2022
@erogol
Member

erogol commented May 30, 2022

FINALLY !!!! 🚀
