Low-latency real-time multispeaker voice conversion (VC) with cyclic variational autoencoder (CycleVAE) and multiband WaveRNN using data-driven linear prediction (MWDLP)
- UNIX
- 3.6 <= python <= 3.9
- CUDA 11.1
- virtualenv
- jq
- make
- gcc
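The requirements above can be checked before building. The following is a minimal sketch, not part of the repository: the names `python_supported` and `missing_tools` are hypothetical helpers introduced here for illustration. It only checks the Python version range and whether the listed command-line tools are on `PATH`; it does not verify the CUDA 11.1 installation.

```python
import shutil
import sys

# Supported interpreter range from the requirements list (inclusive).
MIN_PYTHON = (3, 6)
MAX_PYTHON = (3, 9)

# Command-line tools named in the requirements list.
REQUIRED_TOOLS = ["virtualenv", "jq", "make", "gcc"]


def python_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info
    return MIN_PYTHON <= tuple(version[:2]) <= MAX_PYTHON


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_supported():
        print("unsupported Python version: %d.%d" % sys.version_info[:2])
    for tool in missing_tools():
        print("missing tool: " + tool)
```

Running the script prints one line per unmet requirement and nothing when the checked requirements are satisfied.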
$ cd tools
$ make
$ cd ..
- 3.1 (2021/09/25)
  - Finalize VC and MWDLP Python implementations (impl.)
  - Bug fixes on C impl. to match the output of Python impl.
  - Fix input segmental convolution impl. as in original papers while allowing usage in real-time demo
  - Update MWDLP demo and samples with VCC20 dataset
  - Update VC demo and samples with VCC20 dataset
- Data preparation and preprocessing
- VC and neural vocoder model training [~ 2.5 and ~ 4 days, respectively]
- VC fine-tuning with fixed neural vocoder [~ 2.5 days]
- VC decoder fine-tuning with fixed encoder and neural vocoder [~ 2.5 days]
- Dump and compile models
- Decode
The real-time implementation is based on LPCNet.
Please see egs/cycvae_mwdlp_vcc20/README.md for more details on VC + neural vocoder, or egs/mwdlp_vcc20/README.md for more details on the neural vocoder only.
Patrick Lumban Tobing