Low-latency real-time multispeaker voice conversion (VC) with cyclic variational autoencoder (CycleVAE) and multiband WaveRNN using data-driven linear prediction (MWDLP)

Requirements:

UNIX
3.6 <= python <= 3.9
CUDA 11.1
virtualenv
jq
make
gcc
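
The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of the repository: the helper names `python_version_ok` and `missing_tools` are hypothetical, and it assumes each listed tool is invoked by the name shown in the list. It does not check the operating system or the CUDA version.

```python
import shutil
import sys

# Command-line tools named in the requirements list (assumed executable names).
# CUDA 11.1 and the UNIX requirement are not checked by this sketch.
REQUIRED_TOOLS = ("virtualenv", "jq", "make", "gcc")
MIN_PYTHON = (3, 6)
MAX_PYTHON = (3, 9)


def python_version_ok(version=None):
    """Return True if (major, minor) lies in the supported 3.6 to 3.9 range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Unsupported Python version:", sys.version.split()[0])
    for tool in missing_tools():
        print("Missing tool:", tool)
```

Running the script prints nothing when the interpreter version is in range and all four tools are found.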

Installation

$ cd tools
$ make
$ cd ..

Latest version

3.1 (2021/09/25)
- Finalize VC and MWDLP Python implementations (impl.)
- Bug fixes on C impl. to match the output of Python impl.
- Fix input segmental convolution impl. as in original papers while allowing usage in real-time demo
- Update MWDLP demo and samples with VCC20 dataset
- Update VC demo and samples with VCC20 dataset

Compilable demo

Samples from compilable demo

Steps to build the models:

Data preparation and preprocessing
VC and neural vocoder model training [~ 2.5 days and ~ 4 days, respectively]
VC fine-tuning with fixed neural vocoder [~ 2.5 days]
VC decoder fine-tuning with fixed encoder and neural vocoder [~ 2.5 days]

Steps for real-time low-latency decoding with CPU:

Dump and compile models
Decode

Real-time implementation is based on LPCNet.

Details

Please see egs/cycvae_mwdlp_vcc20/README.md for more details on VC + neural vocoder, or egs/mwdlp_vcc20/README.md for more details on the neural vocoder only.

References

[1] High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling

[2] Low-latency real-time non-parallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction

Contact

Patrick Lumban Tobing

patrickltobing@gmail.com

patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
