voco.ai/GMM at master · siddharth17196/voco.ai

Voice conversion from a source speaker to a target speaker, using a parallel dataset and Gaussian mixture models (GMMs)
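As a rough illustration of GMM-based conversion, the sketch below maps source spectral frames to target frames with a joint-density GMM. It is not taken from this repository. It assumes a GMM has already been fitted on time-aligned, stacked `[source, target]` feature vectors from the parallel data, and it uses the simpler frame-by-frame conditional-mean mapping rather than a maximum likelihood trajectory estimate. The function name `convert_frames` and its parameter layout are hypothetical.

```python
import numpy as np


def convert_frames(x, weights, means, covs):
    """Map source frames to target frames with a joint-density GMM.

    x:       (T, D) source feature frames
    weights: (M,) mixture weights
    means:   (M, 2D) joint means, laid out as [mu_x, mu_y] (assumed layout)
    covs:    (M, 2D, 2D) full joint covariance matrices
    Returns a (T, D) array holding E[y | x] for every frame.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    T, D = x.shape
    M = len(weights)

    log_post = np.empty((T, M))
    cond_means = np.empty((M, T, D))
    for m in range(M):
        mu_x, mu_y = means[m, :D], means[m, D:]
        s_xx = covs[m, :D, :D]   # source covariance block
        s_yx = covs[m, D:, :D]   # target-source cross-covariance block
        diff = x - mu_x
        solved = np.linalg.solve(s_xx, diff.T).T   # rows are s_xx^-1 (x - mu_x)

        # log( w_m * N(x | mu_x, s_xx) ), used for the component posterior
        _, logdet = np.linalg.slogdet(s_xx)
        maha = np.sum(diff * solved, axis=1)
        log_post[:, m] = np.log(weights[m]) - 0.5 * (
            maha + logdet + D * np.log(2.0 * np.pi)
        )

        # E[y | x, m] = mu_y + s_yx s_xx^-1 (x - mu_x)
        cond_means[m] = mu_y + solved @ s_yx.T

    # Normalise in the log domain for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Posterior-weighted sum of the per-component conditional means
    return np.einsum("tm,mtd->td", post, cond_means)
```

With a single mixture component this reduces to ordinary linear regression from source to target features. With several components, each frame is converted by a blend of local linear mappings weighted by how well each component explains the source frame.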

CMU Arctic dataset
Our own dataset, made from an audiobook cut into 2-second or 3-second samples
- 2-second samples
- 3-second samples
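
The audiobook preparation could look roughly like the sketch below. It assumes that the 2-second and 3-second variants are consecutive fixed-length clips cut from the recording, which the description above suggests but does not state outright. The helper name `split_into_clips` is hypothetical and does not come from this repository.

```python
def split_into_clips(samples, sample_rate, clip_seconds):
    """Cut a sequence of audio samples into consecutive fixed-length clips.

    Trailing samples that do not fill a whole clip are dropped.
    """
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
```

Calling it once with `clip_seconds=2` and once with `clip_seconds=3` on the same recording would give the two variants of the dataset.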

T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum likelihood estimation of spectral parameter trajectory,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222–2235, Nov. 2007.
K. Kobayashi et al., “Statistical singing voice conversion with direct waveform modification based on the spectrum differential,” in Fifteenth Annual Conference of the International Speech Communication Association, 2014.