AutoVC Baseline
Zero-shot voice conversion system based on an autoencoder.
This project researches and benchmarks different speaker embeddings for the voice conversion system.
- Speaker encoder: D-Vector (baseline), X-Vector, Resemblyzer
- Linguistic content: no separate linguistic encoder; the AutoVC encoder already captures this information
- Prosody: no separate prosodic encoder; the AutoVC encoder already carries the pitch information
- Decoder: AutoVC decoder
- Vocoder: WaveNet, HiFi-GAN, possibly Parallel WaveGAN
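The statement that the AutoVC encoder already holds the linguistic content rests on its information bottleneck, and swapping speaker encoders only changes the embedding that is appended after it. Below is a minimal NumPy sketch of that step. It assumes the scheme used in the original AutoVC model_vc.py: keep the forward LSTM output at the last frame of each window of length freq and the backward output at the first frame, copy each code back over its window, and concatenate the speaker embedding to every frame. The function names, the sizes, and the random stand-in for the LSTM output are illustrative assumptions, not this repo's code.

```python
import numpy as np


def bottleneck(lstm_out: np.ndarray, dim_neck: int, freq: int) -> np.ndarray:
    """Downsample BLSTM output (T, 2*dim_neck) to (T // freq, 2*dim_neck).

    Assumes T is a multiple of freq (AutoVC pads inputs to ensure this).
    """
    forward = lstm_out[:, :dim_neck]
    backward = lstm_out[:, dim_neck:]
    codes = []
    for i in range(0, lstm_out.shape[0], freq):
        # forward state at the end of the window, backward state at its start
        codes.append(np.concatenate([forward[i + freq - 1], backward[i]]))
    return np.stack(codes)


def decoder_input(codes: np.ndarray, speaker_emb: np.ndarray, n_frames: int) -> np.ndarray:
    """Upsample codes by copying, then append the (target) speaker embedding to every frame."""
    repeat = n_frames // len(codes)
    code_exp = np.repeat(codes, repeat, axis=0)    # (n_frames, 2*dim_neck)
    emb_exp = np.tile(speaker_emb, (n_frames, 1))  # (n_frames, dim_emb)
    return np.concatenate([code_exp, emb_exp], axis=1)


# Illustrative sizes: 128 mel frames, dim_neck=32, freq=32, 256-dim d-vector
T, dim_neck, freq, dim_emb = 128, 32, 32, 256
lstm_out = np.random.randn(T, 2 * dim_neck)  # stand-in for the encoder's BLSTM output
codes = bottleneck(lstm_out, dim_neck, freq)
dec_in = decoder_input(codes, np.random.randn(dim_emb), T)
print(codes.shape, dec_in.shape)  # (4, 64) (128, 320)
```

Because the embedding is simply concatenated per frame, replacing the D-Vector with an X-Vector changes only the width of the decoder input, which is why dim_emb must be adjusted per speaker encoder.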
- main.py: runs the training process
- model_vc.py: model architecture code
- solver_encoder.py: training code
- model_bl.py: D-Vector speaker embedding model
- synthesis.py: WaveNet vocoder
- vocoder.py: runs WaveNet inference in the background
- 3000000-BL.ckpt: pretrained speaker encoder (D-Vector), Wan et al. [2018]
- g_03280000: pretrained HiFi-GAN vocoder
- checkpoint_step001000000_ema.pth: pretrained WaveNet vocoder
- make_spect.py: converts audio into mel-spectrograms and writes them to spmel/
- make_wav.py: converts FLAC to WAV for the dataset
- make_metadata.py: generates train.pkl
- make_metadata4test.py: generates metadata.pkl; change the variable process_uttr
- conversion.ipynb: uses the trained AutoVC model to generate results.pkl
- vocoder.ipynb: loads results.pkl and the pretrained vocoder to output the converted audio
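make_spect.py itself is not reproduced here. The sketch below assumes it follows the original AutoVC preprocessing: 16 kHz audio, a 1024-point FFT with a hop of 256, 80 mel bands, and magnitudes converted to dB with a -100 dB floor and a -16 dB offset, then scaled to [0, 1]. The mel filterbank is passed in as mel_basis (the original builds it with librosa) and the high-pass pre-filter is omitted. stft_magnitude and normalized_mel are hypothetical names, so treat this as an illustration of the normalization rather than a drop-in replacement.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, fft_length: int = 1024, hop_length: int = 256) -> np.ndarray:
    """Magnitude STFT with reflect padding and a periodic Hann window: (frames, fft_length // 2 + 1)."""
    x = np.pad(x, fft_length // 2, mode="reflect")
    n_frames = 1 + (len(x) - fft_length) // hop_length
    # Index matrix that picks out each frame from the padded signal
    idx = np.arange(fft_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(fft_length) / fft_length)
    return np.abs(np.fft.rfft(x[idx] * window, n=fft_length, axis=1))


def normalized_mel(x: np.ndarray, mel_basis: np.ndarray) -> np.ndarray:
    """Map a waveform to a (frames, n_mels) mel-spectrogram scaled to [0, 1]."""
    min_level = np.exp(-100 / 20 * np.log(10))  # -100 dB floor as a linear amplitude
    mel = stft_magnitude(x) @ mel_basis          # mel_basis: (513, n_mels)
    mel_db = 20 * np.log10(np.maximum(min_level, mel)) - 16
    return np.clip((mel_db + 100) / 100, 0, 1)


# Illustrative call: one second of noise and a placeholder (not a real mel) filterbank
rng = np.random.default_rng(0)
wav = rng.uniform(-0.5, 0.5, 16000)
mel_basis = rng.uniform(0, 0.01, (513, 80))
S = normalized_mel(wav, mel_basis)
print(S.shape)  # (63, 80)
```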
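The layout of train.pkl is not spelled out in this README. Assuming make_metadata.py keeps the original AutoVC format, each entry is a list holding the speaker ID, that speaker's embedding averaged over several utterances, and the relative paths of the speaker's mel-spectrogram files. Here is a minimal sketch of writing and reading such a file. make_entry is a hypothetical helper, and the file names and random embeddings are placeholders.

```python
import os
import pickle
import tempfile

import numpy as np


def make_entry(speaker: str, utterance_embs: np.ndarray, mel_files: list[str]) -> list:
    """One train.pkl entry: [speaker_id, mean embedding, speaker/file1.npy, ...]."""
    entry = [speaker, utterance_embs.mean(axis=0)]
    entry.extend(os.path.join(speaker, f) for f in sorted(mel_files))
    return entry


# Placeholder data: two speakers, ten 256-dim utterance embeddings each
rng = np.random.default_rng(0)
speakers = [
    make_entry("p231", rng.standard_normal((10, 256)), ["p231_002.npy", "p231_001.npy"]),
    make_entry("p243", rng.standard_normal((10, 256)), ["p243_001.npy"]),
]

path = os.path.join(tempfile.mkdtemp(), "train.pkl")
with open(path, "wb") as handle:
    pickle.dump(speakers, handle)

with open(path, "rb") as handle:
    loaded = pickle.load(handle)
for speaker_id, emb, *mel_paths in loaded:
    print(speaker_id, emb.shape, mel_paths)
```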
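conversion.ipynb and vocoder.ipynb exchange data through results.pkl. Assuming the original AutoVC convention, metadata.pkl entries are [speaker_id, embedding, mel-spectrogram], and results.pkl is a list of (name, mel) pairs covering every source/target combination, with names of the form source x target (e.g. p231xp243). vocoder.ipynb would then synthesize one WAV per entry. The sketch reproduces only that bookkeeping. convert and all_pairs are hypothetical, and convert is a stand-in for the trained AutoVC generator that returns its input unchanged.

```python
import numpy as np


def convert(source_mel: np.ndarray, source_emb: np.ndarray, target_emb: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained AutoVC model; returns the input unchanged."""
    return source_mel


def all_pairs(metadata: list) -> list[tuple[str, np.ndarray]]:
    """Convert every source utterance to every target speaker, naming results 'srcxtrg'."""
    results = []
    for src_id, src_emb, src_mel in metadata:
        for trg_id, trg_emb, _ in metadata:
            name = f"{src_id}x{trg_id}"
            results.append((name, convert(src_mel, src_emb, trg_emb)))
    return results


# Placeholder metadata.pkl content: [speaker_id, embedding, mel-spectrogram]
metadata = [[spk, np.zeros(256), np.zeros((128, 80))] for spk in ("p231", "p243", "p272")]
results = all_pairs(metadata)
print([name for name, _ in results][:3])  # ['p231xp231', 'p231xp243', 'p231xp272']
```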
- xvec: in dataLoader.py, edit the metadata path to train_xvec.pkl, and change the dim_emb parameter in main.py to 512.
- res: ... the dim_emb parameter in main.py stays the same as the baseline, 256.
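The per-encoder settings above can be kept in one place so that a mismatched embedding fails early instead of deep inside the model. Below is a small sketch with hypothetical names (EMBEDDINGS, check_embedding). The dim_emb values are the ones listed above. Only train_xvec.pkl is named for a specific encoder, so the D-Vector entry assumes train.pkl from make_metadata.py, and the Resemblyzer path is left as None rather than guessed.

```python
import numpy as np

# Hypothetical config table; dim_emb values are the ones listed in this README.
EMBEDDINGS = {
    "dvec": {"dim_emb": 256, "metadata": "train.pkl"},       # baseline D-Vector
    "xvec": {"dim_emb": 512, "metadata": "train_xvec.pkl"},  # X-Vector
    "res": {"dim_emb": 256, "metadata": None},                # Resemblyzer; path not given here
}


def check_embedding(kind: str, emb: np.ndarray) -> np.ndarray:
    """Raise if an embedding does not match the dim_emb the model will be built with."""
    expected = EMBEDDINGS[kind]["dim_emb"]
    if emb.shape[-1] != expected:
        raise ValueError(f"{kind} embedding has size {emb.shape[-1]}, expected dim_emb={expected}")
    return emb


check_embedding("xvec", np.zeros(512))  # passes
# check_embedding("xvec", np.zeros(256)) would raise ValueError
```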
uttr002: 'Ask her to bring these things with her from the store.'
uttr010: 'People look, but no one ever finds it.'
uttr050:
- p231: 'People look, but no one ever finds it.'
- p243: 'Have I really come to this?'
- p272: 'This represents a tough game for us.'
- p279: 'The judge said.'
- p314: 'We have no choice but to shut down.'
- p339: 'It is a hard act to follow, the Winning act.'
uttr150:
- p231: 'We have a clean bill of health.'
- p243: 'Did he trip?'
- p272: 'Labour's Scottish general secretary Alex Rowley was delighted yesterday.'
- p279: 'In each case they were a goal down.'
- p314: 'It was a moment of madness.'
- p339: 'Mind you, all was not lost.'
uttr275:
- p231: 'It does not work that way in Scottish football.'
- p243: 'It is also seeking a national mortgage rescue plan.'
- p272: 'All options are open.'
- p279: 'The script was funny.'
- p314: 'He has lost confidence and weight.'
- p339: 'This time it really could happen.'
uttr390:
- p231: 'It was fit for royalty.'
- p243: 'His record on Government has always been highly effective.'
- p272: 'It's like a basketball.'
- p279: 'However, BBC Scotland was not interested in his work.'
- p314: 'It was early morning.'
- p339: 'He has run a hell of a race.'
- Find the better model for xvec: model_16 or model_64
- Use metric-mcd.ipynb to evaluate the MCD (mel-cepstral distortion) metric.
- To compare the output quality of the two vocoders, use metric-SDR_PESQ.ipynb.
- WaveNet audio output folder: eval_audio-WaveNet (or regenerate the output with model_retrained)
- Word error rate (WER) evaluation: word_err_rate.ipynb
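metric-mcd.ipynb is not shown here. For reference, a common definition of mel-cepstral distortion between time-aligned mel-cepstra c and ĉ, excluding the 0th (energy) coefficient, is MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²), averaged over frames and expressed in dB. The sketch below assumes the two sequences are already aligned (e.g. by DTW) and have equal shape. How the notebook extracts and aligns the cepstra is not specified, and mel_cepstral_distortion is a hypothetical name.

```python
import numpy as np


def mel_cepstral_distortion(ref: np.ndarray, est: np.ndarray) -> float:
    """Frame-averaged MCD in dB for aligned (frames, order) mel-cepstra; coefficient 0 is skipped."""
    if ref.shape != est.shape:
        raise ValueError("mel-cepstra must be time-aligned and of equal shape")
    diff = ref[:, 1:] - est[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())


ref = np.zeros((100, 25))
est = ref.copy()
est[:, 1] += 0.1  # constant offset in one cepstral coefficient
print(round(mel_cepstral_distortion(ref, est), 3))  # 0.614
```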
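metric-SDR_PESQ.ipynb may rely on a toolkit implementation. As a point of reference, the basic signal-to-distortion ratio between a reference s and an estimate ŝ is SDR = 10 log10(‖s‖² / ‖s − ŝ‖²). This sketch uses that plain definition, which is an assumption: it allows no scaling or filtering of the estimate, unlike BSS Eval. It trims both signals to the shorter length, and PESQ is left to a dedicated library. sdr is a hypothetical name.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-12) -> float:
    """Plain signal-to-distortion ratio in dB; both signals are trimmed to the shorter length."""
    n = min(len(reference), len(estimate))
    ref, est = reference[:n], estimate[:n]
    noise = ref - est
    return float(10 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps)))


t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220 * t)
print(round(sdr(clean, 0.9 * clean), 2))  # 20.0
```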
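The WER evaluation compares a hypothesis transcript against a reference transcript, such as the sentences listed above. Which ASR system and text normalization word_err_rate.ipynb uses is not stated here. WER itself is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. Here is a dependency-free sketch. The normalization (lowercase, strip punctuation except apostrophes) is an assumption, and normalize and word_error_rate are hypothetical names.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


ref = "People look, but no one ever finds it."
print(word_error_rate(ref, "people look but no one ever find it"))  # 0.125
```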