Multispeaker Community Vocoder model for DiffSinger
This is the code used to train the "HiFiPLN" vocoder.
A trained model for use with OpenUtau is available for download on the official release page.
The name comes from the fact that a lot of PLN was spent training this thing.
Python 3.10 or greater is required.
python dataset-utils/split.py --length 1 -sr 44100 -o "dataset/train" PATH_TO_DATASET
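To make the arithmetic behind `--length 1 -sr 44100` concrete, here is a minimal sketch of how one-second segment boundaries could be computed at 44100 Hz. The function name `segment_bounds` is hypothetical and this is not the code in dataset-utils/split.py; in particular, dropping the trailing partial segment is an assumption, and the real script may handle it differently.

```python
def segment_bounds(total_samples, sample_rate=44100, length_s=1.0):
    """Return (start, end) sample indices of consecutive full-length segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg = int(round(sample_rate * length_s))  # 44100 samples for 1 s at 44.1 kHz
    return [(s, s + seg) for s in range(0, total_samples - seg + 1, seg)]


if __name__ == "__main__":
    # A hypothetical clip of 100000 samples (about 2.27 s at 44.1 kHz).
    for start, end in segment_bounds(100_000):
        print(start, end)
```

Under these assumptions, a clip shorter than 44100 samples yields no segments at all, so very short recordings contribute nothing to the training set.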
You will also need to provide some validation audio files and save them to dataset/valid, then run:
python preproc.py --path dataset --config "configs/hifipln.yaml"
python train.py --config "configs/hifipln.yaml"
- If you see an error saying "Total length of `Data Loader` across ranks is zero" then you do not have enough validation files.
- You may want to edit configs/hifipln.yaml and change train: batch_size: 12 to a value that better fits your available VRAM.
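Before starting a run, it can help to confirm that dataset/train and dataset/valid actually contain audio. The following is a small sketch, not part of this repository: the helper name `count_audio_files` and the list of extensions are assumptions, so adjust them to whatever formats your dataset uses.

```python
from pathlib import Path

# Assumption: which extensions count as audio; the repository may accept others.
AUDIO_EXTS = {".wav", ".flac"}


def count_audio_files(folder):
    """Count audio files under `folder`, searching subdirectories as well."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )


if __name__ == "__main__":
    for split in ("dataset/train", "dataset/valid"):
        print(split, count_audio_files(split))
```

A count of zero for dataset/valid is the first thing to rule out if the data loader error above appears.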
python train.py --config "configs/hifipln.yaml" --resume CKPT_PATH
You may set CKPT_PATH to a log directory (e.g. logs/HiFiPLN), and it will find the last checkpoint of the last run.
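As an illustration of what "the last checkpoint of the last run" could mean, here is a rough sketch that picks the most recently modified .ckpt file under a log directory. The function `find_latest_checkpoint` is hypothetical, and using modification time as the selection rule is an assumption; the rule train.py actually applies is not documented here.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(log_dir) -> Optional[Path]:
    """Return the most recently modified .ckpt file under `log_dir`, or None.

    Assumption: "latest" means newest modification time, which is only a
    proxy for the last checkpoint of the last run.
    """
    ckpts = [p for p in Path(log_dir).rglob("*.ckpt") if p.is_file()]
    if not ckpts:
        return None
    return max(ckpts, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(find_latest_checkpoint("logs/HiFiPLN"))
```

This can be handy for checking which file a resume is likely to pick up before committing to a long run.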
Download a checkpoint from https://utau.pl/hifipln/#checkpoints-for-finetuning
Save the checkpoint as ckpt/HiFiPLN.ckpt then run:
python train.py --config "configs/hifipln-finetune.yaml"
- Finetuning shouldn't be run for too long, especially for small datasets. Just 2-3 epochs or ~20000 steps should be fine.
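To relate epochs to steps for your own dataset, a back-of-the-envelope calculation along these lines may help. It is a sketch under stated assumptions: one optimizer step per batch, a single GPU, no gradient accumulation, and the final partial batch kept. The function name `steps_for_epochs` and the example numbers are hypothetical, and the batch size in configs/hifipln-finetune.yaml may differ from the one used here.

```python
import math


def steps_for_epochs(num_segments, batch_size, epochs):
    """Estimate training steps, assuming one step per batch and no accumulation."""
    steps_per_epoch = math.ceil(num_segments / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Hypothetical dataset: 80000 one-second segments, batch size 12, 3 epochs.
    print(steps_for_epochs(num_segments=80_000, batch_size=12, epochs=3))
```

For a much smaller dataset, the same number of epochs corresponds to far fewer steps, so it is worth checking both figures rather than relying on only one of them.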
python export.py --config configs/hifipln.yaml --output out/hifipln --model CKPT_PATH
You may set CKPT_PATH to a log directory (e.g. logs/HiFiPLN), and it will find the last checkpoint of the last run.