Learning Source Disentanglement in Neural Audio Codec, ICASSP 2025
Xiaoyu Bie, Xubo Liu, Gaël Richard
The pretrained models can be downloaded via Google Drive
All our models are trained on 8 NVIDIA A100 80GB GPUs
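To fetch a checkpoint from the command line, here is a minimal sketch using the gdown package; FILE_ID and the output name are placeholders, replace them with the ones from the shared Drive link:
pip install gdown
# FILE_ID is a placeholder for the ID in the Google Drive share link
gdown https://drive.google.com/uc?id=FILE_ID -O pretrained_model.pt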
conda env create -f environment.yml
conda activate gen_audio
The code was tested on Python 3.11.7 and PyTorch 2.1.2
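To quickly verify the installation, a one-line sanity check:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"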
To install ViSQOL, the simplest way is to use Bazelisk, especially if you want to install it on a cluster where you do not have sudo rights.
Here is an example:
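A minimal sketch, assuming a Linux x86-64 machine and that ~/bin is on your PATH (the install location and the visqol checkout directory are your choice):
# Install Bazelisk as a user-local `bazel` binary (no sudo needed)
mkdir -p ~/bin
wget https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64 -O ~/bin/bazel
chmod +x ~/bin/bazel
# Build ViSQOL; Bazelisk fetches the Bazel version the project expects
git clone https://github.com/google/visqol.git
cd visqol
bazel build :visqol -c opt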
We use the following datasets:
mkdir manifest
# DnR
python prepare/mani_dnr.py --data-dir PATH_TO_DnR
# DNS Challenge 5
python prepare/mani_dns_clean.py --data-dir PATH_TO_DNS_CLEAN # or by partition
python prepare/mani_dns_noise.py --data-dir PATH_TO_DNS_NOISE
# Jamendo
python prepare/mani_jamendo.py --data-dir PATH_TO_JAMENDO # or by partition
# MUSAN
python prepare/mani_musan.py --data-dir PATH_TO_MUSAN
# WHAM!
python prepare/mani_wham.py --data-dir PATH_TO_WHAM
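Each script is expected to write its CSV manifest into manifest/ (the evaluation commands below assume manifest/val.csv and manifest/test.csv exist); a quick check:
ls -lh manifest/*.csv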
# debug on single GPU
accelerate launch --config_file config/acc/fp16_gpus1.yaml main.py --config-name debug +run_config=slurm_debug
# training on 8 GPUs
accelerate launch --config_file config/acc/fp16_gpus8.yaml main.py --config-name default +run_config=slurm_1
By default, we use the last checkpoint for evaluation
model_dir=PATH_TO_MODEL
nohup python eval_dnr.py --ret-dir ${model_dir} --csv-path ./manifest/val.csv --length 5 > ${model_dir}/val.log 2>&1 &
nohup python eval_dnr.py --ret-dir ${model_dir} --csv-path ./manifest/test.csv --length 10 > ${model_dir}/test.log 2>&1 &
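Both evaluations run in the background via nohup, so you can follow their progress with:
tail -f ${model_dir}/val.log ${model_dir}/test.log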
If you find this project useful in your research, please consider citing:
@inproceedings{bie2025sdcodec,
  author={Bie, Xiaoyu and Liu, Xubo and Richard, Ga{\"e}l},
  title={Learning Source Disentanglement in Neural Audio Codec},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
}
Some of the code in this project is inspired by or modified from the following projects: