
(ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec


XiaoyuBIE1994/SDCodec


SD-Codec

Learning Source Disentanglement in Neural Audio Codec, ICASSP 2025

Xiaoyu Bie, Xubo Liu, Gaël Richard

[arXiv], [Project]

The pretrained models can be downloaded via Google Drive

Environment Setup

All our models are trained on 8 A100 80GB GPUs.

conda env create -f environment.yml
conda activate gen_audio

The code was tested on Python 3.11.7 and PyTorch 2.1.2
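After activating the environment, it can be useful to confirm that the interpreter and PyTorch match the tested versions. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the returned keys are hypothetical, and it only compares major and minor versions.

```python
import sys
from importlib import metadata

# Versions the README reports as tested (major.minor only).
TESTED_PYTHON = (3, 11)
TESTED_TORCH = "2.1"


def check_environment():
    """Compare the running environment with the tested versions.

    Hypothetical helper: returns a dict instead of raising, so it can be
    called from a notebook or a job script without stopping the run.
    """
    python_version = ".".join(str(v) for v in sys.version_info[:3])
    python_ok = tuple(sys.version_info[:2]) == TESTED_PYTHON

    try:
        # Reads the installed package metadata without importing torch.
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    torch_ok = torch_version is not None and torch_version.startswith(TESTED_TORCH + ".")

    return {
        "python": python_version,
        "python_matches": python_ok,
        "torch": torch_version,
        "torch_matches": torch_ok,
    }
```

Calling `print(check_environment())` inside the `gen_audio` environment shows at a glance whether either version differs from the tested setup.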

To install ViSQOL, the simplest way is to use Bazelisk, especially if you want to install it on a cluster where you do not have sudo rights. Here is an example

Dataset Preparation

We use the following datasets: DnR, DNS Challenge 5, Jamendo, MUSAN, and WHAM.

mkdir manifest

# DnR
python prepare/mani_dnr.py --data-dir PATH_TO_DnR

# DNS Challenge 5
python prepare/mani_dns_clean.py --data-dir PATH_TO_DNS_CLEAN # or by partition
python prepare/mani_dns_noise.py --data-dir PATH_TO_DNS_NOISE

# Jamendo
python prepare/mani_jamendo.py --data-dir PATH_TO_JAMENDO # or by partition

# MUSAN
python prepare/mani_musan.py --data-dir PATH_TO_MUSAN

# WHAM
python prepare/mani_wham.py --data-dir PATH_TO_WHAM
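The `prepare/mani_*.py` scripts are not reproduced here. As a rough illustration of what building a manifest involves, the sketch below walks a data directory and writes one CSV row per audio file. It is an assumption-based example, not the repository's implementation: the function name `build_manifest`, the single `path` column, and the list of file extensions are all hypothetical, and the real manifests may store additional fields.

```python
import csv
from pathlib import Path

# Assumed set of audio extensions; the real scripts may filter differently.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def build_manifest(data_dir, csv_path):
    """Write one row per audio file found under data_dir; return the row count.

    Hypothetical helper: the 'path' column name is an illustration only.
    """
    data_dir = Path(data_dir)
    # Sort so the manifest is identical across runs on the same data.
    files = sorted(
        p for p in data_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path"])
        for p in files:
            writer.writerow([str(p.resolve())])
    return len(files)
```

For example, `build_manifest("PATH_TO_MUSAN", "manifest/musan_demo.csv")` would list every matching file under that directory; the output file name here is likewise made up for the example.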

Training

# debug on single GPU
accelerate launch --config_file config/acc/fp16_gpus1.yaml main.py --config-name debug +run_config=slurm_debug

# training on 8 GPUs
accelerate launch --config_file config/acc/fp16_gpus8.yaml main.py --config-name default +run_config=slurm_1

Evaluation

By default, the last checkpoint is used for evaluation.

model_dir=PATH_TO_MODEL
nohup python eval_dnr.py --ret-dir ${model_dir} --csv-path ./manifest/val.csv --length 5 > ${model_dir}/val.log  2>&1 &
nohup python eval_dnr.py --ret-dir ${model_dir} --csv-path ./manifest/test.csv --length 10 > ${model_dir}/test.log  2>&1 &
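How `eval_dnr.py` locates the last checkpoint is not described here. If you need to check by hand which file that is likely to be, the sketch below picks the checkpoint with the largest step number in its file name. It rests on assumptions that may not hold for this repository: checkpoints are taken to be `.pth` files whose names end in a step count (for example `ckpt_2000.pth`), and the helper `last_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def last_checkpoint(model_dir):
    """Return the .pth file with the largest trailing number in its name.

    Hypothetical helper: assumes checkpoint names carry a step count.
    Files without any digits in the name are ignored. Returns None when
    no candidate is found.
    """
    best_path, best_step = None, -1
    for path in Path(model_dir).rglob("*.pth"):
        numbers = re.findall(r"\d+", path.stem)
        if not numbers:
            continue
        step = int(numbers[-1])  # compare numerically, not as strings
        if step > best_step:
            best_path, best_step = path, step
    return best_path
```

Comparing the step as an integer matters: a plain string sort would rank `ckpt_300.pth` after `ckpt_2000.pth`.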

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{bie2025sdcodec,
  author={Bie, Xiaoyu and Liu, Xubo and Richard, Ga{\"e}l},
  title={Learning Source Disentanglement in Neural Audio Codec},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
}

Acknowledgments

Some of the code in this project is inspired by or modified from the following projects:
