update readme
JunzheJosephZhu committed Jan 21, 2023
1 parent 6e82fee commit 7c7cf08
Showing 3 changed files with 25 additions and 10 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -112,7 +112,7 @@ More information in [egs/README.md](./egs).
* [x] [DCCRNet](./asteroid/models/dccrnet.py) ([Hu et al.](https://arxiv.org/abs/2008.00264))
* [x] [DCUNet](./asteroid/models/dcunet.py) ([Choi et al.](https://arxiv.org/abs/1903.03107))
* [x] [CrossNet-Open-Unmix](./asteroid/models/x_umx.py) ([Sawata et al.](https://arxiv.org/abs/2010.04228))
- * [x] [Multi-Decoder DPRNN](./asteroid/egs/wsj0-mix-var/Multi-Decoder-DRPNN) ([Zhu et al.](http://www.isle.illinois.edu/speech_web_lg/pubs/2021/zhu2021multi.pdf))
+ * [x] [Multi-Decoder DPRNN](./egs/wsj0-mix-var/Multi-Decoder-DPRNN) ([Zhu et al.](http://www.isle.illinois.edu/speech_web_lg/pubs/2021/zhu2021multi.pdf))
* [ ] Open-Unmix (coming) ([Stöter et al.](https://sigsep.github.io/open-unmix/))
* [ ] Wavesplit (coming) ([Zeghidour et al.](https://arxiv.org/abs/2002.08933))

3 changes: 3 additions & 0 deletions egs/wsj0-mix-var/Multi-Decoder-DPRNN/.vscode/settings.json
@@ -0,0 +1,3 @@
+ {
+   "ros.distro": "noetic"
+ }
30 changes: 21 additions & 9 deletions egs/wsj0-mix-var/Multi-Decoder-DPRNN/README.md
@@ -1,19 +1,28 @@
## This is the official repository for Multi-Decoder DPRNN, published at ICASSP 2021.
- Summary: Multi-Decoder DPRNN deals with source separation with variable number of speakers. It has 98.5% accuracy in speaker number classification, which is much higher than all previous SOTA methods. It also has similar SNR as models trained separately on different number of speakers, but its runtime is constant and independent of the number of speakers.
+ **Summary**: Multi-Decoder DPRNN deals with source separation with a variable number of speakers. It reaches 98.5% accuracy in speaker-count classification, far higher than previous SOTA methods, and achieves SNR comparable to models trained separately for each speaker count, but **its runtime is constant and independent of the number of speakers.**

+ **Abstract**: We propose an end-to-end trainable approach to single-channel speech separation with an unknown number of speakers, **training only a single model for an arbitrary number of speakers**. Our approach extends the MulCat source separation backbone with additional output heads: a count-head to infer the number of speakers, and decoder-heads for reconstructing the original signals. Beyond the model, we also propose a metric for evaluating source separation with a variable number of speakers. Specifically, we clear up the issue of how to evaluate quality when the ground truth has more or fewer speakers than the model predicts. We evaluate our approach on the WSJ0-mix datasets, with mixtures of up to five speakers. **We demonstrate that our approach outperforms the state-of-the-art in counting the number of speakers and remains competitive in the quality of the reconstructed signals.**

Paper arXiv link: https://arxiv.org/abs/2011.12022

- ## Demo
- Project page & example output can be found at: https://junzhejosephzhu.github.io/Multi-Decoder-DPRNN/
+ ## Project Page & Demo
+ Project page & example output can be found [here](https://junzhejosephzhu.github.io/Multi-Decoder-DPRNN/).

## Getting Started
Install asteroid by running ```pip install -e .``` in the asteroid root directory.
To install the requirements, run ```pip install -r requirements.txt```
- To run a pre-trained model on your own .wav mixture files, run ```python eval.py --wav_file {file_name.wav} --use_gpu {1/0} --save_folder {folder_name}```
- You can use regular expressions for file names. For example, you can run ```python eval.py --wav_file local/*.wav --use_gpu 0 --save_folder {output}```

+ To run a pre-trained model on your own .wav mixture files, run ```python eval.py --wav_file {file_name.wav} --use_gpu {1/0}```. The script should automatically download a pre-trained model (link below).

+ You can use wildcard patterns for file names. For example, you can run ```python eval.py --wav_file local/*.wav --use_gpu 0```

+ The default output directory is ./output, but you can override it with the ```--output_dir``` option.

+ If you want to download an alternative pre-trained model, create a folder, save the checkpoint as ```{folder_name}/checkpoints/best-model.ckpt```, and run ```python eval.py --wav_file {file_name.wav} --use_gpu {1/0} --exp_dir {folder_name}```
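
Putting the steps above together, here is a minimal sketch of the evaluation workflow. The local folder names and the checkpoint filename inside the Hugging Face repo are assumptions for illustration, not part of the recipe:

```bash
# Minimal sketch of the evaluation workflow; folder names and the checkpoint
# filename inside the Hugging Face repo are assumptions.
pip install -e .                  # from the asteroid root directory
pip install -r requirements.txt   # from this recipe directory

# Separate every mixture under local/ on CPU; results go to ./separated
python eval.py --wav_file local/*.wav --use_gpu 0 --output_dir separated

# Optional: point eval.py at an alternative checkpoint laid out as
# {exp_dir}/checkpoints/best-model.ckpt (cloning may require git-lfs)
git clone https://huggingface.co/JunzheJosephZhu/MultiDecoderDPRNN hf_model
mkdir -p my_exp/checkpoints
cp hf_model/best-model.ckpt my_exp/checkpoints/   # filename assumed
python eval.py --wav_file local/*.wav --use_gpu 1 --exp_dir my_exp
```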

## Train your own model
- To train the model, edit the file paths in run.sh and execute ```./run.sh --stage 0``` to generate and train the model
+ To train the model, edit the file paths in run.sh, execute ```./run.sh --stage 0```, and follow the instructions to generate the dataset and train the model.

After training the model, execute ```./run.sh --stage 4``` to evaluate the model. Some examples will be saved in exp/tmp_uuid/examples
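
For reference, a condensed sketch of the recipe flow described above (only stages 0 and 4 are documented here; the intermediate stage numbers are not listed in this README):

```bash
# Condensed sketch of the training/evaluation flow described above
$EDITOR run.sh        # set the wsj0 data paths first
./run.sh --stage 0    # generate the wsj0-mix-var dataset and train
./run.sh --stage 4    # evaluate; examples are saved in exp/tmp_uuid/examples
```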

## Kindly cite this paper
@@ -29,15 +38,18 @@ After training the model, execute ```./run.sh --stage 4``` to evaluate the model
doi={10.1109/ICASSP39728.2021.9414205}}
```

- # Resources
+ ## Resources
Pretrained mini model and config can be found at: https://huggingface.co/JunzheJosephZhu/MultiDecoderDPRNN

- #### This is the refactored version of the code, with some hyperparameter changes. If you want to reproduce the paper results, original experiment code & config can be found at https://github.com/JunzheJosephZhu/MultiDecoder-DPRNN
+ This is the refactored version of the code, with some hyperparameter changes. If you want to reproduce the paper results, original experiment code & config can be found at https://github.com/JunzheJosephZhu/MultiDecoder-DPRNN

- Original Paper Results(Confusion Matrix)
+ **Original Paper Results** (Confusion Matrix)

|    2 |    3 |    4 |    5 |
|------|------|------|------|
| 2998 |   17 |    1 |    0 |
|    2 | 2977 |   27 |    0 |
|    0 |    6 | 2928 |   80 |
|    0 |    0 |   44 | 2920 |
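
As a quick check on the 98.5% figure quoted in the summary (assuming each column of the matrix is one of four 3000-mixture test sets, which matches the column sums), the diagonal accounts for 11823 of 12000 mixtures:

```bash
# Speaker-count accuracy from the confusion matrix diagonal
# (assumes 3000 test mixtures per speaker count)
python -c "print((2998 + 2977 + 2928 + 2920) / 12000)"   # 0.98525 ≈ 98.5%
```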

## Contact the author
If you have any questions, you can reach me at josefzhu@stanford.edu
