DNN-based source separation (PyTorch implementation)
- v0.7.0
  - Added models (MMDenseLSTM, X-UMX, HRNet, SepFormer).
  - Added pretrained models.
Module | Reference | Implemented |
---|---|---|
Depthwise-separable convolution | | ✔ |
Gated Linear Units (GLU) | | ✔ |
Feature-wise Linear Modulation (FiLM) | FiLM: Visual Reasoning with a General Conditioning Layer | ✔ |
Point-wise Convolutional Modulation (PoCM) | LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation | ✔ |
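As a rough illustration of two of the modules above, here is a minimal sketch in plain PyTorch. This is not code from this repository; the class names, shapes, and the conditioning-vector interface for FiLM are my own assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise (per-channel) convolution followed by a pointwise 1x1
    convolution -- far fewer parameters than one full convolution."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):  # x: (batch, in_channels, time)
        return self.pointwise(self.depthwise(x))

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift every channel
    using (gamma, beta) predicted from a conditioning vector."""
    def __init__(self, num_features, cond_dim):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)

    def forward(self, x, cond):  # x: (batch, channels, time), cond: (batch, cond_dim)
        gamma = self.to_gamma(cond).unsqueeze(-1)  # (batch, channels, 1)
        beta = self.to_beta(cond).unsqueeze(-1)
        return gamma * x + beta

x = torch.randn(2, 16, 100)
y = DepthwiseSeparableConv1d(16, 32, kernel_size=3)(x)
z = FiLM(32, cond_dim=8)(y, torch.randn(2, 8))
print(y.shape, z.shape)  # both (2, 32, 100)
```

GLU is available directly in PyTorch as `nn.GLU`, which halves the channel dimension by gating one half with the sigmoid of the other.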
Method | Reference | Implemented |
---|---|---|
Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | ✔ |
One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | ✔ |
Probabilistic PIT | Probabilistic Permutation Invariant Training for Speech Separation | |
Sinkhorn PIT | Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | ✔ |
Combination Loss | All for One and One for All: Improving Music Separation by Bridging Networks | ✔ |
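The core idea of PIT, computing the loss under every source permutation and keeping the minimum, can be sketched as follows. This is a minimal utterance-level example with an MSE loss, not code from this repository; the function name and tensor layout are assumptions for illustration.

```python
import itertools
import torch

def pit_mse(estimates, targets):
    """Utterance-level permutation invariant MSE.

    estimates, targets: (batch, n_sources, time)
    Returns the mean of the per-mixture minimum loss over all source
    permutations, plus the index of the best permutation per mixture.
    """
    n_sources = estimates.size(1)
    losses = []
    for perm in itertools.permutations(range(n_sources)):
        permuted = estimates[:, perm, :]
        # mean over sources and time, keeping the batch dimension
        losses.append(((permuted - targets) ** 2).mean(dim=(1, 2)))
    losses = torch.stack(losses, dim=1)      # (batch, n_perms)
    min_loss, best_perm = losses.min(dim=1)  # best permutation per mixture
    return min_loss.mean(), best_perm

torch.manual_seed(0)
targets = torch.randn(4, 2, 16000)
# Estimates equal to the targets with the two sources swapped:
# PIT finds the swapped permutation, so the loss is exactly zero.
estimates = targets[:, [1, 0], :]
loss, perm = pit_mse(estimates, targets)
print(loss.item())  # 0.0
```

Note that the number of permutations grows factorially with the number of sources; Sinkhorn PIT in the table above is one way to avoid that cost.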
Example: source separation on the LibriSpeech dataset using Conv-TasNet.
Other tutorials are available under `<REPOSITORY_ROOT>/egs/tutorials/`.
```sh
cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare_librispeech.sh --dataset_root <DATASET_DIR> --n_sources <#SPEAKERS>
```
```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>
```

To resume training from a saved checkpoint:

```sh
. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>
```
```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>
```

```sh
cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh
```
Pretrained models can be downloaded as follows.
```python
from models.conv_tasnet import ConvTasNet

model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")
```
Model | Dataset | Download example |
---|---|---|
LSTM-TasNet | WSJ0-2mix | `model = LSTMTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
Conv-TasNet | WSJ0-2mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
Conv-TasNet | WSJ0-3mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
Conv-TasNet | MUSDB18 | `model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100)` |
Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/separate-noisy", sample_rate=8000)` |
Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-single", sample_rate=8000)` |
Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-both", sample_rate=8000)` |
Conv-TasNet | LibriSpeech | `model = ConvTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
DPRNN-TasNet | WSJ0-2mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
DPRNN-TasNet | WSJ0-3mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
DPRNN-TasNet | LibriSpeech | `model = DPRNNTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
MMDenseLSTM | MUSDB18 | `model = MMDenseLSTM.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
Open-Unmix | MUSDB18 | `model = OpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
Open-Unmix | MUSDB18-HQ | `model = OpenUnmix.build_from_pretrained(task="musdb18hq", sample_rate=44100, target="vocals")` |
DPTNet | WSJ0-2mix | `model = DPTNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
CrossNet-Open-Unmix | MUSDB18 | `model = CrossNetOpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100)` |
D3Net | MUSDB18 | `model = D3Net.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
- Pretrained models: see `egs/tutorials/hub/pretrained.ipynb`.
- Speech separation: see `egs/tutorials/hub/speech-separation.ipynb`.
- MMDenseLSTM: see `egs/tutorials/mm-dense-lstm/separate_music_ja.ipynb`.
- Conv-TasNet: see `egs/tutorials/conv-tasnet/separate_music_ja.ipynb`.
- UMX: see `egs/tutorials/umx/separate_music_ja.ipynb`.
- X-UMX: see `egs/tutorials/x-umx/separate_music_ja.ipynb`.
- D3Net: see `egs/tutorials/d3net/separate_music_ja.ipynb`.