DNN-based audio source separation

Audio source separation with DNNs (PyTorch implementation)

What's new

  • v0.7.0
    • Added models (MMDenseLSTM, X-UMX, HRNet, SepFormer).
    • Added pretrained models.

Models

| Model | Reference | Implementation |
| --- | --- | --- |
| WaveNet | WaveNet: A Generative Model for Raw Audio | |
| Wave-U-Net | Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation | |
| Deep clustering | Single-Channel Multi-Speaker Separation using Deep Clustering | |
| Chimera++ | Alternative Objective Functions for Deep Clustering | |
| DANet | Deep Attractor Network for Single-microphone Speaker Separation | |
| ADANet | Speaker-independent Speech Separation with Deep Attractor Network | |
| TasNet | TasNet: Time-domain Audio Separation Network for Real-time, Single-channel Speech Separation | |
| Conv-TasNet | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation | |
| DPRNN-TasNet | Dual-path RNN: Efficient Long Sequence Modeling for Time-domain Single-channel Speech Separation | |
| Gated DPRNN-TasNet | Voice Separation with an Unknown Number of Multiple Speakers | |
| FurcaNet | FurcaNet: An End-to-End Deep Gated Convolutional, Long Short-term Memory, Deep Neural Networks for Single Channel Speech Separation | |
| FurcaNeXt | FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks | |
| DeepCASA | Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation | |
| Conditioned-U-Net | Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for multiple source separations | |
| MMDenseNet | Multi-scale Multi-band DenseNets for Audio Source Separation | |
| MMDenseLSTM | MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation | |
| Open-Unmix (UMX) | Open-Unmix - A Reference Implementation for Music Source Separation | |
| Wavesplit | Wavesplit: End-to-End Speech Separation by Speaker Clustering | |
| Dual-Path Transformer Network (DPTNet) | Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation | |
| CrossNet-Open-Unmix (X-UMX) | All for One and One for All: Improving Music Separation by Bridging Networks | |
| D3Net | D3Net: Densely connected multidilated DenseNet for music source separation | |
| LaSAFT | LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation | |
| SepFormer | Attention is All You Need in Speech Separation | |
| GALR | Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks | |
| HRNet | Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation | |
| MRX | The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks | |

Modules

| Module | Reference | Implementation |
| --- | --- | --- |
| Depthwise-separable convolution | | |
| Gated Linear Units (GLU) | | |
| Feature-wise Linear Modulation (FiLM) | FiLM: Visual Reasoning with a General Conditioning Layer | |
| Point-wise Convolutional Modulation (PoCM) | LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation | |
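
The depthwise-separable convolution row above has no reference attached, so the following rough, framework-free sketch may help show why the module is attractive: it compares the number of weights in a standard 1-D convolution with a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the channel and kernel sizes (512, 512, 3) are illustrative assumptions and are not taken from this repository.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Number of weights in a standard 1-D convolution (bias omitted)."""
    return in_channels * out_channels * kernel_size


def separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Number of weights in a depthwise conv followed by a 1x1 pointwise conv (bias omitted)."""
    depthwise = in_channels * kernel_size    # one kernel per input channel
    pointwise = in_channels * out_channels   # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
standard = conv1d_params(512, 512, 3)
separable = separable_conv1d_params(512, 512, 3)
print(standard, separable, round(standard / separable, 2))
```

With these assumed sizes the separable variant needs roughly a third of the weights, which is the usual motivation for using it inside stacked convolutional blocks.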

Training methods

| Method | Reference | Implementation |
| --- | --- | --- |
| Permutation invariant training (PIT) | Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | |
| One-and-rest PIT | Recursive Speech Separation for Unknown Number of Speakers | |
| Probabilistic PIT | Probabilistic Permutation Invariant Training for Speech Separation | |
| Sinkhorn PIT | Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | |
| Combination Loss | All for One and One for All: Improving Music Separation by Bridging Networks | |
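
As a minimal illustration of the idea behind utterance-level permutation invariant training, the sketch below evaluates every assignment of estimated sources to target sources and keeps the one with the smallest loss. It is written in plain Python for clarity and is not this repository's implementation: `mse` and `pit_loss` are hypothetical helper names, mean squared error is an assumed stand-in for whatever criterion is actually used in training, and real training code would operate on PyTorch tensors.

```python
from itertools import permutations


def mse(estimate, target):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def pit_loss(estimates, targets, pair_loss=mse):
    """Return (minimum loss, best permutation) over all source assignments.

    best_perm[i] is the index of the estimate assigned to targets[i].
    """
    n_sources = len(targets)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_sources)):
        # Average the pairwise loss over sources for this assignment.
        loss = sum(pair_loss(estimates[p], targets[i]) for i, p in enumerate(perm)) / n_sources
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy two-source example: the estimates come out in swapped order.
targets = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
estimates = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]
loss, perm = pit_loss(estimates, targets)
print(loss, perm)
```

The exhaustive search grows factorially with the number of sources, which is the cost that variants such as Sinkhorn PIT in the table above aim to reduce.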

Example

Open In Colab

An example of source separation with Conv-TasNet on the LibriSpeech dataset.

Other tutorials are available under <REPOSITORY_ROOT>/egs/tutorials/.

0. Prepare the dataset

cd <REPOSITORY_ROOT>/egs/tutorials/common/
. ./prepare_librispeech.sh --dataset_root <DATASET_DIR> --n_sources <#SPEAKERS>

1. Training

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./train.sh --exp_dir <OUTPUT_DIR>

To resume training from a saved model:

. ./train.sh --exp_dir <OUTPUT_DIR> --continue_from <MODEL_PATH>

2. Evaluation

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./test.sh --exp_dir <OUTPUT_DIR>

3. Demonstration

cd <REPOSITORY_ROOT>/egs/tutorials/conv-tasnet/
. ./demo.sh

Pretrained models

Pretrained models can be downloaded as follows.

from models.conv_tasnet import ConvTasNet

model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100, target='vocals')

| Model | Dataset | Download example |
| --- | --- | --- |
| LSTM-TasNet | WSJ0-2mix | `model = LSTMTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| Conv-TasNet | WSJ0-2mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| Conv-TasNet | WSJ0-3mix | `model = ConvTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
| Conv-TasNet | MUSDB18 | `model = ConvTasNet.build_from_pretrained(task="musdb18", sample_rate=44100)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/separate-noisy", sample_rate=8000)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-single", sample_rate=8000)` |
| Conv-TasNet | WHAM | `model = ConvTasNet.build_from_pretrained(task="wham/enhance-both", sample_rate=8000)` |
| Conv-TasNet | LibriSpeech | `model = ConvTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
| DPRNN-TasNet | WSJ0-2mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| DPRNN-TasNet | WSJ0-3mix | `model = DPRNNTasNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=3)` |
| DPRNN-TasNet | LibriSpeech | `model = DPRNNTasNet.build_from_pretrained(task="librispeech", sample_rate=16000, n_sources=2)` |
| MMDenseLSTM | MUSDB18 | `model = MMDenseLSTM.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
| Open-Unmix | MUSDB18 | `model = OpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |
| Open-Unmix | MUSDB18-HQ | `model = OpenUnmix.build_from_pretrained(task="musdb18hq", sample_rate=44100, target="vocals")` |
| DPTNet | WSJ0-2mix | `model = DPTNet.build_from_pretrained(task="wsj0-mix", sample_rate=8000, n_sources=2)` |
| CrossNet-Open-Unmix | MUSDB18 | `model = CrossNetOpenUnmix.build_from_pretrained(task="musdb18", sample_rate=44100)` |
| D3Net | MUSDB18 | `model = D3Net.build_from_pretrained(task="musdb18", sample_rate=44100, target="vocals")` |

See egs/tutorials/hub/pretrained.ipynb, or follow the Open In Colab link.

Example of speech separation with a pretrained model

See egs/tutorials/hub/speech-separation.ipynb, or follow the Open In Colab link.

Example of music source separation with a pretrained model

See egs/tutorials/hub/music-source-separation.ipynb, or follow the Open In Colab link.

To try separation on your own music files, see the following notebooks.

  • MMDenseLSTM: see egs/tutorials/mm-dense-lstm/separate_music_ja.ipynb, or follow the Open In Colab link.
  • Conv-TasNet: see egs/tutorials/conv-tasnet/separate_music_ja.ipynb, or follow the Open In Colab link.
  • UMX: see egs/tutorials/umx/separate_music_ja.ipynb, or follow the Open In Colab link.
  • X-UMX: see egs/tutorials/x-umx/separate_music_ja.ipynb, or follow the Open In Colab link.
  • D3Net: see egs/tutorials/d3net/separate_music_ja.ipynb, or follow the Open In Colab link.