awesome-speaker-diarization

Content

Publications
Datasets
- Audio-Visual Datasets
Tutorials
Books
Acknowledgement

Publications

Reviews

Audio-only Speaker Diarization

2021

Online End-To-End Neural Diarization with Speaker-Tracing Buffer

2020

2019

2018

Fully Supervised Speaker Diarization
Neural speech turn segmentation and affinity propagation for speaker diarization
Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
Joint Speaker Diarization and Recognition Using Convolutional and Recurrent Neural Networks

2017

Multimodal Speaker Diarization

2020

Self-supervised learning for audio-visual speaker diarization (Ding Y, Xu Y, Zhang S X, et al. ICASSP, 2020) An audio-visual method for speaker diarization based on contrastive learning.
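
Contrastive learning here means pulling the embeddings of matching audio and video together while pushing mismatched pairs apart. The following is a minimal sketch of a generic, one-directional InfoNCE loss, not necessarily the exact objective of Ding et al. It assumes audio and visual embeddings have already been extracted and that the i-th audio embedding and the i-th visual embedding come from the same clip. The toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def info_nce(audio: List[Vector], visual: List[Vector], temperature: float = 0.1) -> float:
    """Audio-to-visual InfoNCE loss averaged over the batch.

    audio[i] and visual[i] are assumed to come from the same clip (positive pair);
    every visual[j] with j != i serves as a negative for audio[i].
    """
    total = 0.0
    for i, a in enumerate(audio):
        logits = [cosine(a, v) / temperature for v in visual]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy of picking the matching visual embedding.
        total += log_denom - logits[i]
    return total / len(audio)


# Toy 2-D embeddings (hypothetical): matched pairs give a much lower loss.
audio = [[1.0, 0.0], [0.0, 1.0]]
visual_aligned = [[0.9, 0.1], [0.1, 0.9]]
visual_shuffled = [visual_aligned[1], visual_aligned[0]]
print(info_nce(audio, visual_aligned) < info_nce(audio, visual_shuffled))  # True
```

The temperature of 0.1 is an illustrative choice. Lower temperatures sharpen the softmax and penalize hard negatives more strongly.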

2019

Joint Speech Recognition and Speaker Diarization via Sequence Transduction (Shafey L E, Soltau H, Shafran I) Built on the RNN-T architecture; combines the text content with voiceprint (speaker) information.
Look Who's Not Talking (Kwon Y, Heo H S, Huh J, et al. (VGG)) Since speaker embeddings can discriminate one person's speech from another's, they may also be able to discriminate speech from non-speech.
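
The intuition behind Look Who's Not Talking can be illustrated with a toy nearest-centroid classifier. If the speaker embeddings of speech and non-speech segments form separate clusters, each segment can be labeled by whichever cluster centroid it is most cosine-similar to. This is a hypothetical illustration with made-up 3-dimensional embeddings, not the method of Kwon et al. Real embeddings would come from a trained speaker-embedding network.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(embedding: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is most cosine-similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))


# Hypothetical labeled embeddings (real speaker embeddings are much larger).
train = {
    "speech": [[0.9, 0.3, 0.1], [0.8, 0.4, 0.0]],
    "non-speech": [[0.1, 0.2, 0.9], [0.0, 0.1, 1.0]],
}
centroids = {label: centroid(vs) for label, vs in train.items()}
print(classify([0.85, 0.35, 0.05], centroids))  # speech
print(classify([0.05, 0.15, 0.95], centroids))  # non-speech
```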

2018

Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion

Other Audio-Visual Related Work

2020

Self-Supervised Learning of Audio-Visual Objects from Video (Afouras T, Owens A, et al. ECCV, 2020) Uses cross-modal attention for contrastive learning.
Multiple Sound Sources Localization from Coarse to Fine (Qian R, Hu D, Dinkel H, Wu M, et al.) Uses class activation mapping (CAM) for sound source localization.
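
Cross-modal attention for localization is commonly computed as a similarity map. It compares a clip-level audio embedding against the visual feature at every spatial location of a frame. The sketch below assumes these features are already given, using a hypothetical 2×2 grid of 2-D vectors. It normalizes the cosine similarities with a softmax over all cells. The sketch shows the general idea only, not the specific architecture of Afouras et al. or the CAM-based approach of Qian et al.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def localization_map(audio: Vector, visual_grid: List[List[Vector]],
                     temperature: float = 0.1) -> List[List[float]]:
    """Softmax over all grid cells of cosine(audio, cell) / temperature.
    The result sums to 1; high values mark where the sound likely comes from."""
    scores = [[cosine(audio, cell) / temperature for cell in row] for row in visual_grid]
    m = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - m) for s in row] for row in scores]
    z = sum(sum(row) for row in exps)
    return [[e / z for e in row] for row in exps]


def argmax_cell(att: List[List[float]]) -> Tuple[int, int]:
    """(row, column) of the largest attention weight."""
    cells = [(h, w) for h in range(len(att)) for w in range(len(att[h]))]
    return max(cells, key=lambda hw: att[hw[0]][hw[1]])


# Hypothetical features: only the bottom-right cell resembles the audio embedding.
audio = [1.0, 0.0]
grid = [
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.2, 0.8], [0.95, 0.05]],
]
att = localization_map(audio, grid)
print(argmax_cell(att))  # (1, 1)
```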

2019

Dual Attention Matching for Audio-Visual Event Localization (Wu Y, Zhu L, Yan Y, et al. ICCV, 2019) Combines local and global features to estimate which frames are relevant to the event.
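
Here is a much-simplified reading of "combining local and global features." First, summarize each modality into a global (clip-level) feature by averaging its per-frame features. Then score each time step by how well its local features match the other modality's global feature. Frames that score below a threshold are treated as irrelevant to the audio-visual event. The averaging, the symmetric cosine score, and the 0.5 threshold are all assumptions for illustration. The actual Dual Attention Matching module is learned.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def frame_relevance(audio_frames: List[Vector], visual_frames: List[Vector]) -> List[float]:
    """Score each time step by matching its local features against the
    global (clip-level) feature of the other modality, in both directions."""
    global_audio = mean_vector(audio_frames)
    global_visual = mean_vector(visual_frames)
    return [
        0.5 * (cosine(a, global_visual) + cosine(v, global_audio))
        for a, v in zip(audio_frames, visual_frames)
    ]


# Hypothetical features: frames 0-2 contain the event, frame 3 is background.
audio_frames = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
visual_frames = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.0], [0.1, 1.0]]
scores = frame_relevance(audio_frames, visual_frames)
print([s > 0.5 for s in scores])  # [True, True, True, False]
```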

2018

Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
Audio-Visual Event Localization in Unconstrained Videos (Tian Y, Shi J, Li B, et al. ECCV, 2018) Uses cross-modal attention for localization; the features of aligned audio and visual segments are pulled closer together.
Learning to Localize Sound Sources in Visual Scenes (Senocak A, Oh T H, Kim J, et al. CVPR, 2018) Combines cross-modal attention with contrastive learning.
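
Aligned audio and visual segments should have nearby features, and that idea can be turned into a simple search. Slide a sequence of audio segment features over the visual segment features, then keep the offset with the smallest mean Euclidean distance. This sketch assumes both modalities are already embedded in a shared space, where in practice a learned network would produce the features. It is not Tian et al.'s model, and the feature values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_alignment(query: List[Vector], candidates: List[Vector]) -> int:
    """Return the start index in `candidates` where `query` fits best,
    i.e. with the smallest mean Euclidean distance between aligned segments."""
    n = len(query)
    if n == 0 or n > len(candidates):
        raise ValueError("query must be non-empty and no longer than candidates")

    def cost(start: int) -> float:
        return sum(euclidean(q, candidates[start + i]) for i, q in enumerate(query)) / n

    return min(range(len(candidates) - n + 1), key=cost)


# Hypothetical shared-space features: the audio query matches visual segments 2-3.
visual = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
audio = [[1.0, 1.0], [1.0, 0.0]]
print(best_alignment(audio, visual))  # 2
```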

2017

VISUALVOICE: Audio-Visual Speech Separation with Cross-Modal Consistency
Visual speech enhancement (Gabbay A, Shamir A, et al.) Combines video frames with the mixed (noisy) audio to generate clean speech.

Datasets

Audio-Visual Datasets

Spot the conversation: speaker diarisation in the wild (Chung J S, Huh J, Nagrani A, et al. (VGG)) A freely available speaker diarisation dataset: large, with overlapping speech and background noise.
VoxConverse: an audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos.
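
To our knowledge, VoxConverse annotations are distributed as RTTM files. In that format each `SPEAKER` line records a file ID, channel, turn onset, turn duration and speaker label. The sketch below assumes the standard 10-field RTTM layout and sums the speaking time for each speaker. The sample lines, file ID and speaker names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def speaker_durations(rttm_lines: Iterable[str]) -> Dict[str, float]:
    """Sum speaking time (seconds) per speaker from RTTM 'SPEAKER' lines.

    Assumed field layout: type, file, channel, onset, duration,
    orthography, speaker type, speaker name, confidence, lookahead.
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        totals[fields[7]] += float(fields[4])
    return dict(totals)


sample = """\
SPEAKER clip01 1 0.00 3.50 <NA> <NA> spk_a <NA> <NA>
SPEAKER clip01 1 3.20 2.00 <NA> <NA> spk_b <NA> <NA>
SPEAKER clip01 1 5.20 1.25 <NA> <NA> spk_a <NA> <NA>
"""
durations = speaker_durations(sample.splitlines())
print(durations)  # {'spk_a': 4.75, 'spk_b': 2.0}
```

Overlapping turns (here 3.20 to 3.50 s) count toward both speakers. As a result, the per-speaker totals can exceed the total duration of speech in the clip.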

Tutorials

pyannote audio: neural building blocks for speaker diarization by Hervé Bredin
Google's Diarization System: Speaker Diarization with LSTM by Google
Fully Supervised Speaker Diarization: Say Goodbye to clustering by Google
Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research
[Synced (机器之心) & Broadview (博文视点)] Introduction to Voiceprint Technology | Lecture 2: Speaker Diarization and Other Applications by Quan Wang

Books

Voice Identity Techniques: From core algorithms to engineering practice (Chinese) by Quan Wang, 2020

Acknowledgement

Quan Wang's repo inspires us a lot. Many thanks!

xyxCalvin/awesome-speaker-diarization

Folders and files

Latest commit

History

Repository files navigation

awesome-speaker-diarization

Content

Publications

Reviews

Audio-only Speaker Diarization

2021

2020

2019

2018

2017

Multimodal Speaker Diarization

2020

2019

2018

Other Audio-Visual Related Work

2020

2019

2018

2017

Datasets

Audio-Visual Datasets

Tutorials

Books

Acknowledgement

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages