# LRW_ID

This repository contains the speaker-label information for the LRW audio-visual dataset, introduced in the following paper:

Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim, Hyunjun Kim, and Yong Man Ro
[Paper]

## Data structure

`LRW_ID_v1.0.txt` contains the speaker ID for each video. The total number of labeled speakers is 17,580 (IDs 0 to 17,579).
Each line contains a speaker ID and the corresponding video name.

```
# Example
4243 SERVICES/train/SERVICES_00456
```

This line indicates that the speaker ID of the video `SERVICES/train/SERVICES_00456.mp4` is 4243.
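As a minimal sketch (assuming the label file is plain whitespace-separated text, as in the example above), the labels can be loaded into a dictionary that maps each video name to its speaker ID:

```python
# Minimal sketch: load LRW_ID_v1.0.txt into {video_name: speaker_id}.
# Assumes each line has the form "<speaker_id> <video_name>", as in the
# example above.
def load_speaker_labels(path="LRW_ID_v1.0.txt"):
    labels = {}
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            speaker_id, video_name = line.split(maxsplit=1)
            labels[video_name] = int(speaker_id)
    return labels

# Example usage:
# labels = load_speaker_labels()
# print(labels["SERVICES/train/SERVICES_00456"])  # -> 4243
```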

## LRW_ID data splits

To build an unseen-speaker scenario (the train and test speakers do not overlap), 20 speakers are selected for the test and validation (adaptation) sets. The dataset splits can be found in the `Splits` directory.

Table 1. LRW_ID data splits.


Table 2. The 20 speakers in the test set. They do not overlap with the speakers in the train set.
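As an illustration only (the exact file names and format inside the `Splits` directory should be checked in the repository; `Splits/test_speakers.txt` below is a hypothetical name), a split file listing the held-out speaker IDs could be combined with the label file to select the corresponding videos:

```python
# Hypothetical sketch: select videos belonging to the held-out speakers.
# ASSUMPTION: "Splits/test_speakers.txt" (hypothetical name) lists one
# speaker ID per line; adapt the path and parsing to the actual split files.
def load_split_speakers(path="Splits/test_speakers.txt"):
    with open(path, "r") as f:
        return {int(line.strip()) for line in f if line.strip()}

def select_videos(labels, speaker_ids):
    # labels: {video_name: speaker_id}, e.g. from load_speaker_labels() above
    return [video for video, sid in labels.items() if sid in speaker_ids]

# Example usage:
# test_speakers = load_split_speakers()
# test_videos = select_videos(load_speaker_labels(), test_speakers)
```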


For more information, please refer to our paper.

## Contributing

To make the speaker labels more accurate, we welcome contributions that improve the label information.

## Citation

If you use the identity-labeled LRW dataset, LRW-ID, please cite the paper:

```bibtex
@inproceedings{kim2022speaker,
  title={Speaker-adaptive lip reading with user-dependent padding},
  author={Kim, Minsu and Kim, Hyunjun and Ro, Yong Man},
  booktitle={European Conference on Computer Vision},
  pages={576--593},
  year={2022},
  organization={Springer}
}
```