Marcus Chen, Marcelo Queiroz, Sylvia Yang, Wei Wang
This repo contains the code we developed for our final project for the W251 course.
The goal was to train a deep learning model to lipread a digit dataset [zero, ..., nine] without the audio or any other context.
In the Articles and References directory you will find the articles on which we based this work.
In the Downloader directory you will find the scripts we developed to download and prepare the dataset for use with the LipNet model.
The LipNet directory was forked from the original code so we could update the TensorFlow and Python code to run on our training machines.
In Data_examples you will find some sample files from the training corpus after processing.
In docker we stored the code to build the container for our still-in-progress attempt to deploy the final model on the NVIDIA Jetson TX2 edge device.
In images there are some resources used for the documentation here.
Navigate the directories for more detailed explanations and tutorials.
Thanks for accessing this code, and feel free to reach out to any of us with questions and suggestions.
Special thanks to:
- Muhammad Rizki for the work on the LipNet model, our main resource here.
- the Multimedia Systems Department of Gdansk University of Technology, which publicly provided the MODALITY Corpus.