Lip-Read-ML-Model

This is a simple machine learning model that takes a video of a person speaking and predicts what was said.

Overview

Built the model with TensorFlow
Used Keras for data processing and NumPy for array handling
Used a Sequential model for training and prediction
Used ReLU as the activation function
Used 3 Conv3D layers and 2 bidirectional LSTM layers
Used the Adam optimizer and CTC loss for training
Used imageio for reading the video and cv2 for extracting the frames
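
To make the layer stack more concrete, the arithmetic below traces the tensor shape through the three Conv3D + MaxPool3D stages. It is a sketch that only uses the input shape, filter counts, and pool size listed in the Code section, and it assumes the Keras defaults for everything else (stride 1 for Conv3D, and 'valid' padding with strides equal to the pool size for MaxPool3D). The helper name pool3d_output is made up for this illustration.

```python
def pool3d_output(shape, pool):
    """Shape after MaxPool3D with 'valid' padding and strides equal to the
    pool size (assumed defaults): each dimension is floor-divided."""
    return tuple(dim // p for dim, p in zip(shape, pool))


# Values taken from the Code section; the channel axis is tracked separately.
shape = (75, 46, 140)          # 75 frames of 46x140
filters = [128, 256, 75]       # Conv3D filter counts, in order
pool = (1, 2, 2)               # MaxPool3D pool size after each Conv3D

for n_filters in filters:
    # Conv3D with padding='same' and stride 1 leaves the first three dims unchanged,
    # so only the pooling step changes the shape.
    shape = pool3d_output(shape, pool)
    print(shape + (n_filters,))

# TimeDistributed(Flatten()) then flattens everything except the frame axis.
features_per_frame = shape[1] * shape[2] * filters[-1]
print(features_per_frame)
```

Under these assumptions the frame axis stays at 75 throughout, so the LSTM layers receive one feature vector per video frame.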

Basic Logic

Two functions, load_video and load_alignments, load the videos and their alignments
load_video converts the video frames to grayscale and crops them to the mouth region to reduce training time
load_alignments reads the words from the alignment files, stores them as character tokens, and then converts the tokens to numbers
A mappable function wraps these loaders so that every input sample can be passed through them
We build a Sequential model with 3 Conv3D layers and 2 bidirectional LSTM layers using TensorFlow Keras layers
We use ReLU as the activation function and Adam as the optimizer
We use CTC loss for training the model
We train the model with the fit function for a fixed number of epochs (over 90 epochs for better accuracy)
We use the trained model to predict the speech in a video
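
The alignment step above can be sketched in plain Python. This is not the notebook's load_alignments; it is a minimal illustration that assumes each alignment line has the form "start end word", that silence is marked with the word "sil", and that the vocabulary is the character list shown. The names VOCAB, CHAR_TO_ID, encode_alignment, and decode are hypothetical, and the dictionary stands in for the lookup layer (char_to_num) used in the real code.

```python
# Assumed vocabulary: letters, a few punctuation marks, digits, and space.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")
# ID 0 is reserved for characters outside the vocabulary.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def encode_alignment(text, silence="sil"):
    """Turn alignment text into a list of character IDs, skipping silence."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore blank or malformed lines
        word = parts[2]
        if word != silence:
            words.append(word)
    sentence = " ".join(words)
    return [CHAR_TO_ID.get(ch, 0) for ch in sentence]


def decode(ids):
    """Map character IDs back to text (unknown IDs are dropped)."""
    return "".join(ID_TO_CHAR.get(i, "") for i in ids)


# Hypothetical sample in the assumed "start end word" format.
SAMPLE = """0 23750 sil
23750 29500 bin
29500 34000 blue
34000 74500 sil"""

tokens = encode_alignment(SAMPLE)
print(tokens)
print(decode(tokens))
```

Encoding at the character level keeps the output layer small, since the model only has to choose among the vocabulary characters at each time step rather than among whole words.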

How to run

Download the dataset from the link given below
Extract the dataset and place it in the same folder as the code
Run the code in Jupyter Notebook or any other IDE
The code trains the model and then predicts the speech in the video

Code

model layers used in the code

model = Sequential()
# Conv3D for video input: 128 output filters and a 3x3x3 kernel
# input_shape is 75 frames of 46x140 with 1 (grayscale) channel
model.add(Conv3D(128,3,input_shape=(75,46,140,1),padding='same'))
# ReLU adds non-linearity
model.add(Activation('relu'))
# max-pools each frame over 2x2 spatial windows; the frame (time) axis is not pooled
model.add(MaxPool3D((1,2,2)))

# 2nd layer with 256 output filters
model.add(Conv3D(256,3,padding='same'))
model.add(Activation('relu'))
model.add(MaxPool3D((1,2,2)))

# 3rd layer with 75 output filters
model.add(Conv3D(75,3,padding='same'))
model.add(Activation('relu'))
model.add(MaxPool3D((1,2,2)))

# flatten the feature maps of each time step so they can feed the LSTM layers
model.add(TimeDistributed(Flatten()))

# 2 bidirectional LSTM layers
# 128 is the number of hidden units per direction
# kernel_initializer='Orthogonal' helps keep gradients stable during training
# return_sequences=True returns the output of every time step, which CTC needs
# dropout after each layer reduces overfitting
model.add(Bidirectional(LSTM(128, kernel_initializer='Orthogonal',return_sequences=True)))
model.add(Dropout(.5))

model.add(Bidirectional(LSTM(128, kernel_initializer='Orthogonal',return_sequences=True)))
model.add(Dropout(.5))

# dense layer with softmax activation
# for each time step, outputs a probability for every character in the vocabulary + 1 for the CTC blank
# the most likely character at each step can then be picked with argmax
model.add(Dense(char_to_num.vocabulary_size()+1, kernel_initializer='he_normal',activation='softmax'))
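
The argmax step mentioned in the last comment can be illustrated with a greedy CTC decode: take the most likely class at each time step, merge consecutive repeats, and drop the blank. The sketch below is plain Python rather than the notebook's decoding code, the function name greedy_ctc_decode is made up, and it assumes the blank is the last class index (the extra "+1" unit of the Dense layer).

```python
def greedy_ctc_decode(probs, blank):
    """Greedy CTC decoding for one sequence.

    probs: list of per-time-step probability lists (one entry per class).
    blank: index of the CTC blank class (assumed to be the last index).
    """
    # Most likely class at every time step (argmax).
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]

    decoded = []
    previous = None
    for idx in best:
        # Keep a class only when it differs from the previous step and is not blank.
        if idx != previous and idx != blank:
            decoded.append(idx)
        previous = idx
    return decoded


# Toy example with 3 classes: 0 = 'a', 1 = 'b', 2 = blank.
toy_probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, merged)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.2, 0.7, 0.1],  # b
    [0.1, 0.8, 0.1],  # b (repeat, merged)
]
result = greedy_ctc_decode(toy_probs, blank=2)
print(result)  # [0, 0, 1], i.e. "aab"
```

The blank between the first two groups is what lets the decoder output a doubled letter; without it the two runs of 'a' would merge into one.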

basic imports

import gdown
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv3D, MaxPool3D, TimeDistributed, LSTM, Bidirectional
import numpy as np
import imageio
import matplotlib.pyplot as plt
import cv2

DataSet

https://drive.google.com/uc?id=1YlvpDLix3S-U8fd-gqRwPcWXAXm8JwjL
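
Since gdown is already among the imports, the download and extraction steps could be scripted roughly as follows. This is a sketch rather than the notebook's code: the function name download_dataset and the archive name data.zip are assumptions, and it presumes the link points to an archive format that gdown can extract.

```python
DATASET_URL = "https://drive.google.com/uc?id=1YlvpDLix3S-U8fd-gqRwPcWXAXm8JwjL"


def download_dataset(url=DATASET_URL, archive="data.zip"):
    """Download the dataset archive and extract it next to the code.

    The archive file name is an assumption made for this sketch.
    """
    import gdown  # third-party package, imported here so it is only needed when called

    # Download the file from Google Drive into the current folder.
    gdown.download(url, archive, quiet=False)
    # Extract the archive into the folder that contains it.
    gdown.extractall(archive)


# Call download_dataset() once before running the rest of the code.
```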

Basic commands

pip install tensorflow
pip install keras
pip install numpy
pip install imageio
pip install matplotlib
pip install opencv-python
pip install gdown

References

Youtube
