This repository contains scripts to identify the letters of the American Manual Alphabet (AMA).
You can either train and run it locally or head directly to this repository's GitHub Page to see a demonstration using your webcam.
This project was developed with Python 3.8. Packages are required for the following tasks:
- Extracting
- Training
- Running local inference
Two datasets published on Kaggle by the SigNN Team were used.
The first contains only images of the alphabet letters, excluding J and Z. The second contains video files of the letters J and Z, because these signs involve motion.
To extract the landmarks, the MediaPipe Hands solution is used. Passing an image to MediaPipe yields a list of hand landmarks.
The figure above shows the resulting 21 hand landmarks (from MediaPipe Hands).
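For illustration, here is a minimal sketch of extracting landmarks from a single image with MediaPipe Hands in Python; the function name and the flattened output format are assumptions, not the repository's exact code:

```python
# Minimal sketch: run one image through MediaPipe Hands and flatten the landmarks.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_landmarks(image_path):
    """Return a flat list [x0, y0, z0, ..., x20, y20, z20] or None if no hand is found."""
    image = cv2.imread(image_path)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        # MediaPipe expects RGB, OpenCV loads BGR.
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark
    return [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]
```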
This project includes two scripts to extract landmarks from either image or video files. You can set the number of workers to speed up the extraction; each worker processes one letter of the dataset and writes a CSV file.
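A hedged sketch of that worker-per-letter idea, reusing the `extract_landmarks` helper from the sketch above; the dataset folder layout and the `process_letter` helper are assumptions:

```python
# Sketch: one worker per letter, each writing <LETTER>.csv.
import csv
import glob
import string
from multiprocessing import Pool

def process_letter(letter):
    """Hypothetical worker: extract landmarks for every image of one letter."""
    rows = []
    for path in glob.glob(f"dataset/{letter}/*.jpg"):   # assumed dataset layout
        row = extract_landmarks(path)                    # helper from the sketch above
        if row is not None:
            rows.append(row)
    with open(f"{letter}.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

if __name__ == "__main__":
    with Pool(processes=4) as pool:                      # configurable number of workers
        pool.map(process_letter, list(string.ascii_uppercase))
```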
If the extraction encounters an image or video showing a left hand, it mirrors the landmarks along the x-axis so that they match a right hand.
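A sketch of that mirroring step, assuming the flat landmark rows from above with normalized x coordinates in [0, 1]; the handedness label can be taken from MediaPipe's `multi_handedness` result, and the exact mirroring reference used by the repository is an assumption:

```python
def mirror_if_left_hand(row, handedness_label):
    """Mirror the x coordinates when MediaPipe reports a left hand,
    so every sample looks like a right hand."""
    if handedness_label != "Left":
        return row
    # x coordinates sit at indices 0, 3, 6, ...; they are normalized to [0, 1].
    return [1.0 - v if i % 3 == 0 else v for i, v in enumerate(row)]
```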
The resulting 26 files (A.csv, B.csv, ..., Z.csv) can then be merged into a single CSV file and used to train a model.
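A minimal merge sketch with pandas, assuming each per-letter file holds one flattened landmark row per sample and the merged file is named dataset.csv (the file name is an assumption):

```python
# Sketch: merge A.csv ... Z.csv into one labeled training file.
import string
import pandas as pd

frames = []
for letter in string.ascii_uppercase:
    df = pd.read_csv(f"{letter}.csv", header=None)
    df["label"] = letter                      # keep the letter as the target column
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv("dataset.csv", index=False)
```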
This project includes Jupyter notebooks to train two different models. Both notebooks take the same extracted dataset CSV file as input.
- train_catboost.ipynb trains a CatBoostClassifier (see the sketch after this list).
- train_neuralnetwork.ipynb trains a multilayer perceptron using TensorFlow 2.
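As a rough illustration of the CatBoost path, here is a minimal training sketch, assuming the merged dataset.csv from above with a "label" column; the hyperparameters and file names are placeholders, not the notebook's actual settings:

```python
# Sketch: train a CatBoost classifier on the merged landmark dataset.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("dataset.csv")
X = data.drop(columns=["label"])
y = data["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = CatBoostClassifier(iterations=500, loss_function="MultiClass", verbose=100)
model.fit(X_train, y_train, eval_set=(X_test, y_test))
print("Accuracy:", model.score(X_test, y_test))
model.save_model("asl_catboost.cbm")
```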
The CatBoostClassifier converges quickly and yields high accuracy. However, while developing this project, the idea came up to embed a model into a single web page, ideally without a Python backend. So I decided to also train a multilayer perceptron with TensorFlow. The trained model can then be converted for the TensorFlow.js library and included directly in JavaScript without the need for a Python backend server.
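A hedged sketch of the MLP training and the TensorFlow.js conversion, again assuming the merged dataset.csv; the layer sizes, epoch count, and the web_model output directory are illustrative assumptions, not the notebook's actual values:

```python
# Sketch: train a small MLP with Keras and convert it for TensorFlow.js.
import pandas as pd
import tensorflow as tf
import tensorflowjs as tfjs

data = pd.read_csv("dataset.csv")
X = data.drop(columns=["label"]).values
y = pd.factorize(data["label"])[0]            # letters -> integer class ids

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(26, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=50, validation_split=0.2)

# Convert for the web demo; TensorFlow.js can then load it with tf.loadLayersModel().
tfjs.converters.save_keras_model(model, "web_model")
```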
You can run your trained models locally with either run_asl_catboost.py or run_asl_neuralnetwork.py.
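A hedged sketch of what such a local webcam inference loop could look like with the CatBoost model; the actual scripts may differ (for example, the left-hand mirroring step is omitted here for brevity), and the model file name is an assumption:

```python
# Sketch: classify hand signs from the webcam with the trained CatBoost model.
import cv2
import mediapipe as mp
from catboost import CatBoostClassifier

model = CatBoostClassifier()
model.load_model("asl_catboost.cbm")

cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            row = [c for lm in results.multi_hand_landmarks[0].landmark
                   for c in (lm.x, lm.y, lm.z)]
            letter = model.predict([row]).flatten()[0]
            cv2.putText(frame, str(letter), (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 2)
        cv2.imshow("ASL", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```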
To try out and play with the trained model, you can head to this repository's GitHub Page.
It loads the trained model and uses the JavaScript version of MediaPipe. The landmarks extracted from your webcam feed are passed to the multilayer perceptron, and the prediction is displayed on the screen.
The following dependencies are used for the web demo:
- @mediapipe/camera_utils
- @mediapipe/drawing_utils
- @mediapipe/hands
- @tensorflow/tfjs
- splitting
- Bootstrap - as CDN
- Gallaudet TrueType - a beautiful font displaying the letter signs
The modules are bundled using webpack; the source files can be found here.