Optimizer is a class project on OCR for seven-segment digits, carried out by Alex, Priscille, Charlotte and Sacha.
The aim of the project is to digitize the monitoring of mining activities. We focused on the gas and lubricant consumption of vehicles within the mines. The idea is to build a computer vision model that lets operators take a picture of the gas pump with their smartphones and automatically logs the value of the gas transaction.

We were given ~850 pictures (of varying quality) of the gas pump with their associated values. We tried 2 different approaches:
- The "digit-per-digit" approach
  - Image processing: identify the screen, crop the picture, convert it to grayscale, apply thresholding, then localize the digits and crop them.
  - Learning phase: train an MNIST-style model that predicts each digit individually.
  - Inference phase: pass each cropped digit to the MNIST-style model and concatenate the predictions.
- The "end-to-end" approach
  - Image processing: identify the screen, crop the picture, convert it to grayscale, and apply thresholding.
  - Learning phase: train a model that predicts all digits at once. We based our model on the paper "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" by Goodfellow et al. The idea is to build a ConvNet that simultaneously learns (i) the digits and (ii) where to look for them.
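
The two approaches differ mainly in how the value is read off the thresholded screen. The sketch below contrasts the two read-out steps in plain Python and is not the project's actual code. All names (`split_digits`, `read_digit_per_digit`, `decode_end_to_end`) are hypothetical. It assumes digits can be localized by column projection (runs of non-empty columns), that the digit classifier is any callable returning an integer, and that the end-to-end model outputs log-probabilities for the number's length and for each digit position, in the spirit of the Goodfellow et al. paper.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # binary image: 1 = lit segment pixel, 0 = background


def split_digits(binary: Image, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of digits, found by column projection.

    Assumption: digits are separated by at least one fully empty column.
    """
    width = len(binary[0]) if binary else 0
    occupied = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, on in enumerate(occupied + [False]):  # sentinel closes the last run
        if on and start is None:
            start = x
        elif not on and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    return spans


def read_digit_per_digit(binary: Image, classify: Callable[[Image], int]) -> str:
    """Crop each digit, classify it on its own, and append the results."""
    digits = []
    for start, end in split_digits(binary):
        crop = [row[start:end] for row in binary]
        digits.append(str(classify(crop)))
    return "".join(digits)


def decode_end_to_end(length_logp: Sequence[float],
                      digit_logp: Sequence[Sequence[float]]) -> str:
    """Pick the length and digits that maximise the joint log-probability.

    length_logp[n] is log P(length = n); digit_logp[i][d] is log P(digit i = d).
    This output layout is an assumption made for the sketch.
    """
    # The best digit at each position does not depend on the chosen length.
    best = [max(range(len(p)), key=p.__getitem__) for p in digit_logp]
    scores, running = [], 0.0
    for n, lp in enumerate(length_logp):
        if n > len(digit_logp):
            break
        if n > 0:
            running += digit_logp[n - 1][best[n - 1]]
        scores.append(lp + running)  # log P(length = n) + sum of digit terms
    n_best = max(range(len(scores)), key=scores.__getitem__)
    return "".join(str(d) for d in best[:n_best])


# Toy usage: a 2-row "screen" and a stand-in classifier that returns crop width.
screen = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
print(split_digits(screen))
print(read_digit_per_digit(screen, classify=lambda crop: len(crop[0])))
```

In the digit-per-digit read-out, a localization error (merged or split digits) changes the number of predicted digits, whereas the end-to-end read-out chooses the length and the digits jointly from the model's outputs.
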
TBC...
1. Preprocessing: `python frame_extractor.py`
2. Preprocessing: `python digits_cut.py`
3. Model: `python main.py`