In this project, I use the TRODO dataset to train several classification algorithms that distinguish analog from digital odometers. I also use object detection to narrow each image down to the region of interest, from which an Optical Character Recognition (OCR) algorithm extracts the odometer value.
The dataset consists of 2389 raw images with ground-truth labels. The images come in a range of sizes; the most common are 768x1024, 576x1024 and 1200x1600, with 468, 389 and 317 images respectively. Since machine learning models require fixed-size input, the images are resized to a common shape as a preprocessing step.
There are 759 analog and 1391 digital odometers, so the digital class dominates during training.
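To put the imbalance in numbers, the short sketch below computes each class share and the accuracy of a trivial classifier that always predicts the majority class. The counts are the ones reported above; the majority-class baseline is my own illustration, not a result from the project.

```python
# Class counts as reported in the text.
counts = {"analog": 759, "digital": 1391}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_class = max(counts, key=counts.get)
baseline_accuracy = shares[majority_class]

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"majority-class baseline ({majority_class}): {baseline_accuracy:.1%}")
```

Any classifier trained on these labels should be judged against this baseline rather than against 50%.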
Before training my models, I applied several preprocessing steps. I resized the images to 256x256 because the models require fixed-size inputs and because larger inputs mean more parameters, which slows the models down. Then I normalised the pixel values by dividing by 255, the maximum pixel value. The images are 3D arrays, but K-NN, decision trees and fully connected neural networks require 1D vectors as input. Therefore, for every model except the CNNs, I flattened each array into 256x256x3 = 196608 features. Since machine learning models cannot handle string values such as analog and digital, I converted these strings to the labels 0 and 1, respectively. Finally, I split the dataset into a training set (90%, 2150 images) and a test set (10%, 239 images).
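The preprocessing steps can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual code: it assumes the images have already been resized to 256x256 and loaded as a uint8 array, and the helper names `preprocess` and `train_test_split_indices`, the random seed and the synthetic demo batch are all hypothetical.

```python
import numpy as np

IMG_SIZE = 256  # target side length from the text
LABELS = {"analog": 0, "digital": 1}  # string label -> integer class


def preprocess(images, labels, flatten=True):
    """Scale uint8 images to [0, 1] and optionally flatten them.

    `images` is assumed to be a uint8 array of shape (N, 256, 256, 3),
    i.e. already resized; `labels` is a list of "analog"/"digital" strings.
    """
    x = images.astype(np.float32) / 255.0
    if flatten:
        # (N, 256, 256, 3) -> (N, 196608) for K-NN, trees and dense networks
        x = x.reshape(len(x), -1)
    y = np.array([LABELS[name] for name in labels], dtype=np.int64)
    return x, y


def train_test_split_indices(n_samples, test_fraction=0.1, seed=0):
    """Shuffle indices and hold out `test_fraction` of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = round(n_samples * test_fraction)
    return order[n_test:], order[:n_test]


# Tiny synthetic batch standing in for the real images (assumption).
demo_images = np.random.default_rng(0).integers(
    0, 256, size=(4, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8
)
demo_labels = ["analog", "digital", "digital", "analog"]

x_flat, y = preprocess(demo_images, demo_labels)
train_idx, test_idx = train_test_split_indices(2389)
print(x_flat.shape, y, len(train_idx), len(test_idx))
```

For a CNN, calling `preprocess(..., flatten=False)` keeps the (N, 256, 256, 3) shape so that the spatial structure is preserved.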
Object detection requires more effort because the algorithms expect annotations in a specific data format, so I needed to convert the ground-truth values. I split the dataset into training (80%, 1911 images), validation (10%, 239 images) and test (10%, 239 images) sets in order to fine-tune hyperparameters and monitor the model's performance. As the object detection algorithm, I chose YOLOv8, aiming for the best results.
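As an illustration of the format conversion, YOLO label files store one line per object in the form `class x_center y_center width height`, with all coordinates normalised by the image size. The sketch below assumes the source annotations are pixel boxes given as (xmin, ymin, xmax, ymax), which may not match the dataset's actual annotation layout; the helper names and the example box are hypothetical.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a YOLO label line."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def split_sizes(n_samples, val_fraction=0.1, test_fraction=0.1):
    """Return (train, val, test) sizes; train takes whatever remains."""
    n_val = round(n_samples * val_fraction)
    n_test = round(n_samples * test_fraction)
    return n_samples - n_val - n_test, n_val, n_test


# Hypothetical box centred in an image 768 pixels wide and 1024 pixels high.
line = to_yolo_line(0, (192, 256, 576, 768), img_w=768, img_h=1024)
sizes = split_sizes(2389)
print(line)
print(sizes)
```

Because the coordinates are relative, the same label line stays valid if the image is later resized without cropping.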
OCR software extracts data from scanned documents, camera images and image-only PDFs so that it can be reused. It picks out the letters in an image, combines them into words and then into sentences, making the original content accessible and editable. In my project, I used this technology to extract the mileage from the bounding boxes in odometer images. The tool I used for mileage extraction is EasyOCR, a popular, accurate and easy-to-use OCR tool. I configured it to extract only digits rather than alphanumeric values, so that weekday names, ’km’ and other characters are ignored.
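One way to restrict EasyOCR to digits is the `allowlist` argument of `readtext`. The sketch below is a guess at how this could be wired up, not the project's actual code: the helper names, the confidence threshold and the rule of keeping the most confident all-digit string are assumptions for illustration.

```python
def pick_mileage(results, min_confidence=0.3):
    """Return the most confident all-digit string from OCR results, or None.

    `results` follows the (bbox, text, confidence) layout returned by
    EasyOCR's `readtext` in its default detail mode.
    """
    candidates = [
        (conf, text)
        for _bbox, text, conf in results
        if text.isdigit() and conf >= min_confidence
    ]
    if not candidates:
        return None
    return max(candidates)[1]


def read_mileage(image_path):
    """Run EasyOCR on a cropped odometer image, restricted to digits."""
    import easyocr  # imported lazily; requires `pip install easyocr`

    reader = easyocr.Reader(["en"], gpu=False)
    results = reader.readtext(image_path, allowlist="0123456789")
    return pick_mileage(results)
```

In use, `read_mileage` would be called on the crop produced by the detector, for example `read_mileage("odometer_crop.jpg")`, where the file name is a placeholder.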