This project was built for the course "Introduction to Image Analysis" (1MD110) at Uppsala University.
The objective is to accurately solve noisy CAPTCHA images (distorted images containing letters and digits, used in cyber-security). In this task, each CAPTCHA image is very noisy and contains 3 digits in erratic orientations, along with several stray marks.
Result of Pre-Processing (Example):
The following features are used to train the model:
- Circularity
- Area
- Centroid
- Orientation
- Solidity
Each training image is split into 3 distinct props (one per digit), and the features listed above are extracted for each prop. The result of splitting into 3 props:
Each prop returns a 1 x 6 feature vector (five features, with the centroid presumably contributing two coordinates).
Each image returns a 3 x 1 x 6 feature array (one 1 x 6 vector per digit).
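As an illustration of how one 1 x 6 vector might be assembled from a single prop, here is a minimal standard-library sketch. It is not the project's actual code. The function names (`prop_features`, `convex_hull`, `polygon_area`), the ordering of the vector, the perimeter estimate (a count of exposed pixel edges) and the orientation sign convention are all assumptions made for this example, and a region-properties routine from an image library would give somewhat different numbers.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def prop_features(pixels):
    """pixels: (row, col) foreground coordinates of one digit.

    Returns [circularity, area, centroid_x, centroid_y, orientation, solidity]
    (this ordering is an assumption, not taken from the project).
    """
    pix = set(pixels)
    area = len(pix)
    cy = sum(r for r, _ in pix) / area
    cx = sum(c for _, c in pix) / area
    # Crude perimeter estimate: pixel edges that face the background.
    perimeter = sum(
        1
        for r, c in pix
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in pix
    )
    circularity = 4 * math.pi * area / perimeter ** 2
    # Orientation of the principal axis from second central moments.
    mu20 = sum((c - cx) ** 2 for _, c in pix) / area
    mu02 = sum((r - cy) ** 2 for r, _ in pix) / area
    mu11 = sum((c - cx) * (r - cy) for r, c in pix) / area
    orientation = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Solidity: region area over the area of the hull of all pixel corners.
    corners = [(c + dc, r + dr) for r, c in pix for dr in (0, 1) for dc in (0, 1)]
    solidity = area / polygon_area(convex_hull(corners))
    return [circularity, area, cx, cy, orientation, solidity]
```

Calling `prop_features` once per prop and stacking the three results would give the per-image array described above.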
Training images - 1100
Validation images - 100
3 digits are extracted from each image, which gives 1100 x 3 = 3300 training samples
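The description does not say how the three digits are isolated. One plausible approach, sketched below under that assumption, is to label the connected components of the pre-processed binary image, keep the three largest (treating smaller ones as stray marks), and order them left to right. The name `split_digits` and the choice of 8-connectivity are hypothetical.

```python
from collections import deque

def split_digits(mask, n_digits=3):
    """mask: list of rows of 0/1 values.

    Returns the n_digits largest 8-connected components, ordered left to
    right, each as a list of (row, col) coordinates.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            comp = []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    # Keep the largest components, then order them by leftmost column.
    largest = sorted(components, key=len, reverse=True)[:n_digits]
    return sorted(largest, key=lambda comp: min(x for _, x in comp))
```

This simple version fails when two digits touch, since they merge into a single component.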
3 models were trained:
- KNN (k=3)
- Linear SVM
- Decision Trees with Adaptive Boosting (maxSplits=30)
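To make the simplest of the three models concrete, here is a minimal sketch of a k-nearest-neighbour classifier with k=3, written with the standard library only. It assumes Euclidean distance and majority voting, which the description does not confirm. The function name `knn_predict` is hypothetical, and the project itself presumably relied on a library implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training vectors (Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Here `train_X` would hold the 3300 feature vectors and `train_y` the matching digit labels.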
The best results were obtained with Decision Trees with Adaptive Boosting (maxSplits=30):
- Training accuracy: ~97%
- Validation accuracy: ~82% (cross-validation would give a more reliable estimate)
- Accuracy on a hidden test set: ~61%
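The cross-validation mentioned above could be set up by pooling the labelled images and rotating which fold is held out. The sketch below only generates the index splits. The name `k_fold_indices`, the choice of 5 folds and the fixed seed are assumptions for illustration.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val
```

Splitting by image rather than by digit keeps the three digits of one CAPTCHA in the same fold. With the 1100 + 100 labelled images pooled, `k_fold_indices(1200)` gives five splits of 960 training and 240 validation images, and the validation accuracy would be averaged over the five runs.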
Possible improvements:
- Improve digit splitting for overlapping digits by applying repeated (and controlled) erosion followed by dilation to break connected components
- Resize images to the same size before feature extraction for consistency (or flatten the image itself)
- Train a CNN to improve accuracy and performance
- Perform cross-validation for better evaluation
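The erosion-then-dilation idea from the list above (a morphological opening) can be sketched as follows. This is a minimal illustration assuming a 3 x 3 square structuring element and a binary image stored as lists of 0/1. The names `erode`, `dilate` and `opening` are hypothetical, and pixels outside the image are treated as background.

```python
def _neighbourhood(mask, r, c):
    """Values in the 3 x 3 window around (r, c); out-of-bounds counts as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0

def erode(mask):
    """A pixel survives only if its whole 3 x 3 neighbourhood is foreground."""
    return [
        [int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]

def dilate(mask):
    """A pixel is set if any pixel in its 3 x 3 neighbourhood is foreground."""
    return [
        [int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]

def opening(mask, iterations=1):
    """Erode `iterations` times, then dilate the same number of times."""
    for _ in range(iterations):
        mask = erode(mask)
    for _ in range(iterations):
        mask = dilate(mask)
    return mask
```

A thin bridge between two blobs is removed by the erosion and not restored by the dilation, so the blobs become separate components. The number of iterations would need tuning, since too many would erase thin digit strokes as well.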