Captcha Classification

This project was built for the course "Introduction to Image Analysis" (1MD110) at Uppsala University.

The objective is to accurately solve noisy CAPTCHA images (distorted images of letters and digits used as a cybersecurity challenge). In this task, each CAPTCHA image is extremely noisy and contains 3 digits in highly erratic orientations, along with several stray marks.

Input Examples

Pre-Processing Pipeline

Result of Pre-Processing (Example):

Feature Selection

The set of features used to train the model are as follows:

Circularity
Area
Centroid
Orientation
Solidity
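These are the kind of shape measurements MATLAB's regionprops reports. As a rough, self-contained illustration (not the project's MATLAB code), the Python sketch below computes them for one binary digit mask. Centroid returns both x and y, so the five features give six numbers. Assumptions: the mask is a list of 0/1 rows with the digit as 1; the perimeter is a simple crack length (count of exposed pixel edges); the convex hull is taken over pixel corners; orientation is the major-axis angle in degrees, counter-clockwise with y pointing up. Circularity and solidity values will therefore not exactly match MATLAB's estimates.

```python
import math


def _convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def region_features(mask):
    """Return [circularity, area, centroid_x, centroid_y, orientation, solidity]."""
    h, w = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    area = len(pixels)
    cx = sum(c for _, c in pixels) / area  # x = column index (0-based here)
    cy = sum(r for r, _ in pixels) / area  # y = row index (0-based here)
    # Second-order central moments.
    mu20 = sum((c - cx) ** 2 for _, c in pixels) / area
    mu02 = sum((r - cy) ** 2 for r, _ in pixels) / area
    mu11 = sum((c - cx) * (r - cy) for r, c in pixels) / area
    # Major-axis angle; negated so angles are counter-clockwise with y pointing up.
    orientation = -math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))
    # Crack perimeter: pixel edges that touch background or the image border.
    fg = set(pixels)
    perimeter = sum(
        (r + dr, c + dc) not in fg
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )
    circularity = 4 * math.pi * area / perimeter ** 2
    # Solidity: pixel count over the area of the convex hull of all pixel corners.
    corners = [(c + dx, r + dy) for r, c in pixels for dx in (0, 1) for dy in (0, 1)]
    solidity = area / _polygon_area(_convex_hull(corners))
    return [circularity, area, cx, cy, orientation, solidity]
```

The output order follows the feature list above, so each digit yields a 6-element vector.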

General Flow

Each training image is split into 3 distinct regions, or props (one per digit), and the features listed above are extracted for each prop. The following shows the result of splitting an image into 3 props:
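The README does not say how the 3 props are chosen from the noisy image. One plausible approach, sketched below as an assumption, is to label 8-connected components, keep the 3 largest (treating smaller blobs as stray marks) and order them left to right by centroid column. The function name split_digits and the "largest 3" rule are hypothetical, not taken from the project.

```python
from collections import deque


def split_digits(mask, n_digits=3):
    """Label 8-connected components; return the n largest, ordered left to right.

    Each component is returned as a set of (row, col) pixel coordinates.
    """
    h, w = len(mask), len(mask[0])
    seen = set()
    components = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or (r0, c0) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            comp, queue = set(), deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:
                r, c = queue.popleft()
                comp.add((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < h and 0 <= nc < w and mask[nr][nc]
                                and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            components.append(comp)
    # Assumption: the digits are the largest blobs; the rest are stray marks.
    largest = sorted(components, key=len, reverse=True)[:n_digits]
    # Reading order: sort by mean column (centroid x).
    return sorted(largest, key=lambda comp: sum(c for _, c in comp) / len(comp))
```

A weakness of this rule is that two touching digits form a single component, which is the overlap problem raised under Future work.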

Each prop yields a 1 x 6 feature vector (Centroid contributes two values, x and y; each of the other features contributes one)

Each image yields a 3 x 1 x 6 feature array, where the first dimension indexes the three digits
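For training, each 3 x 1 x 6 array is unrolled into three separate samples, one per digit. The Python sketch below shows that bookkeeping under a hypothetical label format (a 3-character string such as "507", digits in left-to-right order); the actual format of the project's labels is not described here.

```python
def image_to_samples(features_3x1x6, label):
    """Flatten one image's 3 x 1 x 6 feature array into three (feature_row, digit) pairs.

    `label` is assumed (hypothetically) to be the three digits in left-to-right order.
    """
    if len(features_3x1x6) != 3 or len(label) != 3:
        raise ValueError("expected exactly three digits per image")
    samples = []
    for digit_block, digit_char in zip(features_3x1x6, label):
        (row,) = digit_block  # unpack the singleton middle dimension
        if len(row) != 6:
            raise ValueError("expected 6 features per digit")
        samples.append((list(row), int(digit_char)))
    return samples


def build_training_set(all_features, all_labels):
    """Stack every image's three samples into a feature matrix X and label vector y."""
    X, y = [], []
    for feats, label in zip(all_features, all_labels):
        for row, digit in image_to_samples(feats, label):
            X.append(row)
            y.append(digit)
    return X, y
```

With 1100 training images, build_training_set returns 3 x 1100 = 3300 rows of 6 features each.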

Training and Evaluation

Training images - 1100

Validation images - 100

3 digits are extracted from each image, giving 3 x 1100 = 3300 training samples

3 models were trained and compared:

KNN (k=3)
Linear SVM
Decision Trees with Adaptive Boosting (maxSplits=30)
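Of the three models, KNN is the simplest to illustrate. The Python sketch below is a from-scratch k-nearest-neighbours classifier with k=3 (Euclidean distance, majority vote), not the project's MATLAB model. The README does not say whether features were standardized; since Area is measured in pixels while Solidity lies between 0 and 1, per-feature z-scoring is included here as an assumption. The function names are hypothetical.

```python
import math
from collections import Counter


def zscore_fit(X):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(col) / n for col in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(cols, means)]
    return means, stds


def zscore_apply(x, means, stds):
    """Standardize one feature row with statistics from the training set."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance).

    Ties are broken in favour of the label whose member is nearest to x.
    """
    nearest = sorted(zip((math.dist(row, x) for row in X_train), y_train))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    for _, label in nearest:  # `nearest` is already sorted by distance
        if votes[label] == top:
            return label
```

Fit the mean and standard deviation on the training rows only, then apply the same transform to validation and test rows.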

Results

The best results were obtained with Decision Trees with Adaptive Boosting (maxSplits=30), with the following metrics:

Training accuracy: ~97%
Validation accuracy: ~82% (cross-validation would give a more reliable estimate)
Hidden test set accuracy: ~61%
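A single 100-image validation split gives a noisy estimate. The Python sketch below generates k-fold cross-validation splits over whole images, so that the three digits of one CAPTCHA never end up on both sides of a split. Splitting by image rather than by digit is a design choice assumed here, and kfold_splits is a hypothetical helper.

```python
import random


def kfold_splits(n_images, k=5, seed=0):
    """Yield (train_images, val_images) index lists for k-fold cross-validation.

    Folds are formed over whole images, so all digits of one CAPTCHA stay together.
    """
    if not 2 <= k <= n_images:
        raise ValueError("k must be between 2 and n_images")
    order = list(range(n_images))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val
```

For example, if the 1100 training and 100 validation images were pooled, kfold_splits(1200, k=5) would give five folds of 960 training and 240 validation images; the reported accuracy would then be the mean over the five folds.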

Future work

Splitting of overlapping digits could be improved by repeated (and controlled) erosion followed by dilation to break apart connected components
Resize images to a common size before feature extraction for consistency (or use the flattened image itself as the feature vector)
Train a convolutional neural network (CNN) to improve accuracy and performance
Perform cross-validation for better evaluation
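Erosion followed by dilation with the same structuring element is a morphological opening. The Python sketch below applies it to a binary mask with a 3 x 3 square element, repeated n times in each direction, to show how a thin bridge between two digits can be cut. The 3 x 3 element and the function names are assumptions for illustration; "controlled" matters because too many iterations will also erase thin digit strokes.

```python
def _apply(mask, combine):
    """Combine each pixel's 3x3 neighbourhood; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(combine(0 <= r + dr < h and 0 <= c + dc < w and bool(mask[r + dr][c + dc])
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]


def erode(mask):
    """Binary erosion: keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return _apply(mask, all)


def dilate(mask):
    """Binary dilation: set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return _apply(mask, any)


def open_n(mask, n=1):
    """n erosions followed by n dilations (an opening with a (2n+1)x(2n+1) square)."""
    out = mask
    for _ in range(n):
        out = erode(out)
    for _ in range(n):
        out = dilate(out)
    return out
```

After opening, the connected components can be relabelled so that the two previously joined digits appear as separate props.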

ananyaroy1011/Captcha-Classification

Folders and files

Latest commit

History

Repository files navigation

Captcha Classification

Input Examples

Pre-Processing Pipeline

Feature Selection

General Flow

Training and Evaluation

Results

Future work

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages