Labeled Ears in the Wild (inspired by LFW, Labeled Faces in the Wild, a public benchmark for face verification).
The ear validation problem was successfully solved with a CNN model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 78, 46, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 39, 23, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 37, 21, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 18, 10, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 8, 64) 36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 4, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 2048) 0
_________________________________________________________________
dense (Dense) (None, 64) 131136
_________________________________________________________________
dense_1 (Dense) (None, 51) 3315
=================================================================
Total params: 190,771
Trainable params: 190,771
Non-trainable params: 0
_________________________________________________________________
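The shapes and parameter counts in the summary can be checked by hand. The following is a minimal sketch of that arithmetic, assuming 3x3 kernels with stride 1 and no padding, 2x2 max pooling, and an 80 x 48 input with 3 channels. None of these values is stated in the summary; they are inferred from it (for example, the 896 parameters of the first layer are consistent with a three-channel input). The helper names are illustrative only.

```python
# Assumptions inferred from the summary: 3x3 kernels, stride 1, "valid"
# padding, 2x2 max pooling, and an 80 x 48 input with 3 channels.

def conv2d(shape, filters, kernel=3):
    """Return (output_shape, param_count) for an unpadded convolution."""
    height, width, channels = shape
    params = (kernel * kernel * channels + 1) * filters  # +1 is the bias
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Return (output_shape, 0); pooling has no trainable parameters."""
    height, width, channels = shape
    return (height // pool, width // pool, channels), 0


def dense(inputs, units):
    """Return (units, param_count) for a fully connected layer."""
    return units, (inputs + 1) * units


shape = (80, 48, 3)
counts = []
for filters in (32, 64, 64):
    shape, params = conv2d(shape, filters)
    counts.append(params)
    shape, _ = max_pool(shape)

flat = shape[0] * shape[1] * shape[2]  # size of the Flatten output
hidden, params = dense(flat, 64)
counts.append(params)
_, params = dense(hidden, 51)
counts.append(params)

total = sum(counts)
print(shape, flat, counts, total)
```

Under these assumptions the script reproduces the per-layer counts 896, 18496, 36928, 131136 and 3315, a flattened size of 2048, and the total of 190,771 parameters.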
Make sure you have Python 3.8 or later.
$ python3 -V
Python 3.8.1
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
My current dataset is an aggregate of:
- AMI Ear Database
- data from Hitesh-Valecha/Ear_Biometric_System project
- EarVN1.0
- and crawled images from Google
The dataset on which the model was trained can be found here. Most training images are in grayscale (one color channel), with size 82 × 116 pixels.
The training subset contains 51 classes:
- 50 classes of defined users
- 1 class for unknown users
Each class contains images of left and right ears. Normally there are 6 images of the left ear and 1 image of the right ear. Total number of images: 2382
The validation subset contains images of another 50 users who should not be verified, along with other sample images of ears.
Every image is scaled to size 48 × 80. To obtain more training data, augmentation is applied: from each input image we create additional transformed images:
- rotated anticlockwise by an angle of 0-180 degrees
- rotated clockwise by an angle of 0-180 degrees
- flipped vertically
- flipped horizontally
- blurred with a Gaussian filter
- changed brightness level
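The transformations listed above can be sketched with NumPy alone. This is an illustration, not the code from train.py: the function names, the nearest-neighbour rotation, the blur sigma and the brightness range of ±40 are all assumptions chosen for the example.

```python
import numpy as np


def rotate(img, angle_deg):
    """Rotate anticlockwise about the image centre (nearest neighbour)."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.rint(cx + dx * np.cos(theta) - dy * np.sin(theta)).astype(int)
    src_y = np.rint(cy + dx * np.sin(theta) + dy * np.cos(theta)).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)  # pixels that fall outside stay black
    out[inside] = img[src_y[inside], src_x[inside]]
    return out


def gaussian_blur(img, sigma=1.0):
    """Blur with a separable Gaussian kernel (rows first, then columns)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    both = np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")
    return np.clip(np.rint(both), 0, 255).astype(img.dtype)


def change_brightness(img, delta):
    """Add a constant to every pixel and clip to the 8-bit range."""
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)


def augment(img, rng):
    """Return six transformed copies of one grayscale image."""
    angle = rng.uniform(0, 180)
    return [
        rotate(img, angle),                                   # anticlockwise
        rotate(img, -angle),                                  # clockwise
        np.flipud(img),                                       # vertical flip
        np.fliplr(img),                                       # horizontal flip
        gaussian_blur(img, sigma=1.0),                        # Gaussian blur
        change_brightness(img, int(rng.integers(-40, 41))),   # brightness
    ]


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(80, 48), dtype=np.uint8)  # stand-in image
augmented = augment(image, rng)
print(len(augmented), augmented[0].shape)
```

In a real pipeline a library routine (for example from OpenCV or Pillow) with proper interpolation would likely replace the hand-written rotation and blur; they are spelled out here only to keep the sketch self-contained.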
Use train.py to train a new model. The trained model will be saved in an .h5 file.
Use predict.py to get model predictions. There is a checkpoint, lew_cnn.h5, of the model trained on a hold-out split with test size 0.33 for 20 epochs with batch size 32. The checkpoint stats are:
metric | train | validation |
---|---|---|
acc | 0.845 | 0.905 |
f1 | 0.844 | 0.907 |
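For readers unfamiliar with the term, a hold-out split with test size 0.33 simply sets aside about a third of the samples before training. The sketch below shows the idea with the standard library only; the function name and the seed are hypothetical, and how the split is actually done in train.py (for example with scikit-learn's train_test_split) is not stated here.

```python
import random


def hold_out_split(samples, test_size=0.33, seed=42):
    """Shuffle the samples and set aside a `test_size` fraction of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Toy usage with 100 sample indices instead of real images.
train, test = hold_out_split(range(100))
print(len(train), len(test))
```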
There is also a production checkpoint, lew_cnn_all.h5, of the model trained on all training samples for 20 epochs with batch size 32. The checkpoint stats are:
metric | train | validation |
---|---|---|
acc | 0.949 | 0.928 |
f1 | 0.949 | 0.929 |
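As a reminder of what the two metrics measure, here is a small sketch that computes accuracy and F1 from true and predicted labels. It assumes macro averaging (the unweighted mean of per-class F1 scores); which averaging was used for the numbers above is not stated, and the labels in the example are toy values, not project data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["a", "a", "b", "unknown"]
y_pred = ["a", "b", "b", "unknown"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)
```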
I used a nested directory convention for storing the dataset:
dataset_root
|_ classes
|_ class1
| |_ haar_detect
| |_ left_ear
| | |_ img1.png
| | |_ img2.png
| | ...
| |_ right_ear
| |_ img1.png
| |_ img2.png
| ...
...
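A layout like this is straightforward to traverse with pathlib. The helper below is a hypothetical sketch (list_samples is not a function from this project): it treats each directory under classes as one class and searches it recursively for left_ear and right_ear folders, so it does not depend on how deeply those folders are nested.

```python
from pathlib import Path


def list_samples(dataset_root):
    """Yield (class_name, side, image_path) for every PNG in the layout."""
    classes_dir = Path(dataset_root) / "classes"
    for class_dir in sorted(p for p in classes_dir.iterdir() if p.is_dir()):
        for side in ("left_ear", "right_ear"):
            # rglob finds the side folder at any depth below the class folder.
            for side_dir in sorted(class_dir.rglob(side)):
                if not side_dir.is_dir():
                    continue
                for image_path in sorted(side_dir.glob("*.png")):
                    yield class_dir.name, side, image_path
```

Calling list(list_samples("dataset_root")) would then give one tuple per image, which can be mapped to integer labels by enumerating the sorted class names.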
In "wild" conditions, other objects are captured in the image along with the ear. These objects are noise for our model, and it is possible that there is no ear in an image at all. I also wanted to protect the model from malicious attacks, namely to make verification robust against objects that resemble an ear (mushrooms, shells, etc.). Thus I decided to use Haar Cascades as a cleaning method. Although Haar Cascades successfully find ears in an image, they sometimes struggle to find an ear that is not captured in a vertical position, for example when the head is leaned forward or backward. This seems natural to me, so to improve ear detection we rotate the input image clockwise 1 degree at a time and check whether Haar Cascades can detect any ears in the image. If a right or left ear is detected, the input image is saved in grayscale to the appropriate folder. I wanted to save the ear cropped from the image, but I found that Haar Cascades tended to detect only a part of the ear.
As an idea for a more secure pipeline, I wanted to use Haar Cascades for ears as the first step of validation. With time I realized that the model should deal with incorrect images by itself if it was trained on proper data. Also, the cascades tended to detect only part of the ear, which looks like a problem of dealing with high resolution and a lack of negative samples. However, Haar Cascades are still powerful enough for preparing a better dataset.
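The rotate-and-detect idea can be sketched as a small search loop. Everything named here is an assumption made for illustration: the function names, the 45-degree limit, and the detectMultiScale settings are not taken from the project, and the cascade file path must point to an ear cascade you have available. The loop itself takes the detector and the rotation as plain callables, and the OpenCV-specific part is kept in a separate helper.

```python
def first_detecting_angle(image, detect, rotate, max_angle=45, step=1):
    """Rotate clockwise in `step`-degree increments until `detect` fires.

    Returns (angle, rotated_image, detections), or None if nothing is found.
    Each rotation starts from the original image to avoid accumulating
    interpolation error.
    """
    for angle in range(0, max_angle + 1, step):
        rotated = image if angle == 0 else rotate(image, angle)
        detections = detect(rotated)
        if len(detections) > 0:
            return angle, rotated, detections
    return None


def make_opencv_helpers(cascade_path):
    """Build `detect` and `rotate` callables on top of OpenCV."""
    import cv2  # imported lazily so the search loop has no hard dependency

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError(f"could not load cascade: {cascade_path}")

    def detect(image):
        gray = image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors are illustrative defaults.
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    def rotate(image, angle_clockwise):
        h, w = image.shape[:2]
        # getRotationMatrix2D treats positive angles as anticlockwise.
        matrix = cv2.getRotationMatrix2D((w / 2, h / 2), -angle_clockwise, 1.0)
        return cv2.warpAffine(image, matrix, (w, h))

    return detect, rotate
```

With OpenCV installed, detect and rotate come from make_opencv_helpers called with the path to a left-ear or right-ear cascade file, and the image is loaded with cv2.imread. A non-None result from first_detecting_angle means the image can be converted to grayscale and written to the matching folder with cv2.imwrite.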