Labeled Ears in the Wild (inspired by LFW (Labeled Faces in the Wild), a public benchmark for face verification). The ear verification problem was successfully solved with the following CNN model:

Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 78, 46, 32)        896       
max_pooling2d (MaxPooling2D) (None, 39, 23, 32)        0         
conv2d_1 (Conv2D)            (None, 37, 21, 64)        18496     
max_pooling2d_1 (MaxPooling2 (None, 18, 10, 64)        0         
conv2d_2 (Conv2D)            (None, 16, 8, 64)         36928     
max_pooling2d_2 (MaxPooling2 (None, 8, 4, 64)          0         
flatten (Flatten)            (None, 2048)              0         
dense (Dense)                (None, 64)                131136    
dense_1 (Dense)              (None, 51)                3315      
Total params: 190,771
Trainable params: 190,771
Non-trainable params: 0
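
The shapes and parameter counts in the summary above can be re-derived by hand. The sketch below does that arithmetic under assumptions inferred from the table, not stated in it: 3 × 3 kernels, stride 1, no padding, 2 × 2 max pooling, and an input of 80 × 48 × 3 (the 896 parameters of the first layer only fit a 3-channel input).

```python
def conv2d(shape, filters, kernel=3):
    """Shape and parameter count of an unpadded, stride-1 convolution (assumed)."""
    h, w, c = shape
    out_shape = (h - kernel + 1, w - kernel + 1, filters)
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return out_shape, params


def max_pool(shape, pool=2):
    """Shape after non-overlapping pooling; odd sizes are rounded down."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def dense(units_in, units_out):
    """Parameter count of a fully connected layer with bias."""
    return (units_in + 1) * units_out


shape = (80, 48, 3)  # assumed input: height, width, channels
total = 0
for filters in (32, 64, 64):
    shape, params = conv2d(shape, filters)
    total += params
    shape = max_pool(shape)

flat = shape[0] * shape[1] * shape[2]  # size of the Flatten output
total += dense(flat, 64) + dense(64, 51)
print(shape, flat, total)
```

Under these assumptions the result agrees with the table: a flattened size of 2048 and 190,771 parameters in total.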


Make sure you have Python 3.8 or later.

$ python3 -V
Python 3.8.1
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt


My current dataset is an aggregate of:

The dataset on which the model was trained can be found here. Most training images are grayscale (one color channel), with a size of 82 × 116 pixels.

The training subset contains 51 classes:

  • 50 classes of defined users
  • 1 class for unknown users

Each class contains images of left and right ears. Normally there are 6 images of the left ear and 1 image of the right ear. Total number of images: 2382.

The validation subset contains images of another 50 users, who should not be verified, along with other sample images of ears.



Every image is scaled to 48 × 80 pixels. To obtain more training data, augmentation is applied: from each input image we create additional transformed images:

  • rotated anticlockwise by an angle between 0 and 180 degrees
  • rotated clockwise by an angle between 0 and 180 degrees
  • flipped vertically
  • flipped horizontally
  • blurred with a Gaussian filter
  • with a changed brightness level
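
Some of the transformations listed above can be sketched without any dependencies. This is an illustration only, not the project's augmentation code: it treats a grayscale image as a nested list of 0-255 values, all function names are hypothetical, and the blur uses a fixed 3 × 3 kernel as a stand-in for whatever Gaussian filter was actually used. Rotation by an arbitrary angle needs interpolation and is better left to an image library, so it is omitted here.

```python
def flip_vertical(img):
    """Mirror top to bottom by reversing the row order."""
    return [row[:] for row in reversed(img)]


def flip_horizontal(img):
    """Mirror left to right by reversing every row."""
    return [row[::-1] for row in img]


def change_brightness(img, factor):
    """Scale every pixel by `factor` and clamp the result to 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def gaussian_blur_3x3(img):
    """Blur with the 3x3 kernel [1 2 1; 2 4 2; 1 2 1] / 16, clamping at the edges."""
    kernel = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))  # repeat border pixels
                    xx = min(w - 1, max(0, x + dx))
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            row.append(acc // 16)
        out.append(row)
    return out


def augment(img):
    """Return several transformed copies of one input image."""
    return [
        flip_vertical(img),
        flip_horizontal(img),
        gaussian_blur_3x3(img),
        change_brightness(img, 0.5),
    ]


image = [[0, 50, 100], [150, 200, 250]]
variants = augment(image)
```

Each input image yields several variants, which is how a small set of ear photos can be expanded into a larger training set.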

Use for training a new model. The trained model will be saved in an .h5 file.


Use for getting model predictions. There is a checkpoint lew_cnn.h5 of a model trained on a hold-out split with test size 0.33, for 20 epochs with batch size 32. The checkpoint stats are:

       train   validation
acc    0.845   0.905
f1     0.844   0.907
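
A hold-out split with test size 0.33 can be sketched with the standard library alone. The README does not say which tool produced the split or whether it was stratified by class, so the function name, the seed, and the rounding rule below are assumptions made for illustration.

```python
import random


def hold_out_split(samples, test_size=0.33, seed=0):
    """Shuffle once, then cut off a `test_size` fraction as the held-out part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# 2382 is the total number of training images reported in this README.
train, test = hold_out_split(range(2382))
print(len(train), len(test))
```

With this rounding rule, 2382 images split into 1596 for training and 786 held out. A library implementation may round differently by one sample.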

There is also a production checkpoint lew_cnn_all.h5 of the model trained on all training samples, for 20 epochs with batch size 32. The checkpoint stats are:

       train   validation
acc    0.949   0.928
f1     0.949   0.929
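
For reference, the two reported metrics can be computed from true and predicted class labels as in the sketch below. The README does not state which F1 averaging was used. Macro averaging (the unweighted mean of per-class F1) is an assumption here, and the labels in the example are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for three classes, invented for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On a 51-class problem with few images per class, accuracy and F1 can diverge when some classes are predicted much better than others, which is a reason to report both.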

Prepare dataset

I used a nested convention for storing the dataset:

|_ classes
   |_ class1
   |  |_ haar_detect
   |     |_ left_ear
   |     |  |_ img1.png
   |     |  |_ img2.png
   |     |  ...
   |     |_ right_ear
   |        |_ img1.png
   |        |_ img2.png
   |        ...
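
A layout like the one above can be walked with `pathlib`. The sketch below is one possible way to enumerate the images, not the project's loader: the function name is hypothetical, and the demo builds a throwaway copy of the tree in a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_samples(root):
    """Collect (class_name, side, path) for root/<class>/haar_detect/<side>/*.png."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for side in ("left_ear", "right_ear"):
            side_dir = class_dir / "haar_detect" / side
            if not side_dir.is_dir():
                continue  # a class may lack images for one ear
            for image_path in sorted(side_dir.glob("*.png")):
                samples.append((class_dir.name, side, image_path))
    return samples


# Demo on a temporary tree that mirrors the convention shown above.
with tempfile.TemporaryDirectory() as tmp:
    layout = (("left_ear", "img1.png"), ("left_ear", "img2.png"), ("right_ear", "img1.png"))
    for side, name in layout:
        folder = Path(tmp) / "classes" / "class1" / "haar_detect" / side
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    found = [(c, s, p.name) for c, s, p in list_samples(Path(tmp) / "classes")]

print(found)
```

The class name comes from the directory name, so labels never have to be stored separately from the images.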

In "wild" conditions, other objects are captured in the image along with the ear. These objects are noise for our model, and it is possible that there is no ear in an image at all. I also wanted to protect the model from malicious attacks, namely to make verification robust against objects that look like an ear (mushrooms, shells, etc.). So I decided to use Haar Cascades as a cleaning method. Although Haar Cascades successfully find ears in an image, they sometimes struggle to find an ear that is not captured in a vertical position, for example when the head is leaned forward or backward. Such poses seem natural to me, so to improve ear detection we rotate the input image clockwise 1 degree at a time and check whether the Haar Cascades can detect any ears in the image. If a right or left ear is detected, the input image is saved in grayscale to the appropriate folder. I wanted to save the ear cropped from the image, but I found that the Haar Cascades often detected only part of the ear.
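
The rotate-and-retry idea can be sketched as a small search loop. This is a minimal illustration, not the project's code: the function name, the `max_angle` limit, and the injected `rotate` and `detect` callables are all hypothetical, and the README does not say at which angle the search stops. The demo replaces real images with a single number (the head tilt in degrees) so that the loop can run without an image library.

```python
def find_ear_by_rotation(image, rotate, detect, max_angle=45, step=1):
    """Rotate clockwise in `step`-degree increments until `detect` finds something.

    `rotate(image, angle)` returns the rotated image; `detect(image)` returns a
    sequence of bounding boxes. `max_angle` is an assumed stopping point.
    """
    for angle in range(0, max_angle + 1, step):
        boxes = detect(rotate(image, angle))
        if len(boxes) > 0:
            return angle, boxes
    return None, []


# Stand-ins for demonstration: the "image" is just its tilt in degrees, and the
# fake detector only fires when the ear is within 2 degrees of vertical.
def fake_rotate(tilt, angle):
    return tilt - angle


def fake_detect(tilt):
    return [(0, 0, 10, 10)] if abs(tilt) <= 2 else []


angle, boxes = find_ear_by_rotation(7, fake_rotate, fake_detect)
print(angle, boxes)
```

With OpenCV, `rotate` could wrap `cv2.getRotationMatrix2D` and `cv2.warpAffine`, and `detect` could wrap `detectMultiScale` on a `cv2.CascadeClassifier` loaded from your own ear cascade file. Note that `cv2.getRotationMatrix2D` treats positive angles as counter-clockwise, so a clockwise rotation needs a negative angle.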


As an idea for a more secure pipeline, I wanted to use Haar Cascades for ears as the first step of validation. Over time I realized that the model should handle incorrect images by itself if it is trained on proper data. In addition, the cascades tended to detect only part of the ear, which looks like a problem of dealing with high resolution and a lack of negative samples. However, Haar Cascades are still powerful enough for preparing a better dataset.