A deep learning image classifier written in Keras, used to detect chimeric signals in genome overlap charts.
Created as part of a master's thesis project at the Faculty of Electrical Engineering & Computing, University of Zagreb. Many thanks to Lovro Vrček, mag. phys., and prof. dr. sc. Mile Šikić for their help & guidance.
- Implement cross-validation to substantiate the results & tune the hyperparameters
- Clean up the data (there seem to be some incorrectly labeled images in the original set)
As the original data is stored in 3 separate folders sorted by overlap type (see the `original_data` folder), the first step is to split it into separate training & test data sets. Since the whole point of the network is to decide only whether an image represents a chimeric overlap or not, the regular & repeat overlaps are stored together in a single folder (labeled `non_chimeric`), while the chimeric overlaps remain separate (labeled `chimeric`). The training/test split is set to 75/25 (75% of the data for training, 25% for testing).
It should be noted that the data is randomly re-shuffled with each new training run, so every run works with a different split.
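A minimal sketch of how this split could be scripted (the `regular` & `repeat` subfolder names, the `.png` extension, and the `data/` output layout are assumptions for illustration, not the actual thesis code):

```python
import random
import shutil
from pathlib import Path

# Assumed layout: original_data/{chimeric,regular,repeat}/ -- the exact
# subfolder and file names are assumptions for illustration.
SOURCES = {
    "chimeric": ["chimeric"],
    "non_chimeric": ["regular", "repeat"],
}
TRAIN_RATIO = 0.75  # 75/25 training/test split

for label, folders in SOURCES.items():
    images = [p for f in folders for p in Path("original_data", f).glob("*.png")]
    random.shuffle(images)  # the data is re-shuffled on every run
    cutoff = int(len(images) * TRAIN_RATIO)
    for split, subset in (("train", images[:cutoff]), ("test", images[cutoff:])):
        dest = Path("data", split, label)
        dest.mkdir(parents=True, exist_ok=True)
        for img in subset:
            shutil.copy(img, dest / img.name)
```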
The sorted data is fed to the model via the Keras `ImageDataGenerator` class (docs). Besides feeding the data to the model during training, the generators have the additional task of preprocessing and augmenting it by applying a series of transformations:
- Horizontal flip (effectively doubling the amount of original data by mirroring images along their Y axis)
- Rescaling (the original images are 8-bit RGB encoded, so the 0-255 channel values are rescaled to the [0, 1] range to keep the inputs normalized)
- Resizing (shrinking the images from the original 750 x 500 px to 224 x 224 px)
- Grayscale conversion (merging the three RGB color channels into a single grayscale channel)
The resulting images are now ready to be loaded into the model.
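In Keras terms, this pipeline maps onto `ImageDataGenerator` roughly as follows (the `data/train` directory comes from the split sketch above and is an assumption; the rest is standard Keras API):

```python
from keras.preprocessing.image import ImageDataGenerator

# Rescaling and flipping are handled by the generator itself; resizing and
# grayscale conversion happen when the images are read from disk.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # map 8-bit channel values into [0, 1]
    horizontal_flip=True,  # randomly mirror images along the Y axis
)

train_generator = datagen.flow_from_directory(
    "data/train",            # assumed output folder of the split step
    target_size=(224, 224),  # shrink from the original 750 x 500 px
    color_mode="grayscale",  # merge the RGB channels into one
    batch_size=32,
    class_mode="categorical",
)
```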
The learning model is based on the AlexNet image classification network (Original author's presentation from ImageNet), although it has been trimmed somewhat to bring down the number of training parameters (currently ~63,000,000 trainable params).
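The exact trimming is specific to this project, but an AlexNet-style stack for 224 x 224 grayscale input might be sketched in Keras like this (layer sizes follow the classic AlexNet and are not the project's actual trimmed values):

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D

# Classic AlexNet-style layer stack adapted to single-channel input and
# two output classes; the thesis model trims these sizes further.
model = Sequential([
    Conv2D(96, (11, 11), strides=4, activation="relu", input_shape=(224, 224, 1)),
    MaxPooling2D((3, 3), strides=2),
    Conv2D(256, (5, 5), padding="same", activation="relu"),
    MaxPooling2D((3, 3), strides=2),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(256, (3, 3), padding="same", activation="relu"),
    MaxPooling2D((3, 3), strides=2),
    Flatten(),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(2, activation="softmax"),  # chimeric vs. non_chimeric
])
model.summary()  # prints the layer-by-layer parameter counts
```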
If you're looking for a more verbose summary, you can have a look at the model output in the image below:
Training is done by fitting the model to the data received from the image generators. The generators produce batches of 32 images each, and every epoch of the fit consists of 32 steps to match. The collected metrics are visualized on a plot at the end of the evaluation. The training configuration is as follows (see the sketch after this list):
- Loss function = Categorical cross-entropy
- Optimizer = RMSprop with a learning rate of 1e-4
- Metrics = Loss value, accuracy
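In code, that configuration translates to something along these lines (`train_generator` comes from the generator sketch above; `epochs=5` is taken from the runtime note below):

```python
from keras.optimizers import RMSprop

# Older standalone Keras spells the learning rate argument `lr` and uses
# model.fit_generator() instead of model.fit().
model.compile(
    loss="categorical_crossentropy",
    optimizer=RMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)

history = model.fit(
    train_generator,
    steps_per_epoch=32,  # matches the 32-image batches from the generator
    epochs=5,
)
# history.history holds the per-epoch loss & accuracy used for the final plot
```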
Apple MacBook Pro (2017):
- Intel Core i7-7567U
- 16 GB RAM
- No dedicated GPU
Training time comes out to ~27 s per epoch step (5 epochs add up to about 2 h of runtime).