- Imbalanced distribution of classes among train, val, and test sets needed to be redistributed for training and validation to work properly.
- Four CNN frameworks were used as transfer learning base models to start: VGG-16, InceptionV3, ResNet-50, DenseNet-201
- The best base model, DenseNet-201, was used while searching hyperparameters: learning rate, # of hidden units, # of Dense layers, learning rate decay, batch size, and momentum
- Final model achieves publication-level results:
- AUC = 0.9895
- F1 = 0.9717
- Recall = 0.97055
- Precision = 0.9729
From the notebook transfer-learning-x-ray-eda
Comparison of training, validation, and test set counts shows that ~89% of the images are in the training set, only ~0.27% are in the validation set, and ~11% are in the test set. For a dataset of 5856 images, which is quite small for image classification, the train/validation/test split should be closer to 60%-20%-20%.
- Step 1: Randomly redistribute the images to 60:20:20
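The redistribution in Step 1 can be sketched as a shuffle-then-slice over the combined file list. This is a minimal illustration, not the notebook's exact code; the function name and seed are assumptions.

```python
import random

def split_60_20_20(filenames, seed=42):
    """Shuffle the full list of image files, then slice into
    train/val/test at a 60:20:20 ratio (illustrative helper,
    not the notebook's exact implementation)."""
    files = list(filenames)
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train = int(n * 0.60)
    n_val = int(n * 0.20)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]
    return train, val, test

# Dummy names standing in for the dataset's 5856 images
all_files = [f"img_{i}.jpeg" for i in range(5856)]
train, val, test = split_60_20_20(all_files)
```

Shuffling before slicing is what makes the class distribution (and pixel-intensity mix) roughly proportional across the three sets.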
Comparison of the distribution of Pneumonia and Normal images between the training, validation, and test set shows that there is a significant imbalance between the groups.
The biggest issue is that the validation set does not have the same distribution as the test set, which means a model that performs well on the validation set may not do well on the test set. After the random redistribution of images in Step 1, the image class distribution should be very similar between the training, validation, and test sets. While the class imbalance of Pneumonia to Normal is significant, it can be accounted for by applying class weights during training to penalize the model more heavily for wrong predictions on the less-represented class (Normal). This should mitigate the class imbalance issue, since the imbalance is not severe.
- Step 2: Apply weights to the classes for training
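The class weighting in Step 2 can be computed with the standard "balanced" formula, total / (n_classes * class_count), which the weights can then be passed to training (e.g. Keras `Model.fit(class_weight=...)`). The helper name is an assumption; the counts used below are the dataset's overall totals (1583 Normal, 4273 Pneumonia).

```python
def balanced_class_weights(n_normal, n_pneumonia):
    """Standard 'balanced' weighting: total / (n_classes * count).
    The under-represented class gets the larger weight, so wrong
    predictions on it cost more during training."""
    total = n_normal + n_pneumonia
    return {
        0: total / (2 * n_normal),     # class 0 = Normal (minority)
        1: total / (2 * n_pneumonia),  # class 1 = Pneumonia (majority)
    }

weights = balanced_class_weights(1583, 4273)  # full-dataset counts
```

With these counts, Normal gets a weight of roughly 1.85 and Pneumonia roughly 0.69, so each class contributes about equally to the loss.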
It becomes clear from the pixel intensities below that there can be a significant difference in how bright an X-ray is. Since these are real X-rays, it can be assumed that the model will have to perform well on both dark and bright X-rays. The best approach is to make sure the images are well shuffled and distributed proportionally across all of the sets.
- Make sure the redistribution of images is random to have a mix of pixel intensities in each set
Compare base models of VGG-16, InceptionV3, ResNet50, and DenseNet201 for performance. Use the best performing model for fine tuning.
The X-ray image files were combined and randomly split into the train, val, and test sets at a 60:20:20 ratio.
After randomly splitting the images, the distribution of Pneumonia and Normal was extremely similar between all sets of images.
Based on the distribution imbalance, class weights were applied so that the underrepresented class (Normal) incurred a larger penalty for wrong predictions.
For the following models (VGG-16, InceptionV3, ResNet-50, DenseNet-201):
Transfer learning models did not include the final dense layer and the pre-trained layers were frozen. A 2D Global Averaging layer followed by a 30% Dropout layer followed by a Dense layer with one node of sigmoid activation were added to do binary classification of the features created by the pre-trained model.
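The head described above can be sketched in Keras as follows. This is a minimal sketch, not the notebook's exact code: `weights=None` is used here so the snippet runs offline, whereas the actual models would load `weights='imagenet'`; the function name and 224x224 input size are assumptions.

```python
import tensorflow as tf

def build_transfer_model(input_shape=(224, 224, 3)):
    """Frozen DenseNet-201 base, then GlobalAveragePooling2D ->
    30% Dropout -> single sigmoid unit for binary classification."""
    base = tf.keras.applications.DenseNet201(
        include_top=False,   # drop the pre-trained final dense layer
        weights=None,        # notebook would use weights='imagenet'
        input_shape=input_shape)
    base.trainable = False   # freeze all pre-trained layers

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

The same head can be swapped onto VGG-16, InceptionV3, or ResNet-50 by replacing the `base` constructor.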
Chosen hyperparameter values are listed below; all others were left at their defaults.
- Optimizer: Adam
- Loss Function: Binary Crossentropy
- Batch Size: 512
- Epochs: 50
- Early Stopping: 10 patience
The best performance in this base model comparison was from DenseNet-201, with an F1 score of 0.96 on the holdout test set.
Below are the test results for the other base models:
- Step 1: Tune Learning Rate
- Step 2: Tune Hidden Units and Batch Size
- Step 3: Tune Number of Layers
- Step 4: Tune Learning Rate Decay
- Step 5: Tune Momentum
- Step 6: Final random search around best performing hyperparameters
Tested 10 randomly selected learning rates between 0.0001 and 1.0. Performance trended best with learning rates in the range 0.010 to 0.017; the top F1 test score was 0.96019, from a learning rate of 0.010395.
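Sampling the 10 candidate learning rates could look like the sketch below. Log-uniform sampling is an assumption, since the notebook does not state its sampling distribution, but it is the usual choice when the range spans several orders of magnitude.

```python
import math
import random

def sample_learning_rates(n=10, low=1e-4, high=1.0, seed=0):
    """Sample n learning rates log-uniformly in [low, high]:
    uniform in log10-space, so each decade is equally likely."""
    rng = random.Random(seed)
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** rng.uniform(lo, hi) for _ in range(n)]

rates = sample_learning_rates()
```

Each sampled rate would then be used to train the DenseNet-201 head and score it on the test set.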
Hidden Units and Batch Size
A grid search over both the number of hidden units and the batch size was done together.
- A single Dense layer with the hidden units followed by a Batch Normalization layer followed by a Dropout of 30% was added between the pre-trained model and the output layer.
- 5 hidden unit values were tried [4096, 2048, 1024, 512, 256] for the single Dense layer
- 3 batch sizes of [32, 64, 128] * 8 TPU replicas
- Test results improved to F1 score of 0.9699 for hidden units 4096 and mini batch size 32
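The joint grid described above is just the cross product of the two value lists; a sketch (the variable names are assumptions):

```python
from itertools import product

hidden_units = [4096, 2048, 1024, 512, 256]
batch_sizes = [32, 64, 128]  # per replica; global = size * 8 TPU replicas

# 5 x 3 = 15 configurations searched jointly
grid = [(h, b * 8) for h, b in product(hidden_units, batch_sizes)]
```

The best cell of this grid, 4096 hidden units with a per-replica batch size of 32 (global 256), is the configuration carried forward.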
Number of Hidden Layers
- Test out hidden layers by adding one layer at a time with each new layer containing half the number of nodes, with Batch Norm and Dropout at 30%.
- Test out hidden layers by starting with two Dense layers of 4096 nodes and adding one layer at a time, each with half the number of nodes of the previous layer.
- Adding more layers did not improve the F1 score. 1 trainable dense layer of 4096 was selected for next steps.
- 8 learning rate decay values randomly selected
- The formula for the decay is learning_rate * 0.1 ** (epoch / s) and the s value was randomly selected between 10 and 80.
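The decay formula above translates directly into a schedule function, which in Keras would be passed to a `LearningRateScheduler` callback:

```python
def decayed_lr(initial_lr, epoch, s):
    """Exponential decay from the text: lr * 0.1 ** (epoch / s),
    i.e. the learning rate drops by 10x every s epochs."""
    return initial_lr * 0.1 ** (epoch / s)

# With the best s = 31, the rate falls to a tenth of its
# initial value at epoch 31, a hundredth at epoch 62, etc.
```

Larger s values decay more slowly, so the random search over s in [10, 80] is effectively searching how aggressively to anneal.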
- Test results improved to F1 score 0.9711 using s value 31
- 8 momentum values between 0.7 and 0.99
- No improvement from trying different momentum values. Best F1 score was 0.9699
- Learning Rate: 0.01120581
- Hidden Units: 4358
- Batch Size: 32 * 8 TPUs
- Learning Rate Decay: s value of 34
- Momentum: 0.78
Resulting in an F1 score of 0.9699, which was not better than the learning rate decay results
- Learning Rate: 0.01120581
- Hidden Units: 4096
- Batch Size: 32 * 8 TPUs
- Learning Rate Decay: s value of 31
- Momentum: 0.9 (default)
A final search of the hyperparameters Learning Rate, Momentum, and Learning Rate Decay was performed. The trainable layer was one Dense layer with 4096 hidden units and 30% dropout. The batch size was held constant at 32 * 8 TPU replicas, for a global batch size of 256.
- Learning Rate Search: Random search between 0.1 and 0.01
- Momentum Search: Grid search of 0.76, 0.78, 0.80, 0.82, 0.84
- Learning Rate Decay Search: Random search between 25 to 40 for the s value
- Learning Rate: 0.07943
- Hidden Units: 4096
- Batch Size: 32 * 8 TPUs
- Learning Rate Decay: s value of 29
- Momentum: 0.78
- Test Statistics:
- AUC = 0.9895
- F1 = 0.9717
- Recall = 0.97055
- Precision = 0.9729