Hematologic-Cell-Identification

This repository contains the project code for the CS5242 (Neural Networks and Deep Learning) course at NUS.

Objective

The primary objective of this project is to develop a classification system capable of distinguishing between five classes of white blood cells: basophil, eosinophil, lymphocyte, monocyte, and neutrophil.

We use three datasets:

pRCC and Camelyon16 datasets (for pre-training)
- Also contain annotation masks in some cases
WBC dataset (for the actual classification)
- Train on 100%, 50%, 10% and 1% of the training data, both with and without pre-training
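The fraction-based runs need a reproducible subset of the WBC training data at each size. The following is a minimal sketch, assuming the training data is a list of (item, label) pairs. It samples within each class so that every one of the five classes stays represented even at 1%. The function name `stratified_fraction` and the "keep at least one sample per class" rule are hypothetical choices, not taken from the repository.

```python
import random
from collections import defaultdict


def stratified_fraction(samples, fraction, seed=0):
    """Return roughly `fraction` of `samples`, sampled separately within each class.

    `samples` is a list of (item, label) pairs. At least one sample per class is
    kept so that tiny fractions such as 1% still cover every class.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))

    rng = random.Random(seed)  # fixed seed -> the same subset on every run
    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    return subset
```

With 200 images per class, for example, the 100%, 50%, 10% and 1% subsets would hold 200, 100, 20 and 2 images per class respectively.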

Here are some traits of each dataset:

The pRCC dataset has no labels, so some form of unsupervised learning is needed for it.
The Camelyon16 and WBC datasets both also include segmentation masks.
The Camelyon16 dataset is labelled as normal vs. tumour; these classes do not appear to be related to WBC types.
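Because the WBC and Camelyon16 images come with segmentation masks, an extra data point can be produced by keeping only the masked region of an image. The following is a minimal pure-Python sketch. It assumes an image is a nested list of RGB tuples and its mask is a same-sized nested list of 0/1 values. The helper `apply_mask` and the black fill colour are illustrative assumptions, not the code in ./data/maskification.

```python
def apply_mask(image, mask, fill=(0, 0, 0)):
    """Keep pixels where the mask is 1 and replace every other pixel with `fill`.

    `image` is a list of rows of (R, G, B) tuples; `mask` is a list of rows of 0/1.
    """
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same height and width")
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

In a real pipeline the same operation would usually be a single vectorised multiply of the image array by the mask, but the element-wise version makes the rule explicit.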

Approach

On WBC: use the segmentation masks to create new augmented data points, then train a classification model
On pRCC: train an autoencoder and use its encoder as a feature extractor
On Camelyon16: train a separate classification model
For end-to-end training: combine the pRCC encoder (frozen) + the Camelyon16 model (frozen) + the WBC classifier (initialised from its trained weights, trainable)
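The end-to-end setup freezes two pretrained parts and fine-tunes only the classifier. The following is a framework-free sketch of that wiring. `LinearBlock` is a hypothetical toy stand-in for a network component. Feeding the same input to both frozen models and concatenating their features is also an assumption; the report describes the actual architecture.

```python
from dataclasses import dataclass


@dataclass
class LinearBlock:
    """Toy stand-in for a network component: y = W x, with W stored row by row."""
    weights: list
    trainable: bool = True

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class PretrainedWBCModel:
    """Frozen pRCC encoder + frozen Camelyon16 model feeding a trainable WBC classifier."""

    def __init__(self, prcc_encoder, cam16_model, classifier):
        prcc_encoder.trainable = False  # pretrained feature extractor stays fixed
        cam16_model.trainable = False   # pretrained feature extractor stays fixed
        classifier.trainable = True     # starts from WBC weights, fine-tuned end to end
        self.blocks = [prcc_encoder, cam16_model, classifier]

    def forward(self, x):
        prcc, cam16, classifier = self.blocks
        # Assumption: both feature vectors are concatenated before classification.
        features = prcc.forward(x) + cam16.forward(x)
        return classifier.forward(features)

    def trainable_blocks(self):
        """Only these blocks would be handed to the optimiser."""
        return [b for b in self.blocks if b.trainable]
```

In PyTorch the same idea is typically expressed by calling `requires_grad_(False)` on the frozen modules and passing only the classifier's parameters to the optimiser.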

For more details, refer to the report.

Code organization

./config: Contains a file where the global constants are defined
./data/balancing: Contains code for balancing the WBC dataset by adding augmented data points to it
./data/datasets: Contains the main dataset class needed to run each of the models
./data/debug: Contains utility code for taking a subset of the dataset to test model training locally
./data/maskification: Contains code for applying the masks to the original dataset images to create new data points
./data/move: Contains a utility wrapper class that moves tensors to the CUDA/CPU device
./details: Contains the description of the problem statement
./loss: Contains a custom loss class used when training the pRCC autoencoder
./models: Contains the model architectures for each experiment conducted
./utils: Contains code needed for plotting the training and testing graphs
./experiments/base: Contains the base class for the generic trainer
./experiments/classify: Contains the trainer class for classification tasks which subclasses the generic trainer
./experiments/cam_classifier: Contains the trainer class for Camelyon16 classification
./experiments/pRCC_autoencoder: Contains the trainer class for the pRCC autoencoder
./experiments/wbc_classifier: Contains the trainer class for WBC classification
./experiments/wbc_pretrained: Contains the trainer class for WBC classification with pre-training from the pRCC and Camelyon16 models
./resources: Screenshots from the various .ipynb notebooks showing plots and other metrics
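./data/balancing evens out the WBC class counts by adding augmented data points. One common way to do this is random oversampling of the minority classes with augmented copies. The following is a minimal sketch, assuming the data is a list of (item, label) pairs and the caller supplies an `augment` function. Both are hypothetical; the repository's actual augmentations and target counts may differ.

```python
import random
from collections import Counter


def balance_with_augmentation(samples, augment, seed=0):
    """Oversample minority classes with augmented copies until every class
    matches the size of the largest class.

    `samples` is a list of (item, label) pairs and `augment` maps an item to a
    new, perturbed item. The original samples are kept unchanged.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in sorted(counts.items()):
        pool = [item for item, lab in samples if lab == label]
        # Add just enough augmented copies to reach the majority-class count.
        for _ in range(target - count):
            balanced.append((augment(rng.choice(pool)), label))
    return balanced
```

Augmenting rather than duplicating matters here: identical copies of rare cells (e.g. basophils) would encourage the classifier to memorise them.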

NOTE: All of the Jupyter notebooks were run on Colab. Importing the relevant Python files failed in the Google Colab environment, so every class defined in the other directories had to be copy-pasted into the notebooks explicitly. If you run the notebooks locally, feel free to remove these class definitions and import the relevant classes instead.

Datasets used

Raw Datasets

Use this Dropbox link to download the datasets.

Processed Datasets & Model weights

This Google Drive link contains the weights of all trained models along with the preprocessed datasets, uploaded as zip files that can be used directly in Colab.

How to run each experiment

Since every notebook is a Colab notebook, it is best to upload the datasets to your Google Drive and then run the models in Google Colab.

The file "How to run in colab.txt" explains how to set up your datasets and Google Drive so that the notebooks can be executed in Google Colab.
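Once the processed zip files are in Google Drive, a Colab notebook typically mounts Drive and extracts a dataset archive to the local disk before training. Local extraction usually reads faster than many small files over the Drive mount. The extraction step can be sketched with the standard library. The helper name and paths are hypothetical, and "How to run in colab.txt" remains the reference for the actual setup.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract a dataset archive into dest_dir and return the extracted file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Skip directory entries so only actual files are reported.
        return sorted(name for name in archive.namelist() if not name.endswith("/"))


# In Colab (hypothetical paths):
# from google.colab import drive
# drive.mount("/content/drive")
# extract_dataset("/content/drive/MyDrive/<dataset>.zip", "/content/data")
```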

ParasharaRamesh/Hematologic-Cell-Identification
