This repo contains the project code for the CS5242 (Neural Networks & Deep Learning) course taken at NUS.
The primary objective of this project is to develop a classification system capable of distinguishing between five classes of white blood cells: basophil, eosinophil, lymphocyte, monocyte, and neutrophil.
We have three datasets:
- pRCC and Camelyon16 datasets (for pre-training)
  - Some of these also come with annotation masks
- WBC dataset (for the actual classification task)
  - Train with 100%, 50%, 10%, and 1% of the training data, both with and without pretraining (see the subsampling sketch after this list)
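A minimal sketch of how such fractional training subsets could be carved out with PyTorch; the `wbc_train` dataset name and the seed are illustrative, not the repo's actual identifiers:

```python
import torch
from torch.utils.data import Subset

def fraction_subset(dataset, fraction, seed=42):
    """Return a reproducible random subset holding `fraction` of `dataset`."""
    g = torch.Generator().manual_seed(seed)
    n = max(1, int(len(dataset) * fraction))
    indices = torch.randperm(len(dataset), generator=g)[:n].tolist()
    return Subset(dataset, indices)

# The four training regimes, each run with and without pretraining:
# subsets = {f: fraction_subset(wbc_train, f) for f in (1.0, 0.5, 0.1, 0.01)}
```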
Here are some traits of each dataset:
- The pRCC dataset has no labels, so some form of unsupervised learning is needed there
- The Camelyon16 and WBC datasets both include segmentation masks as well
- The Camelyon16 dataset is labelled normal vs. tumour, which appears unrelated to the WBC classes
- On WBC, use the segmentation masks to create new augmented datapoints and train a classification model
- On pRCC, train an autoencoder but keep only the encoder as a feature extractor
- On Camelyon16, train yet another classification model
- For end-to-end training, combine the pRCC encoder (frozen) and the Camelyon16 model (frozen) with the classifier head from WBC, whose weights stay trainable (see the sketch after this list)
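The end-to-end setup above can be pictured with the following PyTorch sketch. The module names, the feature fusion by concatenation, and the head sizes are all assumptions for illustration; the actual architectures live in `./models`:

```python
import torch
import torch.nn as nn

class EndToEndWBC(nn.Module):
    """Frozen pretrained backbones + trainable WBC classifier head."""

    def __init__(self, prcc_encoder, cam16_features, num_classes=5):
        super().__init__()
        self.prcc_encoder = prcc_encoder      # encoder from the pRCC autoencoder
        self.cam16_features = cam16_features  # Camelyon16 feature extractor
        for p in self.prcc_encoder.parameters():    # freeze both pretrained parts
            p.requires_grad = False
        for p in self.cam16_features.parameters():
            p.requires_grad = False
        self.classifier = nn.Sequential(      # only this head is trained on WBC
            nn.LazyLinear(256),
            nn.ReLU(),
            nn.Linear(256, num_classes),      # 5 WBC classes
        )

    def forward(self, x):
        f1 = self.prcc_encoder(x).flatten(1)
        f2 = self.cam16_features(x).flatten(1)
        return self.classifier(torch.cat([f1, f2], dim=1))
```

Only `model.classifier.parameters()` would then be handed to the optimiser, so the pretrained weights stay fixed.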
For more details refer to the report.
- ./config: Contains a file where the global constants are defined
- ./data/balancing: Contains code for balancing the WBC dataset by adding augmented datapoints
- ./data/datasets: Contains the main dataset class needed for running each of the models
- ./data/debug: Contains util code for taking a small subset of the dataset so model training can be tested locally
- ./data/maskification: Contains code for applying the segmentation masks onto the original dataset images to create new datapoints (sketched below, after this list)
- ./data/move: Contains a util wrapper class that moves tensors to the CUDA/CPU device
- ./details: Contains the description of the problem statement
- ./loss: Contains a custom loss class used when training the pRCC autoencoder
- ./models: Contains the model architectures for each experiment conducted
- ./utils: Contains code needed for plotting the training and testing graphs
- ./experiments/base: Contains the base class for the generic trainer
- ./experiments/classify: Contains the trainer class for classification tasks which subclasses the generic trainer
- ./experiments/cam_classifier: Contains the trainer class for Camelyon16 classification
- ./experiments/pRCC_autoencoder: Contains the trainer class for pRCC Autoencoder
- ./experiments/wbc_classifier: Contains the trainer class for WBC classification
- ./experiments/wbc_pretrained: Contains the trainer class for WBC classification with pretraining from the pRCC and Cam16 models
- ./resources: Contains screenshots taken from the various ipynb notebooks, with plots and other metrics
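A minimal sketch of the maskification idea from ./data/maskification: multiply an image by its binary segmentation mask so only the annotated region survives, yielding a new datapoint. The file paths and helper name are hypothetical:

```python
import numpy as np
from PIL import Image

def apply_mask(image_path, mask_path):
    """Zero out everything outside the segmentation mask."""
    image = np.asarray(Image.open(image_path).convert("RGB"))
    mask = np.asarray(Image.open(mask_path).convert("L")) > 0  # binarise the mask
    masked = image * mask[..., None]                           # broadcast over RGB channels
    return Image.fromarray(masked.astype(np.uint8))

# apply_mask("wbc/img_001.png", "wbc/mask_001.png").save("wbc/img_001_masked.png")
```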
NOTE: All of the Jupyter notebooks were run on Colab, so the classes defined in other directories had to be explicitly copy-pasted into each notebook due to issues importing the relevant Python files in the Google Colab environment. If you choose to run this locally, feel free to remove the inline class definitions and just import the relevant classes instead.
Use this Dropbox link to download the datasets.
This Google Drive link contains the weights of all trained models along with the preprocessed datasets, uploaded as zip files that can be used directly in Colab.
Since each notebook is a Colab notebook, it is best to upload the datasets to your Google Drive and then run the models in Google Colab.
The file "How to run in colab.txt" explains how to set up your datasets and Google Drive so you can execute the notebooks in Google Colab; a minimal setup sketch follows.
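As a rough illustration of that setup (a single notebook cell, with placeholder paths; the actual paths are described in the txt file):

```python
from google.colab import drive
drive.mount('/content/drive')  # make your Drive visible to the Colab runtime

# Unzip a preprocessed dataset archive into the Colab filesystem (path is a placeholder):
!unzip -q "/content/drive/MyDrive/cs5242/WBC.zip" -d /content/data
```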