Associated Repositories:
- dermatology: An image data repository of the original and augmented images.
- augmentation: This repository's package creates augmentations of the original images.
Tools
- reader: In progress. The Docker image of this repository will run containers that download data sets into a volume, de-archiving them if necessary; the volume serves this repository, the augmentation repository, and any other.
Associated Colab Notebook:
- preliminary: A preliminary assessment of the raw images in dermatology
Note: a link to a Colab interface is upcoming. Colab offers access to GPU machines, which keeps the time per epoch short; hence, prototyping continues within Colab.
This repository uses the continuous integration & delivery tool GitHub Actions; hence, a variety of tests run continuously. The badges below show the state of each repository branch with respect to its GitHub Actions workflows.
branch | state |
---|---|
develop | |
master | |
codebuild develop | |
Sometimes the models are run in an AWS machine via docker images; greyhypotheses @ Docker Hub.
An instance, i.e., container, of the image greyhypotheses/derma:importing serves dermatoscopic images to the deep learning model/s; importing will be replaced with reader.
# Import greyhypotheses/derma:importing from Docker Hub.
sudo docker pull greyhypotheses/derma:importing
# Running docker package greyhypotheses/derma:importing
sudo docker run -v ~/images:/app/images greyhypotheses/derma:importing
The feature extraction deep learning model
# Import greyhypotheses/derma:FeatureExtractionDL from Docker Hub.
sudo docker pull greyhypotheses/derma:FeatureExtractionDL
# Runs the FeatureExtractionDL model. It requires one string argument; the string
# must be the URL of a YAML file of hyperparameters, e.g.,
# https://raw.githubusercontent.com/discourses/derma/develop/resources/hyperparameters/pattern.yml
# The trailing backslash continues the command onto the next line.
sudo docker run -v ~/images:/app/images -v ~/checkpoints:/app/checkpoints \
    greyhypotheses/derma:FeatureExtractionDL src/main.py $1
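When the run has to be launched from a script rather than typed by hand, the same call can be assembled as an argument list. The sketch below is only an illustration of the command above; the function name `docker_run_command` and its parameters are hypothetical and are not part of this repository.

```python
import shlex
from pathlib import Path


def docker_run_command(hyperparameters_url, images_dir, checkpoints_dir,
                       image="greyhypotheses/derma:FeatureExtractionDL"):
    """Build the argument list of the docker run call described above."""
    return [
        "sudo", "docker", "run",
        # Host directories mounted at the paths the container expects.
        "-v", f"{images_dir}:/app/images",
        "-v", f"{checkpoints_dir}:/app/checkpoints",
        # The image, its entry script, and the single string argument.
        image, "src/main.py", hyperparameters_url,
    ]


command = docker_run_command(
    hyperparameters_url=("https://raw.githubusercontent.com/discourses/derma/"
                         "develop/resources/hyperparameters/pattern.yml"),
    images_dir=Path.home() / "images",
    checkpoints_dir=Path.home() / "checkpoints",
)

# Print the command in a shell-safe form for inspection.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would execute it; printing first makes it easy to verify the mounts and the hyperparameters URL.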
- Local operating system: Windows 7
- Cloud test machine: GitHub Actions Ubuntu
Locally, the Python environment was created via venv:
>> python -m venv env
This virtual environment can be deleted via the command rm -r env (Cygwin). The environment is activated via
>> env\Scripts\activate.bat
within a Windows operating system, and deactivated via the command env\Scripts\deactivate.bat. The command
>> env\Scripts\pip list
lists the directly & indirectly installed packages. Always upgrade pip before populating the environment:
>> python -m pip install --upgrade pip==21.3.1
The requirements document lists the directly installed packages and their versions, and a few indirectly installed packages. Thus far, the TensorFlow version used by this package/repository is TensorFlow 2.7.0
>> env\Scripts\pip install --upgrade tensorflow==2.7.0
The TensorFlow installation step also installs numpy & requests; the remaining packages are installed via
pip install --upgrade pandas
pip install --upgrade scikit-learn
pip install --upgrade pytest coverage pytest-cov pylint flake8
pip install --upgrade PyYAML
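Once the environment is populated, it can be useful to confirm from within Python that the interpreter belongs to the virtual environment and to see what has been installed. The following is a minimal sketch that uses only the standard library; the function names are hypothetical, and the check assumes an environment created by venv, as described above.

```python
import sys
from importlib import metadata


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-created environment."""
    # venv points sys.prefix at the environment; sys.base_prefix stays on the base install.
    return sys.prefix != sys.base_prefix


def installed_packages():
    """Map each installed distribution name to its version, akin to pip list."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with unreadable metadata
    }


print("virtual environment active:", in_virtual_environment())
print("python:", ".".join(str(part) for part in sys.version_info[:3]))
print("pip:", installed_packages().get("pip", "not found"))
```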
The Python version can be checked via python --version. Finally, the requirements document was/is created via
env\Scripts\pip freeze -r docs/filter.txt > requirements.txt
The resulting file is then edited; within it, the packages above the line ## The following requirements were added by pip freeze: are the directly installed packages.
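The split at the marker line can also be read programmatically. The sketch below is an illustration only: `split_requirements` is a hypothetical helper, and the sample text stands in for requirements.txt, with placeholder version pins apart from tensorflow.

```python
MARKER = "## The following requirements were added by pip freeze:"


def split_requirements(text, marker=MARKER):
    """Return (direct, indirect) requirement lines, split at the pip freeze marker."""
    direct, indirect = [], []
    target = direct
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == marker:
            # Everything after the marker was added by pip freeze itself.
            target = indirect
            continue
        if stripped and not stripped.startswith("#"):
            target.append(stripped)
    return direct, indirect


# Illustrative stand-in for requirements.txt; the pandas and numpy pins are placeholders.
sample = """tensorflow==2.7.0
pandas==1.0.0
## The following requirements were added by pip freeze:
numpy==1.0.0
"""

direct, indirect = split_requirements(sample)
print("directly installed:", direct)
print("indirectly installed:", indirect)
```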
Via Dermoscopic Images of Cancerous/Pre-cancerous Skin Lesions
The World Health Organisation lists cancer as the second leading cause of death globally; the 2018 death estimate is 9.6 million. Early diagnosis or effective assessment is usually critical to effective treatment and survival. One common tool for early diagnosis, cancer precursor investigations, and/or tumour assessment is medical imaging. Examples include magnetic resonance imaging for brain tumours, chest radiographs for investigating symptoms suggestive of lung cancer, and mammography for breast cancer. A challenge, as the mammography paper illustrates, is the accurate interpretation of medical images.
This project is focused on image classification for cancer diagnostics; specifically, on the International Skin Imaging Collaboration’s dermoscopic images of skin lesions. The aim is the
Automatic classification of dermoscopic images according to 9 diagnostic classes: Melanoma, Melanocytic Nevus, Basal Cell Carcinoma, Actinic Keratosis, Benign Keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis), Dermatofibroma, Vascular Lesion, Squamous Cell Carcinoma, Unknown
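For modelling, the nine classes are typically handled as short codes and one-hot vectors. The sketch below is an illustration under an assumption: the short codes are taken to be the column headers of the ISIC 2019 ground truth file, and `decode` is a hypothetical helper, not part of this repository.

```python
# Assumed short codes (ISIC 2019 ground truth column headers) mapped to the class names above.
DIAGNOSTIC_CLASSES = {
    "MEL": "Melanoma",
    "NV": "Melanocytic Nevus",
    "BCC": "Basal Cell Carcinoma",
    "AK": "Actinic Keratosis",
    "BKL": "Benign Keratosis",
    "DF": "Dermatofibroma",
    "VASC": "Vascular Lesion",
    "SCC": "Squamous Cell Carcinoma",
    "UNK": "Unknown",
}


def decode(one_hot):
    """Return the class name of a one-hot vector ordered as DIAGNOSTIC_CLASSES."""
    codes = list(DIAGNOSTIC_CLASSES)
    expected = [0] * (len(codes) - 1) + [1]
    # A valid vector has one entry per class: a single 1 and zeros elsewhere.
    if len(one_hot) != len(codes) or sorted(one_hot) != expected:
        raise ValueError("expected a one-hot vector with one entry per class")
    return DIAGNOSTIC_CLASSES[codes[list(one_hot).index(1)]]


print(decode([0, 1, 0, 0, 0, 0, 0, 0, 0]))
```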
This project has been chosen as a precursor to applying Bayesian deep learning, amongst other Bayesian techniques, to diagnostic, prognostic, and pathogenetic challenges in medicine. Uncertainty is an inherent aspect of medical and health diagnostics, but deep learning methods that consider uncertainty, Bayesian deep learning methods being a key example, are rarely used due to their scalability challenges.
The first objectives of this project are to
- Apply deep learning, amongst other methods, to the stated problem within an engineering design/prototype that is not constrained by scalability.
- Investigate and apply interpretability options.
Note: non-Bayesian deep convolutional neural networks have been applied to skin cancer images.
As noted above, this project's modelling challenge is focused on the International Skin Imaging Collaboration’s (ISIC's) dermoscopic images of skin lesions. Specifically, it uses a subset of the images of the ISIC 2019 Challenge, i.e.,
file | description | size |
---|---|---|
ISIC_2019_Training_Input.zip | 25,331 JPEG images of skin lesions | ~9GB |
ISIC_2019_Training_Metadata.csv | 25,331 metadata entries of age, sex, general anatomic site, and common lesion identifier | 1.15MB |
ISIC_2019_Training_GroundTruth.csv | 25,331 entries of gold standard lesion diagnoses | 1.23MB |
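A quick look at the class balance of the ground truth file can be sketched with the standard library alone. The sketch assumes, without confirmation from this repository, that the file has an `image` column followed by one one-hot column per diagnostic class; `class_counts` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def class_counts(handle, key="image"):
    """Count rows per diagnosis column in a one-hot encoded ground truth CSV."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Every column other than the image identifier is treated as a class indicator.
        labels = {name: float(value) for name, value in row.items() if name != key}
        counts[max(labels, key=labels.get)] += 1
    return counts


# Tiny in-memory stand-in for ISIC_2019_Training_GroundTruth.csv; the rows are made up.
sample = io.StringIO(
    "image,MEL,NV,BCC\n"
    "example_0001,0.0,1.0,0.0\n"
    "example_0002,1.0,0.0,0.0\n"
    "example_0003,0.0,1.0,0.0\n"
)

for code, count in class_counts(sample).most_common():
    print(code, count)
```

For the real file, the in-memory sample would be replaced by `open("ISIC_2019_Training_GroundTruth.csv", newline="")`.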
To ensure availability, these three data files are also stored in a GitHub repository. The images are either the same as those hosted by the ISIC Archive API or down-sampled versions. Future modelling projects might involve re-visiting the original images of the ISIC Archive API. The API is documented at ISIC Archive API Documentation. The data set outlined below might be used if the ground truths are released in time.
- ISIC_2019_Test_Input.zip: 8,238 JPEG images of skin lesions
- ISIC_2019_Test_Metadata.csv: 8,238 metadata entries of age, sex, and general anatomic site
A preliminary analysis of the metadata is hosted in the notebook preliminary.ipynb.
Details: https://challenge2019.isic-archive.com/data.html
The images and metadata of the "ISIC 2019: Training" data used herein are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC). The copyright holders are:
- BCN_20000 Dataset: © Department of Dermatology, Hospital Clínic de Barcelona, https://arxiv.org/abs/1908.02288 [4]
- HAM10000 Dataset: © ViDIR Group, Department of Dermatology, Medical University of Vienna, https://www.nature.com/articles/sdata2018161 [1]
- MSK Dataset: © Anonymous; https://arxiv.org/abs/1710.05006, https://arxiv.org/abs/1902.03368 [2, 3]
References
1. P. Tschandl, C. Rosendahl, H. Kittler: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Scientific Data, Volume 5, Article Number: 180161, 2018, doi:10.1038/sdata.2018.161
2. Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern: Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC), 2018, arXiv:1710.05006
3. Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael A. Marchetti, Harald Kittler, Allan Halpern: Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC), 2019, arXiv:1902.03368
4. Marc Combalia, Noel C. F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Cristina Carrera, Alicia Barreiro, Allan C. Halpern, Susana Puig, Josep Malvehy: BCN20000: Dermoscopic Lesions in the Wild, 2019, arXiv:1908.02288