samisayed0611/prediction-of-breast-cancer-using-multimodal-autoencoders

Multimodal autoencoders for subtypes and survival prediction of breast cancer

Implementation of our paper titled "Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data", submitted to the IEEE Access journal, August 2019. In this implementation, a multimodal autoencoder (MAE) is used to predict the clinical status of breast cancer patients from multiplatform genomics data. The MAE is trained on DNA methylation, gene expression, and miRNA expression data, together with clinical outcomes, from The Cancer Genome Atlas (TCGA).
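
To make the idea of a multimodal autoencoder concrete, here is a minimal sketch of one forward pass, written in plain NumPy rather than the TensorFlow/Keras code used in this repository. Everything in it is an assumption made for illustration: the feature counts, the layer sizes, the ReLU activation, and fusing the modalities by concatenation are not taken from the paper or from main_run.py. It shows only encoding, fusion, and reconstruction; pre-training and fine-tuning are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Return randomly initialised (weights, bias) for one fully connected layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical feature counts per modality; real TCGA matrices are far wider.
modalities = {"methylation": 50, "gene_expression": 40, "mirna": 10}
hidden, latent = 16, 8  # assumed layer sizes

# One encoder and one decoder per modality, plus a shared fusion layer.
encoders = {name: dense(dim, hidden) for name, dim in modalities.items()}
fusion = dense(hidden * len(modalities), latent)
decoders = {name: dense(latent, dim) for name, dim in modalities.items()}

def forward(batch):
    """Encode each modality, fuse into one latent code, reconstruct every modality."""
    encoded = [relu(batch[name] @ w + b) for name, (w, b) in encoders.items()]
    z = relu(np.concatenate(encoded, axis=1) @ fusion[0] + fusion[1])
    recon = {name: z @ w + b for name, (w, b) in decoders.items()}
    return z, recon

def reconstruction_loss(batch, recon):
    """Mean squared error per modality, summed over modalities."""
    return sum(float(np.mean((batch[n] - recon[n]) ** 2)) for n in batch)

# Four synthetic patients with random features for each modality.
batch = {name: rng.normal(size=(4, dim)) for name, dim in modalities.items()}
z, recon = forward(batch)
loss = reconstruction_loss(batch, recon)
```

In a setup like this, the shared latent code z is what a downstream classifier (subtype) or regressor (survival rate) would consume after the autoencoder has been trained.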

Predicted clinical status

  1. Breast cancer subtype, which is determined by the estrogen receptor (ER), progesterone receptor (PGR), and HER2/neu status.
  2. Survival rate (0 to 1, with 1 being the best chance of survival).

Requirements

  • Python 3
  • TensorFlow
  • Keras

About the dataset used

| DATASET_IDX | Data types | Data size (GB) |
| --- | --- | --- |
| 1 | DNA Methylation | 148 |
| 2 | Gene Expression | 9 |
| 3 | miRNA Expression | 0.24 |
| 4 | Gene Expression + miRNA Expression | 10 |
| 5 | DNA Methylation + Gene Expression + miRNA Expression | 162 |

Training the neural networks

  • python3 main_run.py <options>, with the following supported options:

| Option | Values | Details | Required |
| --- | --- | --- | --- |
| -p PLATFORM<br>--platform PLATFORM | int [1-2] | [1] TensorFlow<br>[2] Theano | yes |
| -t TYPE<br>--type TYPE | int [1-2] | [1] Breast cancer type classification<br>[2] Survival rate regression | yes |
| -d DATASET<br>--dataset DATASET | int [1-15] | [1] DNA Methylation GPL8490<br>[2] DNA Methylation GPL16304<br>[3] Gene Expression Count<br>[4] Gene Expression FPKM<br>[5] Gene Expression FPKM-UQ<br>[6] miRNA Expression<br>[7] Gene Expression Count + miRNA Expression<br>[8] Gene Expression FPKM + miRNA Expression<br>[9] Gene Expression FPKM-UQ + miRNA Expression<br>[10] DNA Met GPL8490 + Gene Count + miRNA<br>[11] DNA Met GPL16304 + Gene Count + miRNA<br>[12] DNA Met GPL8490 + Gene FPKM + miRNA<br>[13] DNA Met GPL16304 + Gene FPKM + miRNA<br>[14] DNA Met GPL8490 + Gene FPKM-UQ + miRNA<br>[15] DNA Met GPL16304 + Gene FPKM-UQ + miRNA | yes |
| --pretrain_epoch PRE_EPOCH | int | Number of pre-training epochs. Default = 100 | no |
| --train_epoch TRAIN_EPOCH | int | Number of training epochs. Default = 100 | no |
| --batch BATCH | int | Batch size for pre-training and training. Default = 10 | no |
| --pre_lr PRE_LR | float | Pre-training learning rate. Default = 0.01 | no |
| --train_lr TRAIN_LR | float | Training learning rate. Default = 0.1 | no |
| --dropout DROPOUT | float | Dropout rate. Default = 0.2 | no |
| --pca PCA | int [1-2] | [1] Use PCA<br>[2] Don't use PCA<br>Default = [2] Don't use PCA | no |
| --optimizer OPTIMIZER | int [1-3] | [1] Stochastic gradient descent<br>[2] RMSProp<br>[3] Adam<br>Default = [1] Stochastic gradient descent | no |
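
For readers who want to script around these options, the following is a hypothetical argparse parser that mirrors the options listed above. It is a sketch reconstructed from this README only, not a copy of the parser in main_run.py, whose actual implementation may differ. The learning rates and the dropout rate are parsed as floats here because their documented defaults are fractional.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options (not taken from main_run.py)."""
    p = argparse.ArgumentParser(prog="main_run.py")
    # Required options
    p.add_argument("-p", "--platform", type=int, choices=[1, 2], required=True,
                   help="1 = TensorFlow, 2 = Theano")
    p.add_argument("-t", "--type", type=int, choices=[1, 2], required=True,
                   help="1 = subtype classification, 2 = survival rate regression")
    p.add_argument("-d", "--dataset", type=int, choices=range(1, 16), required=True,
                   help="dataset index, 1 to 15")
    # Optional options with the documented defaults
    p.add_argument("--pretrain_epoch", type=int, default=100)
    p.add_argument("--train_epoch", type=int, default=100)
    p.add_argument("--batch", type=int, default=10)
    p.add_argument("--pre_lr", type=float, default=0.01)
    p.add_argument("--train_lr", type=float, default=0.1)
    p.add_argument("--dropout", type=float, default=0.2)
    p.add_argument("--pca", type=int, choices=[1, 2], default=2,
                   help="1 = use PCA, 2 = don't use PCA")
    p.add_argument("--optimizer", type=int, choices=[1, 2, 3], default=1,
                   help="1 = SGD, 2 = RMSProp, 3 = Adam")
    return p

# Parse a sample argument string instead of sys.argv so the sketch runs standalone.
args = build_parser().parse_args(
    "--platform 1 --type 1 --dataset 1 --batch 10 --pretrain_epoch 5 "
    "--train_epoch 5 --pca 1 --optimizer 3".split()
)
```

Options that are not passed, such as --pre_lr and --dropout in the sample above, fall back to their documented defaults.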

Example

To perform breast cancer subtype classification on the PCA-reduced DNA methylation dataset using the TensorFlow platform, issue the following command from the terminal:

python3 main_run.py --platform 1 --type 1 --dataset 1 --batch 10 --pretrain_epoch 5 --train_epoch 5 --pca 1 --optimizer 3

In the preceding command:

  • --platform 1 selects TensorFlow
  • --type 1 selects breast cancer type classification
  • --dataset 1 selects DNA Methylation GPL8490
  • --batch 10 sets the batch size
  • --pretrain_epoch 5 sets the number of pre-training epochs
  • --train_epoch 5 sets the number of fine-tuning epochs
  • --pca 1 enables PCA
  • --optimizer 3 selects the Adam optimizer
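
The example enables PCA, which reduces the number of input features before training. As a rough illustration of what that step does to a feature matrix, here is a minimal NumPy sketch based on the singular value decomposition. The function name pca_reduce, the synthetic data, and the choice of five components are assumptions for illustration; this README does not state how the repository implements PCA or how many components it keeps.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Project the rows of x onto the top principal components.

    Returns the reduced matrix and the fraction of variance explained
    by each retained component.
    """
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 100))  # 20 hypothetical samples, 100 features
reduced, ratio = pca_reduce(x, n_components=5)
```

Here reduced has one row per sample and five columns, and ratio reports how much of the total variance each of those five components captures.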
