Background · Research Questions · Workflow · Contributors
This repository contains the scripts used for the thesis project "DNA Methylation-based Prediction of Midlife Dementia risk", carried out as part of the Master Systems Biology programme at Maastricht University.
To identify people at increased risk of dementia early, the LIBRA and CAIDE midlife dementia risk scores have been developed. Because DNA methylation may act as the molecular link between lifestyle/environment and the biological processes governing health and disease, DNA methylation data might serve as an alternative basis for predicting midlife dementia risk. However, the high dimensionality of DNA methylation data often makes parameter optimization and model training computationally infeasible without prior feature selection.
- Can a robust and computationally feasible feature selection method be established to reduce the dimensionality of the data?
- Can reliable DNA methylation-based models for the prediction of a person’s LIBRA, CAIDE1, and CAIDE2 scores and risk factors be constructed in a general population cohort?
- Does extending the dementia risk score models with polygenic risk scores (PGSs) of dementia risk factors, subtypes, and comorbidities improve predictive power?
- Can the LIBRA, CAIDE1, CAIDE2, and risk factor models be used to predict dementia and cognitive impairment status in dementia-associated cohorts?
- What biological processes are captured by the most important features of the risk factor models?
An overview of the methodological workflow is shown in the figure below. It comprises the following steps:
- Pre-processing of phenotype data (PhenotypeProcessing)
- Pre-processing of genomics data (GenotypeProcessing)
- Pre-processing of DNA methylation data (MethylationProcessing)
- Evaluation of eight different feature selection methods (FeatureSelection):
  - Variance-based feature selection
    - β-values
    - M-values
    - β-values corrected for cell type composition
    - M-values corrected for cell type composition
  - S-score-based feature selection
  - PCA-based feature selection
  - Kennard-Stone-like feature selection
  - Correlation-based feature selection
- Prediction of the dementia risk scores, categories, and factors (MachineLearning)
- Biological interpretation of the best-performing risk factor models (ModelInterpretation)
- Validation of the established models in independent dementia-associated cohorts (ModelValidation).
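
To illustrate the variance-based feature selection step, the following is a minimal Python sketch and not the code used in this repository. It assumes a probes-by-samples matrix of β-values, converts it to M-values with the standard logit transform M = log2(β / (1 − β)), and keeps the probes with the highest variance across samples. The function names `beta_to_m` and `top_variance_features`, the synthetic data, and the number of retained probes are hypothetical and chosen for illustration only. Correction for cell type composition is not shown.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """Convert beta-values (in [0, 1]) to M-values: M = log2(beta / (1 - beta)).

    Values are clipped to [eps, 1 - eps] to avoid division by zero and log(0).
    """
    beta = np.clip(beta, eps, 1.0 - eps)
    return np.log2(beta / (1.0 - beta))


def top_variance_features(values, n_keep):
    """Return the (sorted) row indices of the n_keep most variable features.

    `values` is a features-by-samples matrix; the sample variance is
    computed across samples for each feature.
    """
    variances = np.var(values, axis=1, ddof=1)
    top = np.argsort(variances)[::-1][:n_keep]
    return np.sort(top)


# Synthetic example (assumption): 1000 probes measured in 50 samples.
rng = np.random.default_rng(0)
beta = rng.beta(2.0, 5.0, size=(1000, 50))
m_values = beta_to_m(beta)

# Select the 100 most variable probes on each scale.
selected_beta = top_variance_features(beta, n_keep=100)
selected_m = top_variance_features(m_values, n_keep=100)

# The two scales generally do not select exactly the same probes,
# because the logit transform stretches values close to 0 and 1.
shared = np.intersect1d(selected_beta, selected_m)
print(f"Probes selected on both scales: {shared.size} of 100")
```

Because the logit transform changes the relative spread of probes near 0 and 1, the β-value and M-value variants can rank probes differently, which is one reason to evaluate them as separate selection methods.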
Jarno Koetsier1*, Rachel Cavill2, and Ehsan Pishva3
1 Faculty of Science and Engineering (FSE), Maastricht University
2 Department of Advanced Computing Sciences (DACS), Faculty of Science and Engineering (FSE), Maastricht University
3 Department of Psychiatry and Neuropsychology, School for Mental Health and Neuroscience (MHeNS), Faculty of Health, Medicine and Life Sciences (FHML), Maastricht University
*Feel free to contact me via email: j.koetsier@student.maastrichtuniversity.nl