
Intro to ML for neurogenomics

Nathan Skene edited this page Nov 28, 2023 · 14 revisions

General machine learning skills

Coding: Python skills and deep learning frameworks (PyTorch and possibly TensorFlow/Keras). Know how to create and efficiently use data loaders, and how to define model architectures.

How to efficiently use GPUs for model training: Linux programming skills, distributed training with multiple GPUs, and where the data is physically located relative to the GPUs.

Model training monitoring: Weights & Biases

Machine learning for genomics

How to frame the problem and correctly create training, validation and test sets, so that reported performance is reliable and data leakage is avoided. Jacob Schreiber has a good paper on this: "Navigating the pitfalls of applying machine learning in genomics".
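One common way to reduce leakage in genomics is to hold out whole chromosomes, so that overlapping or neighbouring regions never end up on both sides of a split. The sketch below illustrates that idea only. The function name `split_by_chromosome`, the choice of `chr8` and `chr9` as hold-out chromosomes, and the toy regions are all assumptions made for illustration, not something prescribed by this page.

```python
# Hypothetical hold-out chromosomes, chosen only for illustration.
VAL_CHROMS = {"chr8"}
TEST_CHROMS = {"chr9"}


def split_by_chromosome(regions, val_chroms=VAL_CHROMS, test_chroms=TEST_CHROMS):
    """Assign each (chrom, start, end) region to train, val or test by chromosome."""
    # A chromosome must not be in both hold-out sets, or the splits would overlap.
    overlap = set(val_chroms) & set(test_chroms)
    if overlap:
        raise ValueError(f"Chromosomes in both val and test: {sorted(overlap)}")

    splits = {"train": [], "val": [], "test": []}
    for region in regions:
        chrom = region[0]
        if chrom in test_chroms:
            splits["test"].append(region)
        elif chrom in val_chroms:
            splits["val"].append(region)
        else:
            splits["train"].append(region)
    return splits


# Toy regions (made up): the two chr1 regions overlap each other,
# so a random per-region split could leak sequence between sets.
regions = [
    ("chr1", 100, 300),
    ("chr1", 250, 450),
    ("chr8", 500, 700),
    ("chr9", 900, 1100),
]

splits = split_by_chromosome(regions)
print({name: len(items) for name, items in splits.items()})
# → {'train': 2, 'val': 1, 'test': 1}
```

Splitting by chromosome keeps the two overlapping `chr1` regions together in the training set. Which chromosomes to hold out, and whether other sources of leakage (for example related samples or homologous sequences) also need handling, depends on the problem.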

Access to GPUs

We have access to three main sets of GPUs:

  • Our private cloud

Our recommended approach is:

  • Test changes and small runs on the private cluster GPU so that you get instant results.

  • If multiple GPUs are needed, switch to submitting the 'full' job on the HPC.

  • Caveat: the HPC has a fairly short time limit (72 hours), so if you need to train for longer, use the GPU on the private cluster.
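The recommendation above can be written down as a small decision helper. This is only a sketch of the rule as stated here: the function name `choose_target`, its arguments and the returned labels are hypothetical, and only the 72-hour limit comes from this page.

```python
HPC_TIME_LIMIT_HOURS = 72  # HPC time limit stated on this page


def choose_target(n_gpus, est_hours):
    """Suggest where to run a job (hypothetical helper, not an existing tool).

    n_gpus:    number of GPUs the job needs
    est_hours: rough estimate of the training time in hours
    """
    if n_gpus < 1 or est_hours < 0:
        raise ValueError("n_gpus must be >= 1 and est_hours must be >= 0")
    # Multi-GPU jobs go to the HPC, but only if they fit in its time limit.
    if n_gpus > 1 and est_hours <= HPC_TIME_LIMIT_HOURS:
        return "hpc"
    # Small tests, and anything longer than the HPC limit, use the private GPU.
    return "private"


print(choose_target(n_gpus=1, est_hours=0.5))   # quick test run
print(choose_target(n_gpus=4, est_hours=48))    # full multi-GPU job
print(choose_target(n_gpus=1, est_hours=100))   # longer than the HPC limit
```

A job that needs several GPUs and also more than 72 hours falls back to the private GPU under this rule. In practice such a job may be better split into shorter runs resumed from checkpoints, which is a judgment call outside this sketch.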
