Intro to ML for neurogenomics
Coding: Python skills and deep learning frameworks (PyTorch and possibly TensorFlow/Keras). Know how to create and efficiently use data loaders and model architectures.
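As a minimal sketch of what this looks like in PyTorch (the `ToySequenceDataset` class, tensor shapes and layers below are made-up placeholders, not lab code):

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class ToySequenceDataset(Dataset):
    """Hypothetical dataset: random 'one-hot-like' sequence windows with binary labels."""

    def __init__(self, n_samples=1024, seq_len=200):
        self.x = torch.randn(n_samples, 4, seq_len)         # stand-in for one-hot DNA
        self.y = torch.randint(0, 2, (n_samples,)).float()  # binary labels

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


# A small convolutional model architecture.
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=8),
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(32, 1),
)

# num_workers and pin_memory help keep the GPU fed with batches.
loader = DataLoader(ToySequenceDataset(), batch_size=64, shuffle=True,
                    num_workers=4, pin_memory=True)

for x, y in loader:
    logits = model(x).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    break  # one batch, for illustration only
```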
How to use GPUs efficiently for model training: Linux skills, distributed training with multiple GPUs, and the physical location of your data relative to the GPUs.
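A hedged sketch of the core PyTorch pieces: single-GPU device placement, plus the skeleton of multi-GPU training with `DistributedDataParallel`, assuming the script is launched with `torchrun` (the model here is a placeholder):

```python
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single GPU: move the model once, then move each batch as it arrives.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(100, 1).to(device)
x = torch.randn(64, 100).to(device, non_blocking=True)

# Multiple GPUs: one process per GPU, launched with e.g.
#   torchrun --nproc_per_node=4 train.py
# torchrun sets LOCAL_RANK (and the other rendezvous variables) for each process.
if "LOCAL_RANK" in os.environ:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = DDP(model.to(local_rank), device_ids=[local_rank])

# Data location matters too: reading training data from fast storage close to
# the GPUs (e.g. local scratch) rather than a slow network mount helps avoid
# starving the GPUs during training.
```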
Model training monitoring: Weights & Biases.
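A minimal Weights & Biases logging sketch (the project name, config values and losses are placeholders):

```python
import wandb

# Start a run; hyperparameters stored in `config` are tracked alongside the metrics.
run = wandb.init(project="my-neurogenomics-project",
                 config={"lr": 1e-3, "batch_size": 64})

for epoch in range(10):
    # Dummy numbers standing in for real training/validation losses.
    train_loss, val_loss = 0.1 / (epoch + 1), 0.12 / (epoch + 1)
    wandb.log({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})

run.finish()
```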
How to pose the problem and correctly create training, validation and test sets, so that you can demonstrate appropriate performance and avoid data leakage. Jacob Schreiber has a good paper on this: Navigating the pitfalls of applying machine learning in genomics.
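One common convention in genomics is to hold out whole chromosomes for validation and testing rather than splitting examples at random, so that nearby or duplicated loci cannot leak across the splits. A small illustrative sketch with made-up data (the chromosome choices are arbitrary):

```python
import pandas as pd

# Hypothetical table of training examples, each annotated with its chromosome.
examples = pd.DataFrame({
    "chrom": ["chr1", "chr2", "chr8", "chr9", "chr1", "chr8"],
    "label": [0, 1, 1, 0, 1, 0],
})

val_chroms = {"chr8"}
test_chroms = {"chr9"}

# Every example from a held-out chromosome goes to exactly one split.
train = examples[~examples["chrom"].isin(val_chroms | test_chroms)]
val = examples[examples["chrom"].isin(val_chroms)]
test = examples[examples["chrom"].isin(test_chroms)]
```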
We have access to three main sets of GPUs:
- Our private cloud
- The Imperial HPC
Our recommended approach: test changes and small runs on the private cloud GPU so that you get instant results. If multiple GPUs are needed, switch to submitting the 'full' job on the HPC. The caveat is that the HPC has a fairly short time limit (72 hours), so if you need to train for longer than that, use the GPU on the private cloud.