PBS example scripts for the Imperial HPC
Brian's PBS scripts for running the rare disease cell typing jobs in parallel are here: https://github.com/neurogenomics/rare_disease_celltyping/tree/master/pbs
There are four types of PBS scripts you can use on the Imperial HPC. To submit a job to the cluster, run:

```bash
qsub path/to/pbs_script.pbs
```

The folder you are in when you submit the job matters. It is recorded as `$PBS_O_WORKDIR`, and the job script `cd`s into it, so it becomes the working directory for the job. I like to structure this as follows:

```
project_folder/pbs/pbs_script.pbs
```

and then submit the job from `project_folder`.
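If you often end up submitting from the wrong folder, a small wrapper can do the `cd` for you. Below is a minimal sketch that assumes the `project_folder/pbs/pbs_script.pbs` layout above. `submit_pbs` is a hypothetical helper and is not part of Brian's scripts. It runs `qsub` from the parent of the folder that holds the script. Because it does this in a subshell, your own shell stays where it was.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit project_folder/pbs/<name>.pbs from project_folder,
# so that project_folder becomes $PBS_O_WORKDIR for the job.
submit_pbs() {
    local script="$1" pbs_dir project_dir
    if [ ! -f "$script" ]; then
        echo "submit_pbs: no such file: $script" >&2
        return 1
    fi
    # Absolute path of the folder holding the script (e.g. .../project_folder/pbs).
    pbs_dir="$(cd "$(dirname "$script")" && pwd)"
    # Its parent is the project folder we want to submit from.
    project_dir="$(dirname "$pbs_dir")"
    # Run qsub in a subshell so the caller's working directory is unchanged;
    # qsub prints the new job ID on success.
    ( cd "$project_dir" && qsub "$(basename "$pbs_dir")/$(basename "$script")" )
}
```

For example, `submit_pbs ~/project_folder/pbs/cpu_single.pbs` submits the job with `project_folder` as its `$PBS_O_WORKDIR`, no matter which directory you call it from.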
The four types are single and parallel jobs, each run on either CPUs or GPUs. Each type can be submitted to different queues, so it's worth inspecting the queues (for example with `qstat -Q`) to find out which one suits your job best.
Single CPU jobs are the most common submissions you will use. Think of the PBS script as a normal bash script that contains all the commands needed to run your job. For example, it can activate a conda environment and then run a Python script. The only difference from a normal bash script is that you need these lines at the top:
```bash
#PBS -l walltime=72:00:00
#PBS -l select=1:ncpus=40:mem=96gb
module load anaconda3/personal
source activate enformer
cd "$PBS_O_WORKDIR"
```
The first line sets the maximum run time (walltime) for the job, here 72 hours. The second line requests the machine's resources: one chunk (`select=1`) with 40 CPUs and 96 GB of RAM. The next two lines load your personal conda installation on the HPC and then activate an environment (change `enformer` as necessary). The final line moves to the directory you submitted the job from. PBS jobs otherwise start in your home directory.
See cpu_single.pbs for an example of this type of PBS script.
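Below is a minimal sketch of how the top of a single CPU job script could be written so that it also runs locally with `bash pbs/cpu_single.pbs`. This is handy for catching path mistakes before you wait in the queue. The sketch relies on PBS setting `PBS_JOBID` and `PBS_O_WORKDIR` inside a job. The guard and the hypothetical `enter_job_workdir` function are additions for illustration and are not taken from cpu_single.pbs.

```bash
#!/bin/bash
#PBS -l walltime=72:00:00
#PBS -l select=1:ncpus=40:mem=96gb

# PBS_JOBID is only set inside a PBS job, so the module/conda setup runs on
# the cluster and is skipped when you dry-run the script locally with bash.
if [ -n "${PBS_JOBID:-}" ]; then
    module load anaconda3/personal
    source activate enformer
fi

# Under PBS, PBS_O_WORKDIR is the folder qsub was run from. When it is unset
# (a local dry run), stay in the current directory instead of falling
# through to a bare `cd`, which would go to $HOME.
enter_job_workdir() {
    cd "${PBS_O_WORKDIR:-$PWD}"
}
enter_job_workdir || exit 1
echo "Running in $PWD"

# ... the commands for your job go here, e.g. running a Python script ...
```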
For single GPU jobs, the Imperial HPC cluster has multiple GPUs (A30s and A100s). At the time of writing, the A100s were not generally available, but you could still get access as a beta tester. See gpu_single.pbs for an example of this type of PBS script.
Parallel jobs let you run on multiple machines in the cluster at once (a maximum of 50). A separate text file controls which variables go to each machine. Only the PBS directives at the top of your script change:
```bash
#PBS -l walltime=72:00:00
#PBS -l select=1:ncpus=40:mem=96gb
#PBS -J 1-20
```
The new third line, `#PBS -J 1-20`, turns the job into an array job with indices 1 to 20. PBS therefore runs 20 sub-jobs, and each one gets the resources requested on the `select` line. Each sub-job can read its own index from `$PBS_ARRAY_INDEX`. See cpu_parallel.pbs and parallel.txt for an example of this type of PBS script. Each line of parallel.txt holds the variable(s) for one sub-job. The PBS script reads them from the txt file with these lines:
```bash
# Read line number $PBS_ARRAY_INDEX from the txt file
run=$(head -n "$PBS_ARRAY_INDEX" pbs/parallel.txt | tail -n 1)
# Split the line into two variables on '-'
IFS=- read -r var1_i var2_i <<< "$run"
```
For this approach to work, the values on each line of parallel.txt must be separated by `-`. This lets the values themselves contain spaces. The first value must not contain a `-`, because everything after the first `-` goes into `var2_i`. You can then use `var1_i` and `var2_i` as normal bash variables.
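Below is a sketch of a slightly more defensive version of the reading lines. It assumes the same `pbs/parallel.txt` layout, and `read_param_line` and `load_params` are hypothetical helper names. `head | tail` silently reuses the last line when the index is past the end of the file. This version uses `sed -n` to print exactly one line, so an out-of-range index gives an error instead. It also defaults `PBS_ARRAY_INDEX` to 1 so that the script can be tried outside PBS.

```bash
#!/usr/bin/env bash
# Print line N of FILE. Unlike `head -n N | tail -n 1`, `sed -n "Np"` prints
# nothing when N is larger than the number of lines in the file.
read_param_line() {
    sed -n "${1}p" "$2"
}

# Set var1_i and var2_i from this sub-job's line of a '-'-separated file.
load_params() {
    local file="$1"
    # PBS_ARRAY_INDEX is only set inside an array job; default to 1 so the
    # script can be tried locally with plain bash.
    local index="${PBS_ARRAY_INDEX:-1}"
    local run
    run=$(read_param_line "$index" "$file")
    if [ -z "$run" ]; then
        echo "load_params: no parameters on line $index of $file" >&2
        return 1
    fi
    # Quote "$run" so spaces survive; -r keeps backslashes literal.
    # var1_i gets the text before the first '-', var2_i everything after it.
    IFS=- read -r var1_i var2_i <<< "$run"
}
```

In the job script you would call `load_params pbs/parallel.txt || exit 1` and then use `$var1_i` and `$var2_i` as before.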
If you can get access to the med-bio queue on the HPC cluster, you can run many more jobs at once, for longer, with very little wait time (~150 machines at once, each with 128 GB of RAM). This is where I ran many of the predictions for Enformer Celltyping. The PBS directives would look like:
```bash
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=40:mem=128gb -q med-bio
#PBS -J 1-150
```
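With ~150 sub-jobs, parallel.txt and the `-J` range can easily drift apart. The sketch below writes every combination of two lists as `var1-var2` lines and then prints the matching range. The example values and the helper names (`write_param_file`, `array_range`) are hypothetical. You can paste the printed range into the `#PBS -J` line. PBS Pro's `qsub` also accepts `-J` on the command line, and command-line options take precedence over `#PBS` directives, but check `man qsub` on the cluster to confirm. Keep the number of combinations within the limit for the queue you use.

```bash
#!/usr/bin/env bash
# Hypothetical example values; replace with your own. They may contain
# spaces, but var1 values must not contain '-', the separator.
var1_values=("sample A" "sample B" "sample C")
var2_values=("model1" "model2")

# Write every var1/var2 combination to OUT, one "var1-var2" line each.
write_param_file() {
    local out="$1" a b
    : > "$out"    # create or truncate the output file
    for a in "${var1_values[@]}"; do
        for b in "${var2_values[@]}"; do
            printf '%s-%s\n' "$a" "$b" >> "$out"
        done
    done
}

# Print the -J range that gives one sub-job per line of FILE.
array_range() {
    # grep -c '' counts lines, including a final line without a newline.
    echo "1-$(grep -c '' "$1")"
}
```

For example, `write_param_file pbs/parallel.txt` followed by `qsub -J "$(array_range pbs/parallel.txt)" pbs/cpu_parallel.pbs` submits one sub-job per generated line.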
As with CPUs, you can submit parallel GPU jobs. There are 16 A30 GPUs in the HPC cluster, so this is the maximum you can use at once. The GPU queue is also heavily used, which is why I opted for the CPU med-bio queue for the Enformer Celltyping predictions. See gpu_parallel.pbs and parallel.txt for an example of this type of PBS script.