Manage MD simulations across multiple computing clusters
pip install git+https://github.com/jandom/jobengine.git
$ python -c "import jobengine; print(jobengine)"
<module 'jobengine' from '/private/tmp/test/venv/lib/python3.9/site-packages/jobengine/__init__.py'>
Submit a job to a remote cluster, given a .tpr file in your current working directory
qsubmit.py --cluster my_cluster --topol topol.tpr --nodes 16
Or rely on the argument defaults
qsubmit.py --cluster my_cluster
Manually fetch results from a remote cluster
qfetch.py
Cancel a job, using the default -w/--workdir or an explicit list of workdirs
qcancel.py
qcancel.py -w workdir__older
qcancel.py --workdir workdir1 workdir2 workdir3
Re-submit a job, using the default -w/--workdir argument
qsubmit.py
One can override the partition or node number used to initialize the original job
qsubmit.py --partition new_partition
qsubmit.py --nodes 32
Add crontab entries to re-submit broken jobs (every 30 minutes) and fetch results (every 60 minutes).
Edit the crontab using
crontab -e
And add
*/30 * * * * python ~/.local/bin/qserver.py --cluster biowulf2
@hourly python ~/.local/bin/qserver.py --action fetch --cluster biowulf2
There are two types of entities in the codebase
- Cluster entities, representing the different HPC environments in use; one cluster can have multiple partitions
- Job entities; a job belongs to exactly one cluster but can be moved between partitions
Clusters come in two flavors (sketched below)
- "abstract" clusters such as PBSCluster or SlurmCluster, one per queueing engine in use
- "concrete" clusters such as Arcus, Biowulf and similar, which are actual existing clusters that you have access to
TODO
# Place inside ~/.ssh/config
Host my_cluster
HostName my_cluster.university.edu
User my_username_on_cluster
# Optional settings below
# LogLevel QUIET
# IdentityFile ~/.ssh/id_ed25519
# ProxyCommand ssh my_username_at_gateway@gateway_computer.university.edu exec nc %h %p
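With the Host alias in place, tools only need the short name to reach the cluster. A minimal sketch of shelling out through the alias (this helper is illustrative, not part of jobengine):

# Illustrative helper, not part of jobengine: run a command on the cluster
# through the "my_cluster" alias defined in ~/.ssh/config.
import subprocess


def run_remote(alias: str, command: str) -> str:
    result = subprocess.run(
        ["ssh", alias, command],
        check=True,           # raise CalledProcessError if ssh fails
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the remote hostname to verify the alias resolves correctly
    print(run_remote("my_cluster", "hostname"))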