Compute Canada

John Giorgi edited this page Mar 18, 2020 · 30 revisions

This page serves as internal documentation for setting up the project on one of Compute Canada's clusters.

Install

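The following bash script can be saved to setup.sh and run on a Compute Canada cluster to set up the project. Run it with source setup.sh rather than bash setup.sh, otherwise the virtual environment is only activated in a subshell that exits with the script. Once the script has finished, AllenNLP and the project will be cloned into $WORK and installed, along with their dependencies, into the virtual environment at $ENV, which will be active.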

Note: For the time being, you will need to comment out the transformers dependency in AllenNLP's setup.py before calling pip install --editable . in the allennlp directory.


setup.sh

#!/bin/bash
ENV="$HOME/t2t"
WORK="$SCRATCH/t2t"
mkdir -p "$WORK"

module load python/3.7 cuda/10.1

# Create and activate a virtual environment
virtualenv --no-download "$ENV"
source "$ENV/bin/activate"
pip install --no-index --upgrade pip

# (TEMP) Install Transformers manually
pip install transformers==2.3.0

# Install AllenNLP from source
cd "$WORK"
git clone https://github.com/allenai/allennlp.git
cd allennlp
# *YOU NEED TO EDIT SETUP.PY (E.G. WITH VIM) AND COMMENT OUT THE "TRANSFORMERS" DEPENDENCY BEFORE THE NEXT STEP*
pip install --editable .
cd ../

# Install the project
git clone https://github.com/JohnGiorgi/t2t.git
cd t2t
pip install --editable .
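
After the script finishes, it can be worth confirming that the environment really is active before moving on. The following is a minimal sketch and not part of the original setup: the helper name check_env is hypothetical, and it relies only on the VIRTUAL_ENV variable that the virtualenv activate script exports. The comparison is a literal string match, so a symlinked home directory could in principle produce a false negative.

```shell
#!/bin/bash
# check_env (hypothetical helper): succeeds if the virtual environment at the
# given path is the one currently active in this shell.
check_env() {
    local expected="$1"
    # The activate script exports VIRTUAL_ENV; it is unset otherwise.
    if [ "${VIRTUAL_ENV:-}" = "$expected" ]; then
        echo "Active environment: $expected"
    else
        echo "Environment $expected is not active (VIRTUAL_ENV=${VIRTUAL_ENV:-unset})" >&2
        return 1
    fi
}
```

For example, check_env "$HOME/t2t" && python -c "import allennlp, t2t" would additionally confirm that both packages import from the active environment.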

Training

Once setup.sh has run successfully, you can submit train.sh (for example, with sbatch train.sh) to train the model. All hyperparameters are set in the Jsonnet config file at $CONFIG_FILEPATH, and all output is saved under $OUTPUT. Tensorboard logs are written to $OUTPUT/log, so you can call tensorboard --logdir $OUTPUT/log to view them.

Note: because compute nodes are airgapped, you will need to copy $OUTPUT/log to a login node, or to your local computer, before calling tensorboard.
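
One way to make that copy easier is to bundle the log directory into a single archive first. This is only a sketch under assumptions: pack_logs is a hypothetical helper, and the archive path in the usage note is illustrative rather than something the project prescribes.

```shell
#!/bin/bash
# pack_logs (hypothetical helper): bundle a log directory into a gzipped tar
# archive so it can be copied off the cluster in a single transfer.
pack_logs() {
    local log_dir="$1"   # e.g. "$OUTPUT/log"
    local archive="$2"   # e.g. "$SCRATCH/t2t-logs.tar.gz" (illustrative path)
    if [ ! -d "$log_dir" ]; then
        echo "No log directory at $log_dir" >&2
        return 1
    fi
    # -C switches to the parent directory so the archive stores a relative path.
    tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")"
}
```

On the cluster you might run pack_logs "$OUTPUT/log" "$SCRATCH/t2t-logs.tar.gz", then fetch the archive from your local machine with scp, unpack it with tar -xzf t2t-logs.tar.gz, and point tensorboard --logdir at the extracted log directory.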


train.sh

#!/bin/bash
# Requested resources
#SBATCH --mem=32G
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:1
# Wall time and job details
#SBATCH --time=24:00:00
#SBATCH --job-name=t2t-train
# Emails me if the job fails (use --mail-type=ALL to also be notified when it starts and ends)
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=FAIL
# Use this command to run the same job interactively
# salloc --mem=32G --cpus-per-task=10 --gres=gpu:1 --time=3:00:00

PROJECT_NAME="t2t"
ENV="$HOME/$PROJECT_NAME"
OUTPUT="$SCRATCH/$PROJECT_NAME"
WORK="$SCRATCH/$PROJECT_NAME/$PROJECT_NAME"

# Path to the AllenNLP config
CONFIG_FILEPATH="$WORK/configs/contrastive.jsonnet"
# Directory to save model, vocabulary and training logs
SERIALIZED_DIR="$OUTPUT/tmp"

# Load the required modules and activate the environment
module load python/3.7 cuda/10.1
source "$ENV/bin/activate"
cd "$WORK"

# Run the job
allennlp train "$CONFIG_FILEPATH" \
	--serialization-dir "$SERIALIZED_DIR" \
	--include-package t2t
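
Because a queued job can wait a while before it starts, it may help to check the paths train.sh depends on before submitting. The sketch below is an assumption-laden illustration, not part of the project: preflight is a hypothetical helper that only tests for the activate script and the config file.

```shell
#!/bin/bash
# preflight (hypothetical helper): check that the files train.sh depends on
# exist before the job is queued.
preflight() {
    local env_dir="$1"      # virtual environment, e.g. "$HOME/t2t"
    local config_file="$2"  # AllenNLP config, e.g. "$WORK/configs/contrastive.jsonnet"
    local status=0
    if [ ! -f "$env_dir/bin/activate" ]; then
        echo "Missing virtual environment: $env_dir" >&2
        status=1
    fi
    if [ ! -f "$config_file" ]; then
        echo "Missing config file: $config_file" >&2
        status=1
    fi
    return "$status"
}
```

A possible use is preflight "$HOME/t2t" "$SCRATCH/t2t/t2t/configs/contrastive.jsonnet" && sbatch train.sh, followed by squeue -u "$USER" to watch the job's state in the queue.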