Standard ResNet training on image classification benchmarks. Modified from the original TensorFlow version.
Set your custom paths in setup.sh first
(data folder, model save folder, etc.).
git clone --recursive https://github.com/renmengye/resnet.git
cd resnet
./setup.sh
./run_cifar_exp.py --dataset cifar-10 --model resnet-32
# Run training.
./run_imagenet_exp.py --model resnet-50
# Evaluate a trained model. Launch this on a separate GPU.
./run_imagenet_eval.py --id [EXPERIMENT ID]
SSH into the Slurm manager node first, then launch jobs from there.
# Launch a recurring training job, 30K steps per job, for a total of 600K steps.
./run_imagenet_exp_sched.py --model resnet-50 --max_num_steps 30000 --max_max_steps 600000
# Launch a recurring evaluation job every 2 hours.
./run_imagenet_eval_sched.py --id [EXPERIMENT ID] --min_interval 7200
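As a rough sanity check on the numbers in the Slurm commands above, the sketch below works out how many chained jobs the training schedule implies and converts the evaluation interval to hours. It assumes each job runs a fixed number of steps and the next job picks up where the previous one stopped; `num_jobs` is a hypothetical helper for illustration, not part of this repository.

```python
import math


def num_jobs(total_steps: int, steps_per_job: int) -> int:
    """Hypothetical helper: number of chained jobs needed to reach total_steps.

    Assumes every job runs exactly steps_per_job steps (the last one may
    run fewer) and that training resumes from the previous job's state.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if steps_per_job <= 0:
        raise ValueError("steps_per_job must be positive")
    return math.ceil(total_steps / steps_per_job)


# Values taken from the commands above.
jobs = num_jobs(total_steps=600_000, steps_per_job=30_000)
eval_interval_hours = 7200 / 3600  # --min_interval is given in seconds

print(f"training jobs: {jobs}")
print(f"evaluation interval: {eval_interval_hours} hours")
```

With 30K steps per job and 600K steps in total, this gives 20 training jobs, and a 7200-second interval corresponds to the 2 hours mentioned in the comment.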
For the experiment configurations, see resnet/configs/cifar_exp_config.py
and resnet/configs/imagenet_exp_config.py.