use slurm checkpoint dir
vzhong committed Oct 31, 2024
1 parent 4197cc8 commit 441e409
Showing 2 changed files with 3 additions and 0 deletions.
1 change: 1 addition & 0 deletions wrangl/conf/wrangl_experiment.yaml
@@ -59,6 +59,7 @@ val_check_interval: 100
 flush_logs_every_n_steps: '${log_every_n_steps}'
 test_only: false
 autoresume: false
+use_slurm_checkpoint_dout: false  # set to True to use /checkpoint/$USER/$SLURM_JOB_ID for storage
 ckpt_path: 'latest.ckpt'
 val_sample_size: 100

2 changes: 2 additions & 0 deletions wrangl/learn/supervised.py
@@ -228,6 +228,8 @@ def run_train_test(cls, cfg: OmegaConf, train_dataset: torch.utils.data.Dataset,
         model_kwargs = model_kwargs or {}
         L.seed_everything(seed=cfg.seed, workers=True)
         dout = os.getcwd()
+        if cfg.use_slurm_checkpoint_dout:
+            dout = '/checkpoint/{}/{}'.format(os.environ['USER'], os.environ['SLURM_JOB_ID'])
 
         logger = logging.getLogger(name='{}:train_test'.format(cls.__name__))
         logger.info('Logging to {}'.format(dout))
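
The change to supervised.py reads USER and SLURM_JOB_ID directly from os.environ, so enabling use_slurm_checkpoint_dout outside a Slurm allocation would raise a KeyError. A minimal sketch of a more defensive variant follows. The helper name resolve_dout and its env and root parameters are hypothetical and not part of wrangl; falling back to the working directory is one possible design choice, not the behaviour of this commit.

```python
import os
from typing import Mapping, Optional


def resolve_dout(use_slurm_checkpoint_dout: bool,
                 env: Optional[Mapping[str, str]] = None,
                 root: str = '/checkpoint') -> str:
    """Pick the output directory for a run.

    Hypothetical helper (not in wrangl). `env` defaults to os.environ and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    if not use_slurm_checkpoint_dout:
        # Flag disabled: same default as the original code.
        return os.getcwd()
    user = env.get('USER')
    job_id = env.get('SLURM_JOB_ID')
    if not user or not job_id:
        # Not inside a Slurm job (or USER unset): fall back to the working
        # directory instead of raising KeyError. This fallback is an assumption.
        return os.getcwd()
    # Same layout as the commit: /checkpoint/$USER/$SLURM_JOB_ID
    return '{}/{}/{}'.format(root, user, job_id)
```

For example, resolve_dout(True, env={'USER': 'alice', 'SLURM_JOB_ID': '12345'}) returns '/checkpoint/alice/12345', while the same call without SLURM_JOB_ID in env returns the current working directory.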
