
[Sky Launch] Support Slurm Syntax for GPUs in YAML #386

Closed
michaelzhiluo opened this issue Feb 20, 2022 · 0 comments · Fixed by #396
michaelzhiluo (Collaborator) commented Feb 20, 2022

In Slurm, users can request GPUs for their batch jobs with `#SBATCH` directives:

```bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1              # Number of GPUs (per node)
#SBATCH --mem=4000M               # Memory (per node)
#SBATCH --time=0-03:00            # Time (DD-HH:MM)
./program                         # you can use 'nvidia-smi' for a test
```
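
For comparison, a Sky task expressing roughly the same request might look like the sketch below. This is a hedged sketch, not a working example today: it assumes the Slurm-style accelerator string requested in this issue, and the field values are illustrative.

```yaml
# Hypothetical Sky task mirroring the Slurm script above.
resources:
  cloud: aws                 # illustrative choice of cloud
  accelerators: 'V100:1'     # counterpart of '--gres=gpu:1' (syntax requested here)

run: |
  ./program                  # 'nvidia-smi' can be used for a test
```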

In Sky, this Slurm `gpu:<count>` syntax corresponds to an accelerator spec such as `V100:8`. The CLI already accepts this form (e.g. `sky exec mycluster --gpus V100:1 -d -- python train.py --lr 1e-3`), but the YAML does not, in particular the `resources`/`accelerators` field. The example below should work once Slurm syntax is implemented:

```yaml
resources:
  cloud: aws
  accelerators: 'V100:8'
```
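
Implementation-wise, supporting this likely amounts to normalizing the `accelerators` field when the YAML is parsed. Below is a minimal sketch of such a normalizer, assuming the field is ultimately stored as a `{name: count}` mapping; the function name `parse_accelerators` is hypothetical and not Sky's actual API:

```python
# Hypothetical sketch; 'parse_accelerators' is illustrative, not Sky's API.
from typing import Dict, Union


def parse_accelerators(spec: Union[str, Dict[str, int]]) -> Dict[str, int]:
    """Normalize an accelerators spec into a {name: count} dict.

    Accepts a dict (e.g. {'V100': 8}) or a Slurm-style string
    (e.g. 'V100:8'); a bare name like 'V100' defaults to a count of 1.
    """
    if isinstance(spec, dict):
        return dict(spec)
    name, sep, count = spec.partition(':')
    return {name: int(count) if sep else 1}


assert parse_accelerators('V100:8') == {'V100': 8}
assert parse_accelerators('V100') == {'V100': 1}
assert parse_accelerators({'V100': 8}) == {'V100': 8}
```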