
[Sky Launch] Support Slurm Syntax for GPUs in YAML #386

Closed
michaelzhiluo opened this issue Feb 20, 2022 · 0 comments · Fixed by #396
michaelzhiluo (Collaborator) commented Feb 20, 2022

In Slurm, users can request GPUs for their batch jobs with `#SBATCH` directives:

```bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1              # Number of GPUs (per node)
#SBATCH --mem=4000M               # Memory (per node)
#SBATCH --time=0-03:00            # Time (DD-HH:MM)
./program                         # you can use 'nvidia-smi' for a test
```
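
For comparison, a Sky task expressing roughly the same request might look like the sketch below. This is a hedged sketch, not a working example today: it assumes the Slurm-style accelerator string requested in this issue, and the field values are illustrative.

```yaml
# Hypothetical Sky task mirroring the Slurm script above.
resources:
  cloud: aws                 # illustrative choice of cloud
  accelerators: 'V100:1'     # counterpart of '--gres=gpu:1' (syntax requested here)

run: |
  ./program                  # 'nvidia-smi' can be used for a test
```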

In Sky, this Slurm `gpu:<count>` syntax corresponds to an accelerator spec such as `V100:8`. The CLI already accepts this form (e.g. `sky exec mycluster --gpus V100:1 -d -- python train.py --lr 1e-3`), but the YAML does not, in particular the `resources`/`accelerators` field. The example below should work once Slurm syntax is implemented:

```yaml
resources:
  cloud: aws
  accelerators: 'V100:8'
```
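
Implementation-wise, supporting this likely amounts to normalizing the `accelerators` field when the YAML is parsed. Below is a minimal sketch of such a normalizer, assuming the field is ultimately stored as a `{name: count}` mapping; the function name `parse_accelerators` is hypothetical and not Sky's actual API:

```python
# Hypothetical sketch; 'parse_accelerators' is illustrative, not Sky's API.
from typing import Dict, Union


def parse_accelerators(spec: Union[str, Dict[str, int]]) -> Dict[str, int]:
    """Normalize an accelerators spec into a {name: count} dict.

    Accepts a dict (e.g. {'V100': 8}) or a Slurm-style string
    (e.g. 'V100:8'); a bare name like 'V100' defaults to a count of 1.
    """
    if isinstance(spec, dict):
        return dict(spec)
    name, sep, count = spec.partition(':')
    return {name: int(count) if sep else 1}


assert parse_accelerators('V100:8') == {'V100': 8}
assert parse_accelerators('V100') == {'V100': 1}
assert parse_accelerators({'V100': 8}) == {'V100': 8}
```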