Quickstart

Sky is a tool to run any workload seamlessly across different cloud providers through a unified interface. No knowledge of cloud offerings is required or expected: you simply define the workload and its resource requirements, and Sky will automatically execute it on AWS, Google Cloud Platform, or Microsoft Azure.

Please follow the installation instructions before continuing with this guide.

Key Features

  • Run your code on the cloud with zero code changes
  • Easy provisioning of VMs across multiple cloud platforms (AWS, Azure, or GCP)
  • Easy management of multiple clusters to handle different projects
  • Fast and iterative development with quick access to cloud instances for prototyping
  • Store your datasets on the cloud and access them like you would on a local filesystem
  • No cloud lock-in: transparently run your code across AWS, Google Cloud, and Azure

Provisioning your first cluster

We'll start by launching our first cluster on Sky using an interactive node. Interactive nodes are standalone machines that can be used like any other VM instance, but require no additional setup to configure. Sky provisions these nodes with your specified resources as cheaply and quickly as possible using an :ref:`auto-failover provisioner <auto-failover>`.

Let's provision an instance with a single K80 GPU.

# Provisions/reuses an interactive node with a single K80 GPU.
# Any of the interactive node commands (gpunode, tpunode, cpunode)
# will automatically log in to the cluster.
sky gpunode -c mygpu --gpus K80

Last login: Wed Feb 23 22:35:47 2022 from 136.152.143.101
ubuntu@ip-172-31-86-108:~$ gpustat
ip-172-31-86-108     Wed Feb 23 22:42:43 2022  450.142.00
[0] Tesla K80        | 31°C,   0 % |     0 / 11441 MB |
ubuntu@ip-172-31-86-108:~$
^D

# View the machine in the cluster table.
sky status

NAME   LAUNCHED        RESOURCES                     COMMAND                          STATUS
mygpu  a few secs ago  1x Azure(Standard_NC6_Promo)  sky gpunode -c mygpu --gpus K80  UP

After you are done, run sky down mygpu to terminate the cluster. Find more details on managing the lifecycle of your cluster :ref:`here <interactive-nodes>`.

Sky can also provision interactive CPU and TPU nodes with cpunode and tpunode. Please see our :ref:`CLI reference <cli>` for all configuration options. For more information on using and managing interactive nodes, check out our :ref:`reference documentation <interactive-nodes>`.

Hello, Sky!

You can also define tasks to be executed by Sky. We'll define our very first task to be a simple hello world program.

We can specify the following task attributes with a YAML file:

  • resources (optional): the cloud resources the task must run on (e.g., accelerators, instance type)
  • workdir (optional): the working directory containing project code, which is synced to the provisioned instance(s)
  • setup (optional): commands that must be run before the task is executed
  • run (optional): the commands that make up the actual task
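Since every attribute is optional, a task file can be as small as a single run section. As a sketch, the snippet below writes such a file with a shell heredoc. The name minimal_task.yaml is only an example, and we assume that with no resources section Sky picks from the available clouds, as the cloud comment in the full YAML example describes.

```shell
# Write a minimal task file: only `run` is set, so resources, workdir,
# and setup are all omitted.
cat > minimal_task.yaml <<'EOF'
run: |
  echo "hello sky!"
EOF

# Then launch it as usual, e.g.: sky launch -c mycluster minimal_task.yaml
```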

Note

Sky does not currently support large, multi-gigabyte workdirs (e.g., do not store large datasets in your working directory), because the files are synced to the remote VM with rsync. Please consider using :ref:`Sky Storage <sky-storage>` to transfer large datasets and files.
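Before launching, it can help to check how much data rsync would push from the workdir. The hypothetical check_workdir_size function below is a sketch, not part of Sky: it measures a directory with du and returns non-zero above a limit. The 100 MB default is an arbitrary example, not a documented Sky threshold.

```shell
# Hypothetical pre-launch check (not a Sky command): warn if a workdir is
# larger than a chosen limit before it is rsynced to the cluster.
check_workdir_size() {
  local dir="${1:-.}"
  local limit_kb="${2:-102400}"   # example limit: 100 MB expressed in KB
  local size_kb
  # `du -sk` prints "<size in KB><TAB><path>"; keep only the number.
  size_kb=$(du -sk "$dir" | cut -f1)
  if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "warning: $dir is ${size_kb} KB; consider Sky Storage for large data" >&2
    return 1
  fi
  echo "$dir is ${size_kb} KB; small enough to sync as a workdir"
}

# Example: check_workdir_size . && sky launch -c mycluster hello_sky.yaml
```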

Below is a minimal task YAML that prints "hello sky!" and lists the installed Conda environments, requiring an NVIDIA Tesla K80 GPU on AWS. More example YAML files are available in the repo, and a fully documented example is :ref:`here <yaml-spec>`.

# hello_sky.yaml

resources:
  # Optional; if left out, pick from the available clouds.
  cloud: aws

  # Get 1 K80 GPU.  Use <name>:<n> to get more (e.g., "K80:8").
  accelerators: K80

# Working directory (optional) containing the project codebase.
# This directory will be synced to ~/sky_workdir on the provisioned cluster.
workdir: .

# Typical use: pip install -r requirements.txt
setup: |
  echo "running setup"
  # If using a `my_setup.sh` script that requires conda,
  # invoke it as below to ensure `conda activate` works:
  # bash -i my_setup.sh

# Typical use: make use of resources, such as running training.
run: |
  echo "hello sky!"
  conda env list
  # If using a `my_run.sh` script that requires conda,
  # invoke it as below to ensure `conda activate` works:
  # bash -i my_run.sh
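The accelerators field takes either a bare name or <name>:<n>. To make the format concrete, here is a hypothetical POSIX shell helper that splits a spec into a name and a count, treating a bare name as a count of one (matching the "Get 1 K80 GPU" comment). It only illustrates the syntax; it is not how Sky parses the field internally.

```shell
# Hypothetical helper: split "<name>[:<n>]" into "<name> <n>".
parse_accelerator() {
  local spec="$1"
  local name="${spec%%:*}"        # text before the first ':'
  local count=1                   # a bare name like "K80" means one device
  case "$spec" in
    *:*) count="${spec#*:}" ;;    # text after the first ':'
  esac
  # Reject an empty, non-numeric, zero, or zero-padded count.
  case "$count" in
    ''|*[!0-9]*|0*)
      echo "invalid accelerator count in '$spec'" >&2
      return 1 ;;
  esac
  echo "$name $count"
}

parse_accelerator K80     # prints: K80 1
parse_accelerator K80:8   # prints: K80 8
```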

Sky selects a VM that satisfies the user-specified resource constraints, launches the cluster on a suitable cloud provider, and executes the task.

To launch a task based on the YAML spec above, use sky launch:

$ sky launch -c mycluster hello_sky.yaml

The -c option specifies a cluster name. If a cluster with that name already exists, Sky reuses it; otherwise, a new cluster with that name is provisioned. If no cluster name is provided (e.g., sky launch hello_sky.yaml), a cluster name is autogenerated.
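Because a named cluster is reused, passing -c consistently avoids provisioning a fresh, auto-named cluster on every run. The launch_task wrapper below is a hypothetical convenience, not a Sky command. It defaults the cluster name to the YAML file's base name, replacing underscores with hyphens as a precaution in case a cloud rejects underscores in VM names (an assumption, not a documented Sky rule).

```shell
# Hypothetical wrapper (not a Sky command): always pass -c so that re-running
# the same YAML reuses one cluster instead of creating an auto-named one.
launch_task() {
  local yaml="$1"
  local name="${2:-}"
  if [ -z "$name" ]; then
    # Default: hello_sky.yaml -> hello-sky (underscores swapped as a precaution).
    name=$(basename "$yaml" .yaml | tr '_' '-')
  fi
  sky launch -c "$name" "$yaml"
}

# launch_task hello_sky.yaml             # runs: sky launch -c hello-sky hello_sky.yaml
# launch_task hello_sky.yaml mycluster   # runs: sky launch -c mycluster hello_sky.yaml
```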

We can view our existing clusters by running sky status:

$ sky status

If you have created several clusters, all of them are listed:

NAME       LAUNCHED     RESOURCES             COMMAND                                 STATUS
gcp        1 day ago    1x GCP(n1-highmem-8)  sky cpunode -c gcp --cloud gcp          STOPPED
mycluster  12 mins ago  1x AWS(p2.xlarge)     sky launch -c mycluster hello_sky.yaml  UP
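The status table is also easy to consume from scripts. As a sketch that assumes the column layout shown above (NAME first, STATUS last), this hypothetical up_clusters filter prints the names of clusters that are currently UP. The real output may change between Sky versions, so treat it as illustrative.

```shell
# Hypothetical filter: print the NAME of every row whose last column is "UP".
# Header rows and any other lines that do not end in "UP" are skipped.
up_clusters() {
  awk '$1 != "NAME" && $NF == "UP" { print $1 }'
}

# Usage: sky status | up_clusters
```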

If you would like to log into a cluster, Sky provides convenient SSH access via ssh <cluster_name>:

$ ssh mycluster
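Since the cluster name works as an SSH host, ordinary ssh usage applies, including running a single command remotely by passing it after the host name. The hypothetical check_gpus loop below uses that to run gpustat on several clusters in turn; it assumes gpustat is installed on each node, as it was on the K80 node earlier in this guide.

```shell
# Run a one-off command on each named cluster without opening a login shell.
# The cluster names and the gpustat command are examples; substitute your own.
check_gpus() {
  local cluster
  for cluster in "$@"; do
    echo "== $cluster"
    ssh "$cluster" gpustat || echo "could not reach $cluster" >&2
  done
}

# check_gpus mygpu mycluster
```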

If you would like to transfer files to and from the cluster, rsync or scp can be used:

$ rsync -Pavz /local/path/source mycluster:/remote/dest  # copy files to remote VM
$ rsync -Pavz mycluster:/remote/source /local/dest       # copy files from remote VM

After you are done, run sky down mycluster to terminate the cluster. Find more details on managing the lifecycle of your cluster :ref:`here <interactive-nodes>`.

Sky is more than a tool for easily provisioning and managing multiple clusters on different clouds. It also comes with features for :ref:`storing and moving data <sky-storage>`, :ref:`queueing multiple jobs <job-queue>`, :ref:`iterative development <iter-dev>`, and :ref:`interactive nodes <interactive-nodes>` for debugging.