working docs
romilbhardwaj committed Apr 3, 2022
1 parent 443134c commit c43121e
Showing 1 changed file with 30 additions and 29 deletions.
59 changes: 30 additions & 29 deletions docs/source/reference/storage.rst
@@ -1,6 +1,7 @@
.. _sky-storage:

Sky Storage
=======
===========

A Sky Storage object represents an abstract data store containing large data
files required by the task. Compared to file_mounts, storage is faster and
@@ -11,13 +12,13 @@ to a backing object store in a particular cloud (S3/GCS/Azure Blob).
A storage object is used by "mounting" it to a task. On mounting, the data
specified in the source becomes available at the destination mount path.

A storage object can be used in either MOUNT mode or COPY mode.
A storage object can be used in either :code:`MOUNT` mode or :code:`COPY` mode.

* In MOUNT mode, the backing store is directly "mounted" to the remote VM.
* In :code:`MOUNT` mode, the backing store is directly "mounted" to the remote VM.
I.e., files are fetched when accessed by the task and files written to the
mount path are also written to the remote store.

* In COPY mode, the files are pre-fetched and cached on the local disk.
* In :code:`COPY` mode, the files are pre-fetched and cached on the local disk.
Writes are not replicated on the remote store.

.. note::
@@ -36,43 +37,41 @@ enabling persistence, file_mount sync can be made significantly faster.

Your usage of sky storage can fall under four broad use cases:

1. **You want to upload your local data to remote VM -** specify the name and
1. **You want to upload your local data to remote VM -** specify the name and
source fields. Name sets the bucket name that will be used, and source
specifies the local path to be uploaded.

2. **You want to mount an existing S3/GCS bucket to your remote VM -** specify
2. **You want to mount an existing S3/GCS bucket to your remote VM -** specify
just the source field (e.g., s3://my-bucket/)

3. **You want to have a write-able path to directly write files to S3 buckets
3. **You want to have a write-able path to directly write files to S3 buckets
-** specify a name (to create a bucket if it doesn't exist) and set the mode
to MOUNT. This is useful for writing code outputs, such as checkpoints or
logs directly to an S3 bucket.

4. **You want to have a shared file-system across workers running on different
4. **You want to have a shared file-system across workers running on different
nodes -** specify a name (to create a bucket if it doesn't exist) and set
the mode to MOUNT. This will create an empty scratch space that workers
can write to. Any writes will show up on all workers' mount points.
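
As a rough sketch, the use cases above might look like this in a task YAML.
It assumes storage objects are declared under :code:`file_mounts`, keyed by
the mount path on the VM. The :code:`name`, :code:`source` and :code:`mode`
fields come from the text above. The mount paths, bucket names and local path
are hypothetical placeholders.

.. code-block:: yaml

    file_mounts:
      # Use case 1: upload local data. `name` sets the bucket name and
      # `source` is the local path to upload.
      /datasets:
        name: my-sky-dataset      # hypothetical bucket name
        source: ~/datasets        # hypothetical local path

      # Use case 2: mount an existing bucket by giving only `source`.
      /existing-data:
        source: s3://my-bucket/

      # Use cases 3 and 4: a writable path backed by a bucket, which can
      # also serve as shared scratch space across workers.
      /outputs:
        name: my-sky-outputs      # hypothetical; created if it doesn't exist
        mode: MOUNT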

When specifying a storage object, you can specify either of two modes:
- **COPY** mode:

This mode copies your files from a remote bucket to the specified path on
VM's disk during the file_mounting phase of launching your job. Note that
in this mode, any writes to the mount path are not replicated to the
source bucket.
When specifying a storage object, you can specify either of two modes:

- **MOUNT** mode:
- :code:`mode: MOUNT` (default)
This mode directly mounts the bucket at the specified path on the VM.
In effect, files are streamed from the backing source bucket as and when
they are accessed by applications. This mode also allows applications to
write to the mount path. All writes are replicated to the remote bucket (and
any other VMs mounting the same bucket). Please note that this mode
uses a close-to-open consistency model, so file writes are committed only
when the file is closed.

This mode directly mounts the bucket at the specified path on the VM.
In effect, files are streamed from the backing source bucket as and when
they are accessed by applications. This mode also allows applications to
write to the mount path. All writes are replicated to the remote bucket (and
any other VMs mounting the same bucket). Please note that this mode
uses a close-to-open consistency model, so file writes are committed only
when the file is closed.
- :code:`mode: COPY`
This mode pre-fetches your files from remote storage and caches them on the
local disk. Note that in this mode, any writes to the mount path are not
replicated to the source bucket.

Here are a few examples covering a range of use cases for sky file_mounts
and storage mounting
and storage mounting:

.. code-block:: yaml
@@ -169,7 +168,7 @@ and storage mounting
For mounts not using Sky Storage (e.g., those using rsync) the symbolic links are directly copied.
Their targets must be separately mounted or else the symlinks may break.

Creating a shared File System
Creating a shared file system
-----------------------------

Sky Storage can also be used to create a shared file-system that multiple tasks
@@ -187,7 +186,7 @@ and use mount mode when attaching it to your tasks like so:
mode: MOUNT
Here is a `simple example <https://github.com/sky-proj/sky/blob/master/examples/storage/pingpong.yaml/>`_
Here is a `simple example <https://github.com/sky-proj/sky/blob/master/examples/storage/pingpong.yaml>`_
using sky storage to perform communication between processes using files.


@@ -197,17 +196,19 @@ Using Sky Storage CLI tools
To manage persistent Storage objects, the sky CLI provides two useful commands -
:code:`sky storage ls` and :code:`sky storage delete`.

1. :code:`sky storage ls` shows the currently provisioned Storage objects.
1. :code:`sky storage ls` shows the currently provisioned Storage objects.

.. code-block:: console
$ sky storage ls
NAME CREATED STORE COMMAND STATUS
sky-dataset-romil 3 mins ago S3 sky launch -c demo examples/storage_demo.yaml READY
2. :code:`sky storage delete` allows you to delete any Storage objects managed by
sky.
2. :code:`sky storage delete` allows you to delete any Storage objects managed
by sky.

.. code-block:: console
$ sky storage delete sky-dataset-romil
Deleting storage object sky-dataset-romil...
I 04-02 19:42:24 storage.py:336] Detected existing storage object, loading Storage: sky-dataset-romil