Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8s] Kubernetes Docs #2324

Merged
merged 26 commits into from
Sep 16, 2023
Merged

[k8s] Kubernetes Docs #2324

merged 26 commits into from
Sep 16, 2023

Conversation

romilbhardwaj
Copy link
Collaborator

@romilbhardwaj romilbhardwaj commented Jul 29, 2023

Adds documentation for kubernetes. Can be improved iteratively with user feedback.

Tested (run the relevant ones):

  • Rendered locally

@romilbhardwaj romilbhardwaj marked this pull request as ready for review September 5, 2023 21:06
@romilbhardwaj
Copy link
Collaborator Author

This is ready for review!

@romilbhardwaj romilbhardwaj added this to the 0.4 milestone Sep 11, 2023
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romilbhardwaj! Did a pass, some comments.



.. note::
To cleanup any leftover jobs from the GPU labelling process, run ``python -m sky.utils.kubernetes.gpu_labeler --cleanup``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, wondering what are these leftover jobs.

Copy link
Collaborator Author

@romilbhardwaj romilbhardwaj Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching - reworded now.

@romilbhardwaj
Copy link
Collaborator Author

Thanks for the review @concretevitamin! Ready for another look.

@@ -16,6 +16,7 @@ Install SkyPilot using pip:
$ pip install skypilot
$ # pip install "skypilot[gcp]"
$ # pip install "skypilot[azure]"
$ # pip install "skypilot[kubernetes]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider placing it after L23 since it's alpha?

$ mkdir -p ~/.kube
$ cp /path/to/kubeconfig ~/.kube/config

See :ref:`SkyPilot on Kubernetes <kubernetes-overview>` for more.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to fix title

* CPU Tasks - ✅ Available
* Auto-down - ✅ Available
* Storage mounting - ✅ Available on x86_64 clusters
* GPU Tasks - ✅ Available
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that L146 GPU Tasks should be moved to the top of the bullet list.

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romilbhardwaj! Just some minor nits.

@@ -0,0 +1,152 @@
.. _kubernetes-overview:

Deploying on Kubernetes (Alpha)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deploying sounds like "deploying skypilot (as a long-running service)". Wdyt about "Running on Kubernetes"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent between "Deploying on Kubernetes" and "Running on Kubernetes". I've changed to "Running on Kubernetes".

Personally, I prefer "SkyPilot on Kubernetes", since it captures both - deployment (initial setup and considerations involved in SkyPilot on Kubernetes) and running (ongoing tasks, benefits, and features of using SkyPilot with Kubernetes) tasks. Ray also adopts a similar title "Ray on Kubernetes".

The Kubernetes cluster gets added to the list of "clouds" in SkyPilot and SkyPilot
tasks can be submitted to your Kubernetes cluster just like any other cloud provider.

**Benefits of bringing your Kubernetes cluster to SkyPilot:**
**Benefits of using SkyPilot to run jobs on your Kubernetes cluster:**

* Get SkyPilot features (setup management, job execution, queuing, logging, SSH access) on your Kubernetes resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Get SkyPilot features (setup management, job execution, queuing, logging, SSH access) on your Kubernetes resources
* Get ergonomic SkyPilot features (setup management, job execution, queuing, logging, SSH access) on your Kubernetes resources

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about adding ergonomic - sounds a little fluffy. The convenience is covered in the examples in brackets and the next bullet point.


Once your cluster administrator has :ref:`setup a Kubernetes cluster <kubernetes-setup>` and provided you with a kubeconfig file:

0. Make sure `kubectl <https://kubernetes.io/docs/tasks/tools/>`_, ``socat`` and ``lsof`` are installed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the best way but wearing a user's hat, having socat as a requirement is barrier to entry? Mac doesn't come with it. Can we skip it from docs?

@@ -1,3 +1,5 @@
:orphan:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide users should use older versions on readthedocs, may as well remove local/*?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removing local/*.

@romilbhardwaj
Copy link
Collaborator Author

Thanks @concretevitamin, PTAL.

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @romilbhardwaj!

@@ -82,7 +91,7 @@ Once your cluster administrator has :ref:`setup a Kubernetes cluster <kubernetes

.. code-block:: console

$ sky launch --cpus 2+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is task.yaml optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's optional - sky launch --cpus 2+ also works. However, I added task.yaml to avoid confusing users ("Where do I specify my task yaml? Is Kubernetes any different from regular SkyPilot in this regard?").

@romilbhardwaj
Copy link
Collaborator Author

Thanks @concretevitamin! Merging now.

@romilbhardwaj romilbhardwaj merged commit 3cd931f into master Sep 16, 2023
@romilbhardwaj romilbhardwaj deleted the k8s_docs branch September 16, 2023 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants