ENG-11839: Updates the download instructions for the demo notebooks #435

Merged 7 commits on Sep 12, 2024
5 changes: 4 additions & 1 deletion assemblies/running-distributed-workloads.adoc
@@ -8,8 +8,11 @@ ifdef::context[:parent-context: {context}]
[role='_abstract']

In {productname-short}, you can run a distributed workload from a notebook or from a pipeline.
You can also run distributed workloads in a disconnected environment if you have access to all of the required software.

You can run distributed workloads in a disconnected environment if you can access all of the required software from that environment.
For example, you must be able to access a Ray cluster image, and the data sets and Python dependencies used by the workload, from the disconnected environment.

include::modules/downloading-the-demo-notebooks-from-the-codeflare-sdk.adoc[leveloffset=+1]
include::modules/running-distributed-data-science-workloads-from-notebooks.adoc[leveloffset=+1]
include::modules/running-distributed-data-science-workloads-from-ds-pipelines.adoc[leveloffset=+1]
ifdef::self-managed[]
80 changes: 80 additions & 0 deletions modules/downloading-the-demo-notebooks-from-the-codeflare-sdk.adoc
@@ -0,0 +1,80 @@
:_module-type: PROCEDURE

[id="downloading-the-demo-notebooks-from-the-codeflare-sdk_{context}"]
= Downloading the demo notebooks from the CodeFlare SDK

[role='_abstract']
If you want to run distributed workloads from notebooks, the demo notebooks from the CodeFlare SDK provide guidelines on how to use the CodeFlare stack in your own notebooks.

If you do not want to run distributed workloads from notebooks, you can skip this section.

.Prerequisites
ifndef::upstream[]
* You can access a data science cluster that is configured to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]
ifdef::upstream[]
* You can access a data science cluster that is configured to run distributed workloads as described in link:{odhdocshome}/working-with-distributed-workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]

ifndef::upstream[]
* You can access a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook.
For information about projects and workbenches, see link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects[Working on data science projects].
endif::[]
ifdef::upstream[]
* You can access a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook.
For information about projects and workbenches, see link:{odhdocshome}/working-on-data-science-projects[Working on data science projects].
endif::[]

* You have Admin access for the data science project.
** If you created the project, you automatically have Admin access.
** If you did not create the project, your cluster administrator must give you Admin access.

* You have logged in to {productname-long}.
* You have launched your notebook server and logged in to your notebook editor.
The examples in this procedure refer to the JupyterLab integrated development environment (IDE).

.Procedure
. In the JupyterLab interface, click *File > New > Notebook*, and then click *Select*.
+
A new notebook is created in a `.ipynb` file.
. Add the following code to a cell in the new notebook:
+
.Code to download the demo notebooks
[source,python]
----
from codeflare_sdk import copy_demo_nbs
copy_demo_nbs()
----

. Select the cell, and click *Run > Run selected cell*.
+
After a few seconds, the `copy_demo_nbs()` function copies the demo notebooks that are packaged with the currently installed version of the CodeFlare SDK into the `demo-notebooks` folder. A sketch showing optional destination parameters follows this procedure.

. In the left navigation pane, right-click the new notebook and click *Delete*.
. Click *Delete* to confirm.
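
The `copy_demo_nbs()` call in this procedure uses the default destination. If you need to copy the notebooks to a different folder, or to refresh an existing copy, the following sketch shows one way to do it. The `dir` and `overwrite` keyword arguments are assumptions based on recent CodeFlare SDK releases; verify them against your installed SDK version.

.Example: copying the demo notebooks to a custom folder (sketch)
[source,python]
----
from codeflare_sdk import copy_demo_nbs

# Copy the packaged demo notebooks into a custom folder, replacing any
# previous copy. The `dir` and `overwrite` keyword arguments are assumed
# to exist in your installed SDK version; check before relying on them.
copy_demo_nbs(dir="./my-demo-notebooks", overwrite=True)
----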


.Verification
Locate the downloaded demo notebooks in the JupyterLab interface, as follows:

. In the left navigation pane, double-click *demo-notebooks*.
. Double-click *additional-demos* and verify that the folder contains several demo notebooks.
. Click *demo-notebooks* to return to the top-level folder.
. Double-click *guided-demos* and verify that the folder contains several demo notebooks.
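
Alternatively, you can verify the download from a notebook cell. The following minimal sketch lists every copied notebook; it assumes the default `demo-notebooks` destination folder.

[source,python]
----
import os

# Walk the default destination folder and print each demo notebook found.
for root, _dirs, files in os.walk("demo-notebooks"):
    for name in files:
        if name.endswith(".ipynb"):
            print(os.path.join(root, name))
----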

ifndef::upstream[]
You can run these demo notebooks as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/running-distributed-workloads_distributed-workloads#running-distributed-data-science-workloads-from-notebooks_distributed-workloads[Running distributed data science workloads from notebooks].
endif::[]
ifdef::upstream[]
You can run these demo notebooks as described in link:{odhdocshome}/working_with_distributed_workloads/#running-distributed-data-science-workloads-from-notebooks_distributed-workloads[Running distributed data science workloads from notebooks].
endif::[]


////
[role='_additional-resources']
.Additional resources
<Do we want to link to additional resources?>


* link:https://url[link text]
////
modules/running-distributed-data-science-workloads-from-ds-pipelines.adoc
@@ -4,14 +4,14 @@
= Running distributed data science workloads from data science pipelines

[role='_abstract']
To run a distributed data science workload from a data science pipeline, you must first update the pipeline to include a link to your Ray cluster image.
To run a distributed workload from a pipeline, you must first update the pipeline to include a link to your Ray cluster image.

.Prerequisites
ifndef::upstream[]
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
* You can access a data science cluster that is configured to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]
ifdef::upstream[]
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{odhdocshome}/working-with-distributed-workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
* You can access a data science cluster that is configured to run distributed workloads as described in link:{odhdocshome}/working-with-distributed-workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]

ifndef::upstream[]
@@ -48,21 +48,28 @@ If your cluster administrator does not define a default local queue, you must sp
====
endif::[]


* You have access to S3-compatible object storage.
* You have logged in to {productname-long}.
* You can access the following software from your data science cluster (for a quick connectivity check, see the sketch after these prerequisites):
** A Ray cluster image that is compatible with your hardware architecture
** The data sets and models to be used by the workload
** The Python dependencies for the workload, either in a Ray image or in your own Python Package Index (PyPI) server

ifndef::upstream[]
* You have created a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook. For information about how to create a project, see link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects/using-data-science-projects_projects#creating-a-data-science-project_projects[Creating a data science project].
* You can access a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook.
For information about projects and workbenches, see link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects[Working on data science projects].
endif::[]
ifdef::upstream[]
* You have created a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook. For information about how to create a project, see link:{odhdocshome}/working-on-data-science-projects/#creating-a-data-science-project_projects[Creating a data science project].
* You can access a data science project that contains a workbench, and the workbench is running a default notebook image that contains the CodeFlare SDK, for example, the *Standard Data Science* notebook.
For information about projects and workbenches, see link:{odhdocshome}/working-on-data-science-projects[Working on data science projects].
endif::[]

* You have Admin access for the data science project.
** If you created the project, you automatically have Admin access.
** If you did not create the project, your cluster administrator must give you Admin access.

* You have access to S3-compatible object storage.
* You have logged in to {productname-long}.
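
In a disconnected environment, you can quickly confirm that the required endpoints are reachable from your workbench before you run the pipeline. The following sketch probes placeholder URLs; `registry.example.com` and `pypi.example.com` are illustrations, not values from this documentation, so replace them with the image registry and package index that serve your environment.

[source,python]
----
import urllib.error
import urllib.request

# Placeholder endpoints: substitute the Ray image registry and the
# Python package index that serve your disconnected environment.
endpoints = [
    "https://registry.example.com/v2/",
    "https://pypi.example.com/simple/",
]

for url in endpoints:
    try:
        urllib.request.urlopen(url, timeout=5)
        print(f"OK: {url}")
    except urllib.error.HTTPError as err:
        # An HTTP error response (for example, 401 from a registry)
        # still proves that the endpoint is reachable.
        print(f"REACHABLE (HTTP {err.code}): {url}")
    except Exception as err:
        print(f"FAILED: {url}: {err}")
----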


.Procedure
ifndef::upstream[]
. Create a data connection to connect the object storage to your data science project, as described in link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects/using-data-connections_projects#adding-a-data-connection-to-your-data-science-project_projects[Adding a data connection to your data science project].
@@ -178,7 +185,9 @@ if __name__ == '__main__': <14>
If no accelerators are required, set the value to 0 or omit the line.
Note: To specify the requested accelerators for the Ray cluster, use the `worker_extended_resource_requests` parameter instead of the deprecated `num_gpus` parameter.
For more details, see the link:https://github.com/project-codeflare/codeflare-sdk/blob/v0.18.0/src/codeflare_sdk/cluster/config.py#L43-L73[CodeFlare SDK documentation].
<6> Specifies the location of the Ray cluster image. If you are running this code in a disconnected environment, replace the default value with the location for your environment.
<6> Specifies the location of the Ray cluster image.
The default Ray image is an AMD64 image, which might not work on other architectures.
If you are running this code in a disconnected environment, replace the default value with the location for your environment (see the configuration sketch after these callout descriptions).
<7> Specifies the local queue to which the Ray cluster will be submitted. If a default local queue is configured, you can omit this line.
<8> Creates a Ray cluster by using the specified image and configuration.
<9> Waits until the Ray cluster is ready before proceeding.
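
To make callouts <6> and <7> concrete, the following sketch shows how a `ClusterConfiguration` might pin a mirrored Ray image and a local queue for a disconnected environment. The image reference and queue name are placeholders, and the accepted parameters vary between CodeFlare SDK versions, so treat this as an illustration rather than a drop-in configuration.

[source,python]
----
from codeflare_sdk import Cluster, ClusterConfiguration

# Placeholder values: substitute the Ray image mirrored into your
# disconnected registry and the local queue defined by your cluster
# administrator.
cluster = Cluster(ClusterConfiguration(
    name="raytest",
    num_workers=2,
    image="registry.example.com/rayproject/ray:2.9.0-py39",   # callout <6>
    local_queue="local-queue-name",                           # callout <7>
    worker_extended_resource_requests={"nvidia.com/gpu": 0},  # 0 = no accelerators
))
----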