ENG-6697: Fixes broken links, prereqs, YAML refs, and adds note about… #281

Merged
merged 2 commits on May 2, 2024
@@ -21,6 +21,13 @@ ifdef::cloud-service[]
* You have downloaded and installed the OpenShift command-line interface (CLI). See link:https://docs.openshift.com/dedicated/cli_reference/openshift_cli/getting-started-cli.html#installing-openshift-cli[Installing the OpenShift CLI] (Red Hat OpenShift Dedicated) or link:https://docs.openshift.com/rosa/cli_reference/openshift_cli/getting-started-cli.html#installing-openshift-cli[Installing the OpenShift CLI] (Red Hat OpenShift Service on AWS).
endif::[]

ifndef::upstream[]
* You have enabled the required distributed workloads components as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-the-distributed-workloads-components_distributed-workloads[Configuring the distributed workloads components].
endif::[]
ifdef::upstream[]
* You have enabled the required distributed workloads components as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-the-distributed-workloads-components_distributed-workloads[Configuring the distributed workloads components].
endif::[]

* You have sufficient resources. In addition to the base {productname-short} resources, you need 1.6 vCPU and 2 GiB memory to deploy the distributed workloads infrastructure.

* The resources are physically available in the cluster.
9 changes: 8 additions & 1 deletion modules/configuring-the-codeflare-operator.adoc
@@ -14,6 +14,13 @@ ifdef::cloud-service[]
* You have logged in to OpenShift with the `cluster-admin` role.
endif::[]

ifndef::upstream[]
* You have enabled the required distributed workloads components as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-the-distributed-workloads-components_distributed-workloads[Configuring the distributed workloads components].
endif::[]
ifdef::upstream[]
* You have enabled the required distributed workloads components as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-the-distributed-workloads-components_distributed-workloads[Configuring the distributed workloads components].
endif::[]


.Procedure
ifdef::upstream,self-managed[]
@@ -33,7 +40,7 @@ endif::[]
. Search for the *codeflare-operator-config* config map, and click the config map name to open the *ConfigMap details* page.

. Click the *YAML* tab to show the config map specifications.
. In the `data` > `config.yaml` > `kuberay` section, you can edit the following entries:
. In the `data:config.yaml:kuberay` section, you can edit the following entries:
+
ingressDomain::
This configuration option is null (`ingressDomain: ""`) by default.
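As an illustrative sketch only (the surrounding config map fields are assumptions inferred from the `data` > `config.yaml` > `kuberay` path above, not taken from this change), the relevant part of the config map might look like this:
+
[source,yaml]
----
# Hypothetical excerpt of the codeflare-operator-config config map;
# only the kuberay section is described in this procedure.
apiVersion: v1
kind: ConfigMap
metadata:
  name: codeflare-operator-config
data:
  config.yaml: |
    kuberay:
      ingressDomain: ""  # null by default; set to your ingress domain if required
----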
11 changes: 3 additions & 8 deletions modules/configuring-the-distributed-workloads-components.adoc
@@ -38,13 +38,6 @@ Instead, users must configure the Ray job specification to set `submissionMode=H
* You have access to the data sets and models that the distributed workload uses.
* You have access to the Python dependencies for the distributed workload.

ifndef::upstream[]
* You have created the required Kueue resources as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads].
endif::[]
ifdef::upstream[]
* You have created the required Kueue resources as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads].
endif::[]

ifndef::upstream[]
* You have removed any previously installed instances of the CodeFlare Operator, as described in the Knowledgebase solution link:https://access.redhat.com/solutions/7043796[How to migrate from a separately installed CodeFlare Operator in your data science cluster].
endif::[]
@@ -138,7 +131,9 @@ endif::[]
. Click the *Data Science Cluster* tab.
. Click the default instance name (for example, *default-dsc*) to open the instance details page.
. Click the *YAML* tab to show the instance specifications.
. In the `spec.components` section, ensure that the `managementState` field is set correctly for the required components depending on whether the distributed workload is run from a pipeline or notebook or both, as shown in the following table.
. Enable the required distributed workloads components.
In the `spec:components` section, set the `managementState` field correctly for the required components.
The list of required components depends on whether the distributed workload is run from a pipeline or notebook or both, as shown in the following table.
+
.Components required for distributed workloads
[cols="34,20,20,26"]
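As a minimal sketch of the edit described in this step (the component names shown are assumptions for illustration; use the table to determine which components your workload actually requires), the `spec:components` section might look like this:
+
[source,yaml]
----
# Hypothetical DataScienceCluster excerpt; component names are examples,
# and each required component must have managementState: Managed.
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    kueue:
      managementState: Managed
    ray:
      managementState: Managed
----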
@@ -18,7 +18,7 @@ To run a distributed data science workload in a disconnected environment, you mu
* You have created a data science project.

.Procedure
. Configure the disconnected data science cluster to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
. Configure the disconnected data science cluster to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
. In the `ClusterConfiguration` section of the notebook or pipeline, ensure that the `image` value specifies a Ray cluster image that can be accessed from the disconnected environment:
* Notebooks use the Ray cluster image to create a Ray cluster when running the notebook.
* Pipelines use the Ray cluster image to create a Ray cluster during the pipeline run.
@@ -33,7 +42,7 @@ PIP_TRUSTED_HOST: pypi-notebook.apps.mylocation.com
where
* `PIP_INDEX_URL` specifies the base URL of your private PyPI server (the default value is https://pypi.org).
* `PIP_TRUSTED_HOST` configures Python to mark the specified host as trusted, regardless of whether that host has a valid SSL certificate or is using a secure channel.
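A sketch of the two settings together, using the same hypothetical host as the example above (the `PIP_INDEX_URL` value, including the `/simple` path, is an assumption about a typical private mirror layout):
+
[source,yaml]
----
# Hypothetical private PyPI mirror settings
PIP_INDEX_URL: https://pypi-notebook.apps.mylocation.com/simple
PIP_TRUSTED_HOST: pypi-notebook.apps.mylocation.com
----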
. Run the distributed data science workload, as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#running-distributed-data-science-workloads-from-notebooks_distributed-workloads[Running distributed data science workloads from notebooks] or link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#running-distributed-data-science-workloads-from-ds-pipelines_distributed-workloads[Running distributed data science workloads from data science pipelines].
. Run the distributed data science workload, as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#running-distributed-data-science-workloads-from-notebooks_distributed-workloads[Running distributed data science workloads from notebooks] or link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#running-distributed-data-science-workloads-from-ds-pipelines_distributed-workloads[Running distributed data science workloads from data science pipelines].

.Verification
The notebook or pipeline run completes without errors:
@@ -1,6 +1,6 @@
:_module-type: PROCEDURE

[id="running-distributed-data-science-workloads-from-ds-pipeline_{context}"]
[id="running-distributed-data-science-workloads-from-ds-pipelines_{context}"]
= Running distributed data science workloads from data science pipelines

[role='_abstract']
@@ -15,14 +15,21 @@ ifdef::cloud-service[]
endif::[]

ifndef::upstream[]
* You have created the required Kueue resources as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads].
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]
ifdef::upstream[]
* You have created the required Kueue resources as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads].
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]

ifndef::upstream[]
* You have created the required Kueue resources as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed-workloads[Configuring quota management for distributed workloads].
endif::[]
ifdef::upstream[]
* You have created the required Kueue resources as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed-workloads[Configuring quota management for distributed workloads].
endif::[]

ifndef::upstream[]
* Optional: You have defined a _default_ local queue for the Ray cluster by creating a `LocalQueue` resource and adding the following annotation to the configuration details for that `LocalQueue` resource, as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads]:
* Optional: You have defined a _default_ local queue for the Ray cluster by creating a `LocalQueue` resource and adding the following annotation to the configuration details for that `LocalQueue` resource, as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed-workloads[Configuring quota management for distributed workloads]:
+
[source,bash]
----
@@ -35,7 +42,7 @@ If you do not create a default local queue, you must specify a local queue in ea
====
endif::[]
ifdef::upstream[]
* Optional: You have defined a _default_ local queue for the Ray cluster by creating a `LocalQueue` resource and adding the following annotation to the configuration details for that `LocalQueue` resource, as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed_workloads[Configuring quota management for distributed workloads]:
* Optional: You have defined a _default_ local queue for the Ray cluster by creating a `LocalQueue` resource and adding the following annotation to the configuration details for that `LocalQueue` resource, as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-quota-management-for-distributed-workloads_distributed-workloads[Configuring quota management for distributed workloads]:
+
[source,bash]
----
@@ -48,13 +55,6 @@ If you do not create a default local queue, you must specify a local queue in ea
====
endif::[]
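As a sketch, a `LocalQueue` resource marked as the default queue might look like the following. The annotation key and all names are assumptions based on upstream Kueue conventions; confirm the exact annotation against the quota-management guide referenced above.

[source,yaml]
----
# Hypothetical LocalQueue marked as the default queue for the project;
# names and the annotation key are illustrative assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue-test
  namespace: my-project
  annotations:
    kueue.x-k8s.io/default-queue: "true"
spec:
  clusterQueue: cluster-queue-test
----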

ifndef::upstream[]
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{rhoaidocshome}{default-format-url}/working_with_distributed_workloads/working-with-distributed-workloads_distributed-workloads#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]
ifdef::upstream[]
* You have access to a data science cluster that is configured to run distributed workloads as described in link:{odhdocshome}/working_with_distributed_workloads/#configuring-distributed-workloads_distributed-workloads[Configuring distributed workloads].
endif::[]

* You have access to S3-compatible object storage.
* You have logged in to {productname-long}.
* You have created a data science project.
@@ -175,6 +175,10 @@ if __name__ == '__main__':
<2> Authenticates with the cluster by using values that you specify when creating the pipeline run
// Commenting out second part of callout 2 until RHOAIENG-880 is fixed
//; you can omit this section if the Ray cluster is configured to use the same namespace as the data science project
[NOTE]
====
If your cluster uses self-signed certificates, include `ca-cert-path=__<path>__` in the `TokenAuthentication` parameter list, where `__<path>__` is the path to the cluster-wide Certificate Authority (CA) bundle that contains the self-signed certificates.
====
<3> Specifies the Ray cluster configuration: replace these example values with the values for your Ray cluster
<4> Specifies the location of the Ray cluster image: if using a disconnected environment, replace the default value with the location for your environment
<5> Specifies the local queue to which the Ray cluster will be submitted: you can omit this line if you configured a default local queue
@@ -209,10 +213,10 @@ ifdef::upstream[]
endif::[]

ifndef::upstream[]
. When the pipeline run is complete, confirm that it is included in the list of triggered pipeline runs, as described in link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects/working-with-data-science-pipelines_ds-pipelines#viewing-triggered-pipeline-runs_ds-pipelines[Viewing triggered pipeline runs].
. When the pipeline run is complete, confirm that it is included in the list of triggered pipeline runs, as described in link:{rhoaidocshome}{default-format-url}/working_on_data_science_projects/working-with-data-science-pipelines_ds-pipelines#viewing-the-details-of-a-pipeline-run_ds-pipelines[Viewing the details of a pipeline run].
endif::[]
ifdef::upstream[]
. When the pipeline run is complete, confirm that it is included in the list of triggered pipeline runs, as described in link:{odhdocshome}/working_on_data_science_projects/#viewing-triggered-pipeline-runs_ds-pipelines[Viewing triggered pipeline runs].
. When the pipeline run is complete, confirm that it is included in the list of triggered pipeline runs, as described in link:{odhdocshome}/working_on_data_science_projects/#viewing-the-details-of-a-pipeline-run_ds-pipelines[Viewing the details of a pipeline run].
endif::[]

