
CodeFlare Operator Installation

James Busche edited this page Sep 25, 2023 · 21 revisions


Taken from: https://github.com/opendatahub-io/distributed-workloads/blob/main/Quick-Start.md

0. Pre-reqs:

0.1 You have an OpenShift cluster.

0.2 You're logged into the OpenShift Console of that cluster, so you can install the ODH and CodeFlare operators. (If you don't have access to the OpenShift UI, you can apply a Subscription from the terminal instead.)

0.3 You've already used oc login to log into the cluster from a terminal.

0.4 You have a default storage class set up. For the IBM Fyre clusters, I'm using PortWorx storage and have defined a default storage class:

oc get sc | grep default
portworx-watson-assistant-sc (default)   kubernetes.io/portworx-volume   Retain          Immediate           true                   3h50m
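If you're unsure whether a default is set, the check can be scripted. A sketch; the helper just parses `oc get sc` output, and the patch command is only needed when no `(default)` entry exists (substitute your own StorageClass name):

```shell
# Helper: extract the name of the default StorageClass from `oc get sc` output.
# Pure text processing, so it can also be run against captured output.
default_sc() {
  awk '$2 == "(default)" {print $1}'
}
# Usage (assumes an active `oc` session):
#   oc get sc | default_sc
# If nothing is printed, no default is set; one can be marked with, e.g.:
#   oc patch storageclass <your-sc-name> -p \
#     '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```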

1. Install ODH in openshift-operators using the OpenShift UI console.

1.1 Using your Console, navigate to Operators --> OperatorHub and filter for Open Data Hub Operator

1.2 Press Install, accept all the defaults and then press Install again.

Alternatively, you can create the subscription from the terminal:

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: rolling
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: opendatahub-operator.v1.9.0
EOF

1.3 Using your terminal, you can see that the ODH operator is running by:

oc get pods -n openshift-operators

and you'll see that it has started:

NAME                                                       READY   STATUS    RESTARTS   AGE
opendatahub-operator-controller-manager-84858b8998-7nd6q   2/2     Running   0          87s
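Rather than eyeballing the READY column, a small helper can poll until the pod is fully up. A sketch, assuming an active `oc` login and the pod-name prefix shown above:

```shell
# Returns success when a READY value such as "2/2" has all containers up.
all_containers_ready() {
  [ -n "$1" ] && [ "${1%%/*}" = "${1##*/}" ]
}

# Polls the first pod whose name starts with the given prefix, ~5 minutes max.
wait_ready() {
  # $1 = namespace, $2 = pod-name prefix
  for _ in $(seq 1 30); do
    ready=$(oc get pods -n "$1" --no-headers 2>/dev/null |
      awk -v p="$2" '$1 ~ "^"p {print $2; exit}')
    all_containers_ready "$ready" && return 0
    sleep 10
  done
  return 1
}
# Usage: wait_ready openshift-operators opendatahub-operator
```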

2. Install the CodeFlare Operator into openshift-operators namespace using the OpenShift UI console:

2.1 Using your Console, navigate to Operators --> OperatorHub and filter for CodeFlare Operator

2.2 Press Install, accept all the defaults and then press Install again.

Alternatively, you can create the subscription from the terminal:

cat << EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: codeflare-operator
  namespace: openshift-operators
spec:
  channel: alpha
  name: codeflare-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
  startingCSV: codeflare-operator.v1.0.0-rc.1
EOF

2.3 Using your terminal, you can see that the CodeFlare operator is running by:

oc get pods -n openshift-operators

and you'll see that it has started:

NAME                                                       READY   STATUS    RESTARTS   AGE
codeflare-operator-controller-manager-8594c586f4-rlbbv     2/2     Running   0          100s
opendatahub-operator-controller-manager-84858b8998-7nd6q   2/2     Running   0          2m24s
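You can also confirm the install itself succeeded by checking the operator's ClusterServiceVersion phase. A sketch; the helper assumes the standard `oc get csv` column layout, with PHASE as the last column:

```shell
# Helper: print the PHASE column for CSVs whose name starts with the prefix.
csv_phase() {
  awk -v n="$1" '$1 ~ "^"n {print $NF}'
}
# Usage (should print "Succeeded"):
#   oc get csv -n openshift-operators | csv_phase codeflare-operator
```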

3. If you want to run GPU enabled workloads, you will need to install the Node Feature Discovery Operator and the NVIDIA GPU Operator from the OperatorHub.

3.1 Using your Console, navigate to Operators --> OperatorHub and filter for Node Feature Discovery. (Select the Operator from the Red Hat catalog, not the Community one.)

3.2 Press Install, accept all the defaults and then press Install again.

3.3 Using your terminal, you can see that the Node Feature Discovery Operator is running by:

oc get pods -n openshift-nfd

and you'll see that it has started:

NAME                                     READY   STATUS    RESTARTS   AGE
nfd-controller-manager-b767b964c-sl7j2   2/2     Running   0          12s

3.4 Using your Console, navigate to Operators --> OperatorHub and filter for NVIDIA GPU Operator.

3.5 Press Install, accept all the defaults and then press Install again.

3.6 Using your terminal, you can see that the NVIDIA GPU Operator is running by:

oc get pods -n nvidia-gpu-operator

and you'll see that it has started:

NAME                            READY   STATUS    RESTARTS   AGE
gpu-operator-868867dbdb-2nd9s   1/1     Running   0          4m11s

4. Now, with the CodeFlare and ODH operators installed (and the GPU operators, if you have GPUs), you can deploy the kfdefs, which will install the underlying stack into the opendatahub namespace:

4.1 Create the opendatahub namespace with the following command:

oc create ns opendatahub

4.2 Apply the odh-core kfdef with this command:

oc apply -f https://raw.githubusercontent.com/opendatahub-io/odh-manifests/master/kfdef/odh-core.yaml -n opendatahub

4.3 Create the CodeFlare-Stack kfdef with this command:

oc apply -f https://raw.githubusercontent.com/opendatahub-io/distributed-workloads/main/codeflare-stack-kfdef.yaml -n opendatahub

Note: The older version of the KFDEF without the latest CRD changes would be:

oc apply -f TBD

4.4 Check that everything is running in opendatahub with this command:

oc get pods -n opendatahub

It should look like this:

NAME                                                              READY   STATUS    RESTARTS   AGE
data-science-pipelines-operator-controller-manager-5fbfdc8x5wnx   1/1     Running   0          3m39s
etcd-85c59bc4d6-wn777                                             1/1     Running   0          3m41s
grafana-deployment-6cf577dbb6-ptcjp                               1/1     Running   0          3m35s
grafana-operator-controller-manager-54fbd5b876-zfbvz              2/2     Running   0          4m4s
instascale-instascale-66587c96f5-28chv                            1/1     Running   0          4m34s
kuberay-operator-67d58795bf-h8hwt                                 1/1     Running   0          4m31s
mcad-controller-mcad-5f5cb64ddb-mhf5p                             1/1     Running   0          4m34s
modelmesh-controller-5588b58d79-c46g5                             1/1     Running   0          3m41s
modelmesh-controller-5588b58d79-tn4rt                             1/1     Running   0          3m41s
modelmesh-controller-5588b58d79-wz82x                             1/1     Running   0          3m41s
notebook-controller-deployment-5c565c4c75-2pbzg                   1/1     Running   0          3m50s
odh-dashboard-7f46945556-kd7l5                                    2/2     Running   0          4m37s
odh-dashboard-7f46945556-vsg4m                                    2/2     Running   0          4m37s
odh-model-controller-79c67bc689-5559f                             1/1     Running   0          3m41s
odh-model-controller-79c67bc689-9q9ss                             1/1     Running   0          3m41s
odh-model-controller-79c67bc689-vnfbh                             1/1     Running   0          3m41s
odh-notebook-controller-manager-5cf77fdc56-s4cm6                  1/1     Running   0          3m50s
prometheus-odh-model-monitoring-0                                 3/3     Running   0          3m39s
prometheus-odh-model-monitoring-1                                 3/3     Running   0          3m39s
prometheus-odh-model-monitoring-2                                 3/3     Running   0          3m39s
prometheus-odh-monitoring-0                                       2/2     Running   0          3m58s
prometheus-odh-monitoring-1                                       2/2     Running   0          3m58s
prometheus-operator-779f765944-p2nbf                              1/1     Running   0          4m9s
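Instead of scanning the table by hand, you can filter for anything not yet up. A sketch; the helper only parses `oc get pods` output, so it prints nothing once the whole stack is ready:

```shell
# Helper: print the names of pods that are not fully ready and Running.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}
# Usage: oc get pods -n opendatahub | not_ready
```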

5. Access the spawner page by going to your Open Data Hub dashboard. It'll be in the format of:

https://odh-dashboard-$ODH_NAMESPACE.apps.<your cluster's uri>

5.1 You can find it with this command:

oc get route -n opendatahub | grep dash | awk '{print $2}'

For example:

odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com

5.2 Put that in your browser. For example: https://odh-dashboard-opendatahub.apps.jimbig412.cp.fyre.ibm.com

- If prompted, give it your kubeadmin user and password
- If prompted, grant it access as well

5.3 Click on the link "Launch application" in the Jupyter tile.

5.4 Choose CodeFlare Notebook, and click "Start server"

5.5 Note: if this is the first time, it'll take a while to pull the new container image. You can watch it start from the terminal with:

oc get pods -n opendatahub |grep jupyter

And it'll show if the pod is starting or has started. For example:

jupyter-nb-kube-3aadmin-0                                         0/2     ContainerCreating   0          89s
and then a few minutes later:
jupyter-nb-kube-3aadmin-0                                         2/2     Running   0          2m30s

5.6 Note: it also uses a PVC:

oc get pvc -n opendatahub
NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                   AGE
jupyterhub-nb-kube-3aadmin-pvc   Bound    pvc-28c725bd-6ba8-4bf4-92fe-b88b82b58fc6   1Gi        RWO            portworx-watson-assistant-sc   3m32s

6. In the Jupyter Notebook:

6.1 Click either "Open in a new tab" or "Open in current tab"

- If prompted, give it your kubeadmin user and password
- If prompted, grant it access as well

6.2 Click on the "+" to open a new Launcher tab and select Terminal. Inside this terminal, run:

git clone https://github.com/project-codeflare/codeflare-sdk.git

Then you can close the terminal.

6.3 On the far left, navigate to: codeflare-sdk --> demo-notebooks --> guided-demos

7. Then walk through the various guided Jupyter Notebook examples one-by-one to see what you can do with CodeFlare

Hint 1: If you don't want to reveal your OC token in your Jupyter notebook, you can oc login from the terminal instead and skip the auth = TokenAuthentication step.

Hint 2: When you call cluster.up(), it deploys into the default namespace. You can watch your cluster start like this:

oc get pods -n default

NAME                                           READY   STATUS              RESTARTS   AGE
mnisttest-head-zgpvt                           0/1     ContainerCreating   0          60s
mnisttest-worker-small-group-mnisttest-lqbr8   0/1     PodInitializing     0          60s
mnisttest-worker-small-group-mnisttest-ztr27   0/1     PodInitializing     0          60s

The first time, it has to pull the images, which takes a few minutes. On future runs, the images will be cached.

Cleaning up the CodeFlare Install

To completely clean up all the CodeFlare components after an install, follow these steps:

  1. No appwrappers should be left running:

    oc get appwrappers -A

    If any are left, delete them before proceeding.

  2. Remove the notebook and notebook pvc:

    oc delete notebook jupyter-nb-kube-3aadmin -n opendatahub
    oc delete pvc jupyterhub-nb-kube-3aadmin-pvc -n opendatahub
  3. Remove the codeflare-stack kfdef: (Removes MCAD, InstaScale, KubeRay and the Notebook image)

    oc delete kfdef codeflare-stack -n opendatahub
  4. Remove the CodeFlare Operator csv and subscription: (Removes the CodeFlare Operator from the OpenShift Cluster)

    oc delete sub codeflare-operator -n openshift-operators
    oc delete csv `oc get csv -n openshift-operators | grep codeflare-operator | awk '{print $1}'` -n openshift-operators
  5. Remove the CodeFlare CRDs

    oc delete crd instascales.codeflare.codeflare.dev mcads.codeflare.codeflare.dev schedulingspecs.mcad.ibm.com appwrappers.mcad.ibm.com quotasubtrees.ibm.com
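The cleanup steps above can be scripted. A dry-run sketch, assuming `oc` access: `delete_appwrappers` only prints the commands (drop the `echo` to actually delete), and `codeflare_crds` should match nothing once cleanup is complete:

```shell
# Dry-run helper: print a delete command for every remaining AppWrapper.
# Pipe in `oc get appwrappers -A --no-headers`; remove `echo` to actually delete.
delete_appwrappers() {
  while read -r ns name _; do
    echo oc delete appwrapper "$name" -n "$ns"
  done
}

# Filter: after cleanup this should match nothing in `oc get crd`.
codeflare_crds() {
  grep -E 'codeflare\.dev|mcad\.ibm\.com|quotasubtrees\.ibm\.com'
}
# Usage:
#   oc get appwrappers -A --no-headers | delete_appwrappers
#   oc get crd | codeflare_crds || echo "no CodeFlare CRDs remain"
```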