The Kubeflow Operator helps deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, an open source toolkit for building, testing, and packaging operators and for managing their lifecycle.
The Operator is currently in the incubation phase and is based on this design doc. It is built on top of the kfdef CR and uses kfctl as the core of its controller. The current roadmap for this Operator is listed here.
- Clone this repository and deploy the CRD and controller
# git clone https://github.com/kubeflow/kfctl.git && cd kfctl
OPERATOR_NAMESPACE=operators
kubectl create ns ${OPERATOR_NAMESPACE}
kubectl create -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl create -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl create clusterrolebinding kubeflow-operator --clusterrole cluster-admin --serviceaccount=${OPERATOR_NAMESPACE}:kubeflow-operator
kubectl create -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
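Before moving on, it can help to confirm that the controller actually came up. The following is a minimal sketch, not part of the repository: the helper name check_operator is hypothetical, and the default Deployment name kubeflow-operator is an assumption based on the service account name used above. Check deploy/operator.yaml if your Deployment is named differently.

```shell
# Hypothetical helper: block until the operator Deployment has rolled out.
# The default name "kubeflow-operator" is an assumption; pass the real
# Deployment name as the second argument if the manifest uses another one.
check_operator() {
  op_ns="${1:-operators}"
  op_name="${2:-kubeflow-operator}"
  # "kubectl rollout status" waits until the rollout finishes or times out.
  kubectl rollout status "deployment/${op_name}" -n "${op_ns}" --timeout=120s
}
```

For example, `check_operator "${OPERATOR_NAMESPACE}"` returns a non-zero status if the rollout does not complete within the timeout.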
- Deploy KfDef. If your Kubernetes version is 1.15 or later, you can optionally apply a ResourceQuota first. The quota enforces a singleton model: only one kfdef instance, and therefore only one Kubeflow deployment, is allowed within the Kubeflow namespace.
KUBEFLOW_NAMESPACE=kubeflow
kubectl create ns ${KUBEFLOW_NAMESPACE}
# kubectl create -f deploy/crds/kfdef_quota.yaml -n ${KUBEFLOW_NAMESPACE} # only deploy this if the k8s cluster is 1.15+ and has resource quota support
kubectl create -f <kfdef> -n ${KUBEFLOW_NAMESPACE}
<kfdef> above can point to a remote URL or to a local kfdef file. For example, for IBM Cloud, the command is:
kubectl create -f https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_ibm.yaml -n ${KUBEFLOW_NAMESPACE}
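For reference, the singleton constraint described above can be expressed as an object-count quota on the kfdef custom resource. The sketch below is not a copy of deploy/crds/kfdef_quota.yaml: the quota name kfdef-singleton and the helper print_kfdef_quota are hypothetical, and the resource name kfdefs.kfdef.apps.kubeflow.org is inferred from the CRD file name used earlier.

```shell
# Hypothetical helper: print a ResourceQuota that allows a single kfdef
# object in whichever namespace the quota is created in.
# The metadata name and the count/ resource name are assumptions.
print_kfdef_quota() {
  cat <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: kfdef-singleton
spec:
  hard:
    count/kfdefs.kfdef.apps.kubeflow.org: "1"
EOF
}
```

The output can be piped to kubectl, for example `print_kfdef_quota | kubectl create -f - -n ${KUBEFLOW_NAMESPACE}`. Because a ResourceQuota is namespace-scoped, it limits kfdef objects only in that namespace.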
One of the major benefits of running kfctl as an Operator is that it can watch and reconcile your Kubeflow deployments. The Operator watches all resources that carry the kfctl label. If one of those resources is deleted, the reconciler is triggered and re-applies the kfdef to the Kubernetes cluster.
- Check that the tf-job-operator deployment is running.
kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# NAME READY UP-TO-DATE AVAILABLE AGE
# tf-job-operator 1/1 1 1 7m15s
- Delete the tf-job-operator deployment
kubectl delete deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# deployment.extensions "tf-job-operator" deleted
- Wait 10 to 15 seconds, then check the tf-job-operator deployment again. You should see that the Operator's reconciliation logic is recreating the deployment.
kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
# NAME READY UP-TO-DATE AVAILABLE AGE
# tf-job-operator 0/1 0 0 10s
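Instead of waiting a fixed 10 to 15 seconds, the check above can be polled. This is a minimal sketch under stated assumptions: wait_for_deployment is a hypothetical helper, and the default timeout and interval are arbitrary values, not documented reconcile timings.

```shell
# Hypothetical helper: poll until a Deployment exists again, or give up.
# Usage: wait_for_deployment <name> <namespace> [timeout_seconds] [interval_seconds]
wait_for_deployment() {
  wfd_name="$1"
  wfd_ns="$2"
  wfd_timeout="${3:-60}"
  wfd_interval="${4:-5}"
  wfd_elapsed=0
  while [ "${wfd_elapsed}" -lt "${wfd_timeout}" ]; do
    # "kubectl get" exits non-zero while the Deployment is still missing.
    if kubectl get deploy "${wfd_name}" -n "${wfd_ns}" >/dev/null 2>&1; then
      echo "deployment ${wfd_name} exists again after ~${wfd_elapsed}s"
      return 0
    fi
    sleep "${wfd_interval}"
    wfd_elapsed=$((wfd_elapsed + wfd_interval))
  done
  echo "deployment ${wfd_name} was not recreated within ${wfd_timeout}s" >&2
  return 1
}
```

For example, `wait_for_deployment tf-job-operator "${KUBEFLOW_NAMESPACE}"` after the delete step. The helper only checks that the object exists; it does not wait for the pods to become ready.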
Delete the Kubeflow deployment
kubectl delete kfdef -n ${KUBEFLOW_NAMESPACE} --all
Delete the Kubeflow Operator
kubectl delete -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete clusterrolebinding kubeflow-operator
kubectl delete -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl delete ns ${OPERATOR_NAMESPACE}
If you use OLM to install and manage the Operator, follow the instructions here to register your Operator with OLM.
- Deleting the Kubeflow deployment runs kfctl delete in the background, which only deletes the deployment namespace. Some pod deployments can then hang, because mutatingwebhookconfigurations are cluster-wide resources and some of the webhooks watch every pod deployment. To keep pod deployments from hanging after Kubeflow is deleted, remove all of the following mutatingwebhookconfigurations:
kubectl delete mutatingwebhookconfigurations admission-webhook-mutating-webhook-configuration
kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org
kubectl delete mutatingwebhookconfigurations istio-sidecar-injector
kubectl delete mutatingwebhookconfigurations katib-mutating-webhook-config
kubectl delete mutatingwebhookconfigurations mutating-webhook-configurations
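The same cleanup can be written as a loop that tolerates webhooks that are already gone. This is a sketch: delete_kubeflow_webhooks is a hypothetical helper name, and the list contains only the five configurations named above.

```shell
# Hypothetical helper: delete the mutating webhook configurations listed
# above. --ignore-not-found keeps the loop going if one is already absent.
delete_kubeflow_webhooks() {
  for hook in \
    admission-webhook-mutating-webhook-configuration \
    inferenceservice.serving.kubeflow.org \
    istio-sidecar-injector \
    katib-mutating-webhook-config \
    mutating-webhook-configurations
  do
    kubectl delete mutatingwebhookconfigurations "${hook}" --ignore-not-found
  done
}
```

Calling `delete_kubeflow_webhooks` is then safe to repeat, since a missing configuration is not treated as an error.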
- Install operator-sdk
- Install golang
These steps are based on the operator-sdk, with modifications that are specific to this Kubeflow operator.
- Clone this repository under your $GOPATH (e.g. ~/go/src/github.com/kubeflow/).
git clone https://github.com/kubeflow/kfctl
cd kfctl
- Build and push the operator
export OPERATOR_IMG=<docker_username>/kubeflow-operator
make build-operator
make push-operator
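A small guard can catch the case where OPERATOR_IMG is unset or still holds the <docker_username> placeholder before the build starts. This is a sketch only; require_operator_img is a hypothetical helper and is not part of the repository's Makefile.

```shell
# Hypothetical helper: refuse to continue if OPERATOR_IMG is empty or still
# contains an unreplaced <placeholder> from the instructions above.
require_operator_img() {
  case "${OPERATOR_IMG:-}" in
    "" | *"<"* | *">"*)
      echo "OPERATOR_IMG is unset or still contains a <placeholder>" >&2
      return 1
      ;;
  esac
  echo "building and pushing ${OPERATOR_IMG}"
}
```

It could be chained in front of the make targets, for example `require_operator_img && make build-operator && make push-operator`.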