
Merge pull request #7444 from nunnatsa/book-kubevirt
📖 Add the quickstart details for KubeVirt
k8s-ci-robot authored Nov 17, 2022
2 parents 3ab9a01 + c1fe228 commit cfdc9cd
Showing 1 changed file with 253 additions and 14 deletions.
267 changes: 253 additions & 14 deletions docs/book/src/user/quick-start.md
@@ -61,7 +61,7 @@ a target [management cluster] on the selected [infrastructure provider].
The installation procedure depends on the version of kind; if you are planning to use the Docker infrastructure provider,
please follow the additional instructions in the dedicated tab:

{{#tabs name:"install-kind" tabs:"Default,Docker"}}
{{#tabs name:"install-kind" tabs:"Default,Docker,KubeVirt"}}
{{#tab Default}}

Create the kind cluster:
@@ -93,6 +93,57 @@ a target [management cluster] on the selected [infrastructure provider].
Then follow the instructions for your kind version using `kind create cluster --config kind-cluster-with-extramounts.yaml`
to create the management cluster using the above file.
{{#/tab }}
{{#tab KubeVirt}}
#### Create the Kind Cluster
[KubeVirt][KubeVirt] is a cloud-native virtualization solution. The virtual machines we're going to create and use as
the workload cluster's nodes actually run inside pods in the management cluster. To communicate with the workload
cluster's API server, we'll need to expose it. We are using kind, which is a limited environment; the easiest way to
expose the workload cluster's API server (a pod within a node running in a VM, which itself runs inside a pod in the
management cluster, which in turn runs inside a Docker container) is to use a LoadBalancer service.

To use a LoadBalancer service, we can't rely on kind's default CNI (kindnet); we need to install another CNI, such as
Calico. To do that, we first create the kind cluster with two modifications:

1. Disable the default CNI.
2. Add the Docker credentials to the cluster, to avoid the Docker Hub pull rate limit for the Calico images; read more
   about it in the [docker documentation](https://docs.docker.com/docker-hub/download-rate-limit/), and in the
   [kind documentation](https://kind.sigs.k8s.io/docs/user/private-registries/#mount-a-config-file-to-each-node).

Create a configuration file for kind. Note the Docker config file path and adjust it to your local setup:
```bash
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # the default CNI will not be installed
  disableDefaultCNI: true
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/kubelet/config.json
    hostPath: <YOUR DOCKER CONFIG FILE PATH>
EOF
```
Now, create the kind cluster with the configuration file:
```bash
kind create cluster --config=kind-config.yaml
```
Test to ensure the local kind cluster is ready:
```bash
kubectl cluster-info
```
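Because the default CNI is disabled, the control-plane node will stay in `NotReady` state until a CNI is installed in
the next step. As an optional sanity check (not part of the original flow), you can confirm the node has at least
registered:
```bash
# The node is expected to report NotReady until Calico is installed below
kubectl get nodes -o wide
```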
#### Install the Calico CNI
Now we need to install a CNI. In this example we're using Calico, but other CNIs should work as well. Please see the
[Calico installation guide](https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico)
for more details (use the "Manifest" tab). Below is an example of how to install Calico version v3.24.4.
Use the Calico manifest to create the required resources; e.g.:
```bash
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.4/manifests/calico.yaml
```
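Optionally, you can wait for the Calico pods to become ready before moving on; this is a suggested check that assumes
the manifest's default `kube-system` namespace and the `k8s-app=calico-node` label:
```bash
# Wait for the calico-node DaemonSet pods on the management (kind) cluster
kubectl wait pods -n kube-system -l k8s-app=calico-node --for=condition=Ready --timeout=5m
# The kind node should now transition to Ready
kubectl get nodes
```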
{{#/tab }}
{{#/tabs }}
@@ -202,7 +253,7 @@ Additional documentation about experimental features can be found in [Experiment
Depending on the infrastructure provider you are planning to use, some additional prerequisites should be satisfied
before getting started with Cluster API. See below for the expected settings for common providers.
{{#tabs name:"tab-installation-infrastructure" tabs:"AWS,Azure,CloudStack,DigitalOcean,Docker,Equinix Metal,GCP,Hetzner,IBM Cloud,KubeKey,Kubevirt,Metal3,Nutanix,OCI,OpenStack,Outscale,VCD,vcluster,Virtink,vSphere"}}
{{#tabs name:"tab-installation-infrastructure" tabs:"AWS,Azure,CloudStack,DigitalOcean,Docker,Equinix Metal,GCP,Hetzner,IBM Cloud,KubeKey,KubeVirt,Metal3,Nutanix,OCI,OpenStack,Outscale,VCD,vcluster,Virtink,vSphere"}}
{{#tab AWS}}
Download the latest binary of `clusterawsadm` from the [AWS provider releases].
@@ -442,9 +493,61 @@ clusterctl init --infrastructure kubekey
```
{{#/tab }}
{{#tab Kubevirt}}
{{#tab KubeVirt}}
Please visit the [KubeVirt project][KubeVirt provider] for more information.
Please visit the [Kubevirt project][Kubevirt provider].
As described above, we want to use a LoadBalancer service to expose the workload cluster's API server. In the
example below, we use [MetalLB](https://metallb.universe.tf/) to provide load balancing for our kind cluster.
Other solutions should work as well.
#### Install MetalLB for load balancing
Install MetalLB, as described [here](https://metallb.universe.tf/installation/#installation-by-manifest); for example:
```bash
METALLB_VER=$(curl "https://api.github.com/repos/metallb/metallb/releases/latest" | jq -r ".tag_name")
kubectl apply -f "https://raw.githubusercontent.com/metallb/metallb/${METALLB_VER}/config/manifests/metallb-native.yaml"
kubectl wait pods -n metallb-system -l app=metallb,component=controller --for=condition=Ready --timeout=10m
kubectl wait pods -n metallb-system -l app=metallb,component=speaker --for=condition=Ready --timeout=2m
```
Now, we'll create the `IPAddressPool` and the `L2Advertisement` custom resources. The script below creates the CRs with
addresses that match the kind cluster's Docker network:
```bash
GW_IP=$(docker network inspect -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}' kind)
NET_IP=$(echo ${GW_IP} | sed -E 's|^([0-9]+\.[0-9]+)\..*$|\1|g')
cat <<EOF | sed -E "s|172.19|${NET_IP}|g" | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: capi-ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.19.255.200-172.19.255.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: empty
  namespace: metallb-system
EOF
```
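To verify that MetalLB accepted the resources (an optional check, not part of the original guide):
```bash
# Both resources should be listed in the metallb-system namespace
kubectl get ipaddresspools.metallb.io,l2advertisements.metallb.io -n metallb-system
```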
#### Install KubeVirt on the kind cluster
```bash
# get KubeVirt version
KV_VER=$(curl "https://api.github.com/repos/kubevirt/kubevirt/releases/latest" | jq -r ".tag_name")
# deploy required CRDs
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${KV_VER}/kubevirt-operator.yaml"
# deploy the KubeVirt custom resource
kubectl apply -f "https://github.com/kubevirt/kubevirt/releases/download/${KV_VER}/kubevirt-cr.yaml"
kubectl wait -n kubevirt kv kubevirt --for=condition=Available --timeout=10m
```
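If you want an additional confirmation that the deployment finished, you can inspect the phase reported on the
`KubeVirt` custom resource; this is a suggested check on top of the `kubectl wait` above:
```bash
# Should print "Deployed" once all KubeVirt components are up
kubectl get -n kubevirt kv kubevirt -o jsonpath='{.status.phase}{"\n"}'
kubectl get pods -n kubevirt
```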
#### Initialize the management cluster with the KubeVirt Provider
```bash
clusterctl init --infrastructure kubevirt
```
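After initialization, the core Cluster API controllers and the KubeVirt infrastructure provider controller should be
running in the management cluster. A quick way to check, assuming the provider's default `capi-system` and
`capk-system` namespaces:
```bash
kubectl get pods -n capi-system
kubectl get pods -n capk-system
```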
{{#/tab }}
{{#tab Metal3}}
@@ -601,7 +704,7 @@ before configuring a cluster with Cluster API. Instructions are provided for com
Otherwise, you can look at the `clusterctl generate cluster` [command][clusterctl generate cluster] documentation for details about how to
discover the list of variables required by a cluster template.
{{#tabs name:"tab-configuration-infrastructure" tabs:"AWS,Azure,CloudStack,DigitalOcean,Docker,Equinix Metal,GCP,IBM Cloud,KubeKey,Kubevirt,Metal3,Nutanix,OpenStack,Outscale,VCD,vcluster,Virtink,vSphere"}}
{{#tabs name:"tab-configuration-infrastructure" tabs:"AWS,Azure,CloudStack,DigitalOcean,Docker,Equinix Metal,GCP,IBM Cloud,KubeKey,KubeVirt,Metal3,Nutanix,OpenStack,Outscale,VCD,vcluster,Virtink,vSphere"}}
{{#tab AWS}}
```bash
@@ -820,15 +923,14 @@ export CONTROL_PLANE_ENDPOINT_IP=<your-control-plane-virtual-ip>
Please visit the [KubeKey provider] for more information.
{{#/tab }}
{{#tab Kubevirt}}
A ClusterAPI compatible image must be available in your Kubevirt image library. For instructions on how to build a compatible image
see [image-builder](https://image-builder.sigs.k8s.io/capi/capi.html).
{{#tab KubeVirt}}
To see all required Kubevirt environment variables execute:
```bash
clusterctl generate cluster --infrastructure kubevirt --list-variables capi-quickstart
export CAPK_GUEST_K8S_VERSION="v1.23.10"
export CRI_PATH="/var/run/containerd/containerd.sock"
export NODE_VM_IMAGE_TEMPLATE="quay.io/capk/ubuntu-2004-container-disk:${CAPK_GUEST_K8S_VERSION}"
```
Please visit the [KubeVirt project][KubeVirt provider] for more information.
{{#/tab }}
{{#tab Metal3}}
@@ -1007,7 +1109,7 @@ For more information about prerequisites, credentials management, or permissions
For the purpose of this tutorial, we'll name our cluster capi-quickstart.
{{#tabs name:"tab-clusterctl-config-cluster" tabs:"Docker, vcluster, others..."}}
{{#tabs name:"tab-clusterctl-config-cluster" tabs:"Docker, vcluster, KubeVirt, others..."}}
{{#tab Docker}}
<aside class="note warning">
@@ -1042,6 +1144,22 @@ clusterctl generate cluster ${CLUSTER_NAME} \
--target-namespace ${CLUSTER_NAMESPACE} | kubectl apply -f -
```
{{#/tab }}
{{#tab KubeVirt}}
As described above, in this tutorial we use a LoadBalancer service to expose the API server of the workload cluster,
so we want to use the load balancer (`lb`) template rather than the default one. We'll use clusterctl's `--flavor`
flag for that:
```bash
clusterctl generate cluster capi-quickstart \
--infrastructure="kubevirt" \
--flavor lb \
--kubernetes-version ${CAPK_GUEST_K8S_VERSION} \
--control-plane-machine-count=1 \
--worker-machine-count=1 \
> capi-quickstart.yaml
```
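The command above only writes the manifest to `capi-quickstart.yaml`; as with the other providers, the workload
cluster is created when this file is applied to the management cluster in the next step of the quick start, roughly:
```bash
# Apply the generated manifest to the management cluster
kubectl apply -f capi-quickstart.yaml
```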
{{#/tab }}
{{#tab others...}}
@@ -1151,7 +1269,7 @@ Note: To use the default clusterctl method to retrieve kubeconfig for a workload
Calico is used here as an example.
{{#tabs name:"tab-deploy-cni" tabs:"Azure,vcluster,others..."}}
{{#tabs name:"tab-deploy-cni" tabs:"Azure,vcluster,KubeVirt,others..."}}
{{#tab Azure}}
Azure [does not currently support Calico networking](https://docs.projectcalico.org/reference/public-cloud/azure). As a workaround, it is recommended that Azure clusters use the Calico spec below that uses VXLAN.
@@ -1173,6 +1291,126 @@ kubectl --kubeconfig=./capi-quickstart.kubeconfig get nodes
Calico not required for vcluster.
{{#/tab }}
{{#tab KubeVirt}}
Before deploying the Calico CNI, make sure the VMs are running:
```bash
kubectl get vm
```
If our new VMs are running, we should see a response similar to this:
```text
NAME AGE STATUS READY
capi-quickstart-control-plane-7s945 167m Running True
capi-quickstart-md-0-zht5j 164m Running True
```
We can also check the virtual machine instances:
```bash
kubectl get vmi
```
The output will be similar to:
```text
NAME AGE PHASE IP NODENAME READY
capi-quickstart-control-plane-7s945 167m Running 10.244.82.16 kind-control-plane True
capi-quickstart-md-0-zht5j 164m Running 10.244.82.17 kind-control-plane True
```
Since our workload cluster runs inside the kind cluster, we need to prevent conflicts between the kind
(management) cluster's CNI and the workload cluster's CNI. The following modifications to the default Calico settings
are enough for the two CNIs to coexist in what is effectively the same environment:
* Change the CIDR to a non-conflicting range
* Change the value of the `CLUSTER_TYPE` environment variable to `k8s`
* Change the value of the `CALICO_IPV4POOL_IPIP` environment variable to `Never`
* Change the value of the `CALICO_IPV4POOL_VXLAN` environment variable to `Always`
* Add the `FELIX_VXLANPORT` environment variable with the value of a non-conflicting port, e.g. `"6789"`.
The following script downloads the Calico manifest and modifies the required fields. The CIDR and the port values are examples.
```bash
curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.4/manifests/calico.yaml -o calico-workload.yaml
sed -i -E 's|^( +)# (- name: CALICO_IPV4POOL_CIDR)$|\1\2|g;'\
's|^( +)# ( value: )"192.168.0.0/16"|\1\2"10.243.0.0/16"|g;'\
'/- name: CLUSTER_TYPE/{ n; s/( +value: ").+/\1k8s"/g };'\
'/- name: CALICO_IPV4POOL_IPIP/{ n; s/value: "Always"/value: "Never"/ };'\
'/- name: CALICO_IPV4POOL_VXLAN/{ n; s/value: "Never"/value: "Always"/};'\
'/# Set Felix endpoint to host default action to ACCEPT./a\ - name: FELIX_VXLANPORT\n value: "6789"' \
calico-workload.yaml
```
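Before deploying the modified manifest, you can spot-check that the substitutions landed (an optional check, not part
of the original script):
```bash
# Each match should show the values described in the list above
grep -E -A1 'CALICO_IPV4POOL_CIDR|CALICO_IPV4POOL_IPIP|CALICO_IPV4POOL_VXLAN|FELIX_VXLANPORT' calico-workload.yaml
```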
Now, deploy the Calico CNI on the workload cluster:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig create -f calico-workload.yaml
```
After a short while, our nodes should be running and in `Ready` state. Let's check the status using `kubectl get nodes`:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig get nodes
```
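If you prefer to block until the nodes become ready instead of polling manually, something like the following should
work (a suggested convenience with a generous timeout, since the VMs may still be pulling images):
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig wait nodes --all --for=condition=Ready --timeout=15m
```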
<aside class="note">
<h1>Troubleshooting</h1>
If the nodes don't become ready after a long period, check the pods in the `kube-system` namespace:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig get pod -n kube-system
```
If the Calico pods are in an image pull error state (`ErrImagePull`), it's probably because of the Docker Hub pull rate limit.
We can try to fix that by adding a secret with our Docker Hub credentials and using it to pull the images;
see [here](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#registry-secret-existing-credentials)
for details.
First, create the secret. Note the Docker config file path and adjust it to your local setup.
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig create secret generic docker-creds \
--from-file=.dockerconfigjson=<YOUR DOCKER CONFIG FILE PATH> \
--type=kubernetes.io/dockerconfigjson \
-n kube-system
```
Now, if the `calico-node` pods have a status of `ErrImagePull`, patch their DaemonSet to make them use the new secret to pull images:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig patch daemonset \
-n kube-system calico-node \
-p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"docker-creds"}]}}}}'
```
After a short while, the `calico-node` pods will be in `Running` status. Now, if the `calico-kube-controllers` pod is also
in `ErrImagePull` status, patch its Deployment to fix the problem:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig patch deployment \
-n kube-system calico-kube-controllers \
-p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"docker-creds"}]}}}}'
```
Check the pods again:
```bash
kubectl --kubeconfig=./capi-quickstart.kubeconfig get pod -n kube-system
```
Eventually, all the pods in the `kube-system` namespace will be running, and the result should be similar to this:
```text
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-c969cf844-dgld6 1/1 Running 0 50s
calico-node-7zz7c 1/1 Running 0 54s
calico-node-jmjd6 1/1 Running 0 54s
coredns-64897985d-dspjm 1/1 Running 0 3m49s
coredns-64897985d-pgtgz 1/1 Running 0 3m49s
etcd-capi-quickstart-control-plane-kjjbb 1/1 Running 0 3m57s
kube-apiserver-capi-quickstart-control-plane-kjjbb 1/1 Running 0 3m57s
kube-controller-manager-capi-quickstart-control-plane-kjjbb 1/1 Running 0 3m57s
kube-proxy-b9g5m 1/1 Running 0 3m12s
kube-proxy-p6xx8 1/1 Running 0 3m49s
kube-scheduler-capi-quickstart-control-plane-kjjbb 1/1 Running 0 3m57s
```
</aside>
{{#/tab }}
{{#tab others...}}
@@ -1244,7 +1482,8 @@ See the [clusterctl] documentation for more detail about clusterctl supported ac
[Metal3 getting started guide]: https://github.com/metal3-io/cluster-api-provider-metal3/blob/master/docs/getting-started.md
[Metal3 provider]: https://github.com/metal3-io/cluster-api-provider-metal3/
[KubeKey provider]: https://github.com/kubesphere/kubekey
[Kubevirt provider]: https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/
[KubeVirt provider]: https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/
[KubeVirt]: https://kubevirt.io/
[oci-provider]: https://oracle.github.io/cluster-api-provider-oci/#getting-started
[Equinix Metal getting started guide]: https://github.com/kubernetes-sigs/cluster-api-provider-packet#using
[provider]:../reference/providers.md