
Are deployments taken care of during a kubernetes upgrade? #11371

Closed
sakshiarora13 opened this issue Jul 11, 2024 · 2 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@sakshiarora13

I have a question regarding the Kubernetes upgrade process using Kubespray.

I am planning to upgrade my Kubernetes cluster with Kubespray and I am concerned about the impact on my running deployments during the upgrade.
Specifically, I would like to know:

  1. Does Kubespray handle the graceful termination and rescheduling of pods during the upgrade process?
  2. Are there any built-in mechanisms in Kubespray to ensure minimal disruption to my running deployments?
  3. Should I manually handle draining and uncordoning of nodes, or does Kubespray manage this automatically during the upgrade?
  4. What are the best practices recommended by the Kubespray team to ensure high availability of applications during the upgrade process?

Any insights or recommendations you can provide would be greatly appreciated.

Thanks!

@juliohm1978
Contributor

juliohm1978 commented Jul 13, 2024

I'm not part of the Kubespray dev team, but I've been using Kubespray on a regular basis for a number of production clusters.

In my experience, you have a few options.

By default, Kubespray will drain each node before upgrading it. That means every Pod will eventually be killed and recreated during the upgrade process. Kubespray issues the equivalent of a kubectl drain ... to the API server before running the upgrade playbook on the node. Graceful termination of the Pods is handled by Kubernetes itself, and also depends on how your Pods are configured and how each container handles the TERM and KILL signals.
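
For reference, the drain Kubespray performs is roughly equivalent to the command below (a sketch; the exact flags depend on your Kubespray version and configuration):

    # Approximately what Kubespray's drain step does before upgrading a node.
    # --ignore-daemonsets: DaemonSet pods are not evicted
    # --delete-emptydir-data: allow eviction of pods using emptyDir volumes
    # --timeout: give evictions time to respect PodDisruptionBudgets
    kubectl drain node01 --ignore-daemonsets --delete-emptydir-data --timeout=360s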

You can disable node draining in Kubespray by adding drain_nodes: false to your k8s-cluster.yml. That means all Pods will keep running as the nodes are upgraded, but there is no guarantee they won't be killed and restarted on the same node.
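
A sketch of that setting (the group_vars path below is the usual Kubespray inventory layout; adjust to yours):

    # inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
    # Skip the drain step during upgrades. Pods stay scheduled on the node,
    # but may still be restarted when the kubelet/container runtime restarts.
    drain_nodes: false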

During the upgrade, all the basic components such as the kubelet, containerd, the CNI (e.g. Calico), kube-proxy... all get restarted with their new versions. Network activity for your pods will be disrupted momentarily, and this usually causes applications to crash or produce connectivity errors. You may find that a node reboot after the upgrade helps refresh everything and bring it back to normal.

Draining nodes is a good choice for applications that are highly available, such as microservices with multiple instances running at all times. When some of their pods are killed and moved during the upgrade, users won't feel any downtime. Older, monolithic, traditional applications that can only run one instance at a time will suffer no matter how you choose to upgrade. They will most likely be restarted, and downtime should be expected.

You also have the option to run the upgrade in partial steps, i.e. one node at a time. It takes a lot longer than usual (maybe days, if you need to), but it gives those older applications the chance to move to other nodes in a more controlled fashion.

To do this, you can run upgrade-cluster.yml, giving Ansible the --limit command line parameter with a list of nodes. First, run the upgrade with --limit=kube_control_plane and wait for the control plane nodes to be upgraded. This should not disrupt the cluster at all.
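
A sketch of that first step (the inventory path is an example; depending on your Kubespray version you may also need to include etcd in the limit, or gather facts on all hosts first):

    # Upgrade only the control plane nodes first (example inventory path):
    ansible-playbook -i inventory/mycluster/hosts.yaml -b \
      upgrade-cluster.yml --limit=kube_control_plane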

After that, you can do the same for each node by simply changing the --limit parameter, except now you can drain each node manually:

  • cordon node01 any time (no more pods scheduled here)
  • wait for a convenient time to upgrade node01
  • drain node01 manually with kubectl drain
  • run upgrade-cluster.yml with --limit=node01
  • reboot node01
  • uncordon node01

Repeat the same process for all nodes at your convenience.
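
In shell terms, one iteration of that loop looks roughly like this (a sketch; node name and inventory path are examples):

    # One node at a time (node01 is an example):
    kubectl cordon node01        # no more pods scheduled here
    # ... wait for a convenient time to upgrade ...
    kubectl drain node01 --ignore-daemonsets --delete-emptydir-data
    ansible-playbook -i inventory/mycluster/hosts.yaml -b \
      upgrade-cluster.yml --limit=node01
    ssh node01 sudo reboot       # optional, if you want a clean restart
    kubectl uncordon node01      # allow pods to be scheduled here again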

The important thing to remember is to never skip minor Kubernetes versions during the upgrade. For example, if you are using k8s v1.22, you should upgrade ALL NODES to v1.23, then to v1.24, then to v1.25, etc. Following that order is important because the K8s community is committed to maintaining backward compatibility only between a couple of adjacent versions. If you skip too many versions, disruptions during the upgrade can get even messier. For example, the kubelet won't be able to communicate with the api-server if their versions are not compatible.

For Kubespray, most of the time, that just means following each minor Kubespray version as you upgrade, using its default supported k8s version.
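
As an illustration only (versions are examples), stepping through minor versions might look like this; in practice you would also check out the matching Kubespray release tag before each step, since each Kubespray version supports only a narrow range of kube_version values:

    # Never skip a minor version: upgrade ALL nodes to each version in turn.
    for v in v1.23.17 v1.24.17 v1.25.16; do
      ansible-playbook -i inventory/mycluster/hosts.yaml -b \
        upgrade-cluster.yml -e kube_version=$v
    done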

To be safe, whenever Kubespray has a minor release, like 2.24.0, just wait a few weeks before upgrading. There's a good chance that patch versions will follow up to fix any regressions or new bugs in the playbook (2.24.1, 2.24.2, 2.24.3 ...).

Other than that, it's been working great for a few years now.

@VannTen
Contributor

VannTen commented Aug 27, 2024

TL;DR:
1, 2, and 4 are handled by Kubernetes, not by Kubespray itself (for 2, check PodDisruptionBudget).
3 is handled by Kubespray when using the upgrade-cluster.yml playbook.
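
For point 2, a minimal PodDisruptionBudget sketch (names and counts are illustrative):

    # Keep at least 2 replicas of the matched pods running during node drains.
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: myapp-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: myapp
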
/kind support

@VannTen VannTen closed this as completed Aug 27, 2024
@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Aug 27, 2024