
Are deployments taken care of during a kubernetes upgrade? #11371

Closed
sakshiarora13 opened this issue Jul 11, 2024 · 2 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@sakshiarora13

I have a question regarding the Kubernetes upgrade process using Kubespray.

I am planning to upgrade my Kubernetes cluster with Kubespray and I am concerned about the impact on my running deployments during the upgrade.
Specifically, I would like to know:

  1. Does Kubespray handle the graceful termination and rescheduling of pods during the upgrade process?
  2. Are there any built-in mechanisms in Kubespray to ensure minimal disruption to my running deployments?
  3. Should I manually handle draining and uncordoning of nodes, or does Kubespray manage this automatically during the upgrade?
  4. What are the best practices recommended by the Kubespray team to ensure high availability of applications during the upgrade process?

Any insights or recommendations you can provide would be greatly appreciated.

Thanks!

@juliohm1978
Contributor

juliohm1978 commented Jul 13, 2024

I'm not part of the Kubespray dev team, but I've been using Kubespray on a regular basis for a number of production clusters.

In my experience, you have a few options.

By default, Kubespray will drain each node before upgrading it. That means every Pod will eventually be killed and recreated during the upgrade process. Kubespray issues the equivalent of a kubectl drain ... to the API server before running the upgrade playbook on the node. Graceful termination of the Pods is handled by Kubernetes itself, and also depends on how your Pods are configured and how each container handles the TERM and KILL signals.
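
For reference, the drain Kubespray performs is roughly equivalent to the command below (a sketch; the exact flags depend on your Kubespray version and configuration):

    # Approximately what Kubespray's drain step does before upgrading a node.
    # --ignore-daemonsets: DaemonSet pods are not evicted
    # --delete-emptydir-data: allow eviction of pods using emptyDir volumes
    # --timeout: give evictions time to respect PodDisruptionBudgets
    kubectl drain node01 --ignore-daemonsets --delete-emptydir-data --timeout=360s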

You can disable node draining in Kubespray by adding drain_nodes: false to your k8s-cluster.yml. That means all Pods will keep running as the nodes are upgraded, but there is no guarantee they won't be killed and restarted on the same node.
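
A sketch of that setting (the group_vars path below is the usual Kubespray inventory layout; adjust to yours):

    # inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
    # Skip the drain step during upgrades. Pods stay scheduled on the node,
    # but may still be restarted when the kubelet/container runtime restarts.
    drain_nodes: false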

During the upgrade, all the basic components such as the kubelet, containerd, the CNI (e.g. Calico), kube-proxy... all get restarted with their new versions. Network activity for your pods will be disrupted momentarily, and this usually causes applications to crash or produce connectivity errors. You may find that a node reboot after the upgrade helps refresh everything and bring it back to normal.

Draining nodes is a good choice for applications that are highly available, such as microservices with multiple instances running at all times. When some of their pods are killed and moved during the upgrade, users won't feel any downtime. Older, monolithic, traditional applications that can only run one instance at a time will suffer no matter how you choose to upgrade. They will most likely be restarted, and downtime should be expected.

You also have the option to run the upgrade in partial steps, i.e. one node at a time. It takes a lot longer than usual (maybe days, if you need to), but it gives those older applications the chance to move to other nodes in a more controlled fashion.

To do this, you can run upgrade-cluster.yml, giving Ansible the --limit command line parameter with a list of nodes. First, run the upgrade with --limit=kube_control_plane and wait for the control plane nodes to be upgraded. This should not disrupt the cluster at all.
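
A sketch of that first step (the inventory path is an example; depending on your Kubespray version you may also need to include etcd in the limit, or gather facts on all hosts first):

    # Upgrade only the control plane nodes first (example inventory path):
    ansible-playbook -i inventory/mycluster/hosts.yaml -b \
      upgrade-cluster.yml --limit=kube_control_plane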

After that, you can do the same for each node by simply changing the --limit parameter, except now you can drain each node manually:

  • cordon node01 any time (no more pods scheduled here)
  • wait for a convenient time to upgrade node01
  • drain node01 manually with kubectl drain
  • run upgrade-cluster.yml with --limit=node01
  • reboot node01
  • uncordon node01

Repeat the same process for all nodes at your convenience.
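
In shell terms, one iteration of that loop looks roughly like this (a sketch; node name and inventory path are examples):

    # One node at a time (node01 is an example):
    kubectl cordon node01        # no more pods scheduled here
    # ... wait for a convenient time to upgrade ...
    kubectl drain node01 --ignore-daemonsets --delete-emptydir-data
    ansible-playbook -i inventory/mycluster/hosts.yaml -b \
      upgrade-cluster.yml --limit=node01
    ssh node01 sudo reboot       # optional, if you want a clean restart
    kubectl uncordon node01      # allow pods to be scheduled here again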

The important thing to remember is to never skip minor Kubernetes versions during the upgrade. For example, if you are using k8s v1.22, you should upgrade ALL NODES to v1.23, then to v1.24, then to v1.25, etc. Following that order is important because the K8s community is committed to maintaining backward compatibility only between a couple of adjacent versions. If you skip too many versions, disruptions during the upgrade can get even messier. For example, the kubelet won't be able to communicate with the api-server if their versions are not compatible.

For Kubespray, most of the time, that just means following each minor Kubespray version as you upgrade, using its default supported k8s version.
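
As an illustration only (versions are examples), stepping through minor versions might look like this; in practice you would also check out the matching Kubespray release tag before each step, since each Kubespray version supports only a narrow range of kube_version values:

    # Never skip a minor version: upgrade ALL nodes to each version in turn.
    for v in v1.23.17 v1.24.17 v1.25.16; do
      ansible-playbook -i inventory/mycluster/hosts.yaml -b \
        upgrade-cluster.yml -e kube_version=$v
    done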

To be safe, whenever Kubespray has a minor release, like 2.24.0, just wait a few weeks before upgrading. There's a good chance that patch versions will follow up to fix any regressions or new bugs in the playbook (2.24.1, 2.24.2, 2.24.3 ...).

Other than that, it's been working great for a few years now.

@VannTen
Contributor

VannTen commented Aug 27, 2024

TL;DR:
1, 2, and 4 are handled by Kubernetes, not by Kubespray itself (for 2, check PodDisruptionBudget).
3 is handled by Kubespray when using the upgrade-cluster.yml playbook.
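
For point 2, a minimal PodDisruptionBudget sketch (names and counts are illustrative):

    # Keep at least 2 replicas of the matched pods running during node drains.
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: myapp-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: myapp
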
/kind support

@VannTen VannTen closed this as completed Aug 27, 2024
@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Aug 27, 2024