From 18f16d9391813363a01e820e2b54dd4d0e7e5416 Mon Sep 17 00:00:00 2001
From: Suraj Deshmukh
Date: Fri, 14 Aug 2020 14:50:01 +0530
Subject: [PATCH] docs: Add "How to upgrade etcd"

Signed-off-by: Suraj Deshmukh
---
 docs/how-to-guides/upgrade-etcd.md | 119 +++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 docs/how-to-guides/upgrade-etcd.md

diff --git a/docs/how-to-guides/upgrade-etcd.md b/docs/how-to-guides/upgrade-etcd.md
new file mode 100644
index 000000000..5f202f51e
--- /dev/null
+++ b/docs/how-to-guides/upgrade-etcd.md
@@ -0,0 +1,119 @@
+# Upgrading etcd
+
+## Contents
+
+- [Introduction](#introduction)
+- [Steps](#steps)
+  - [Step 1: Find out the IP and SSH](#step-1-find-out-the-ip-and-ssh)
+  - [Step 2: Create necessary directories with correct permissions](#step-2-create-necessary-directories-with-correct-permissions)
+  - [Step 3: Upgrade etcd](#step-3-upgrade-etcd)
+  - [Step 4: Verify upgrade](#step-4-verify-upgrade)
+  - [Step 5: Verify using `etcdctl`](#step-5-verify-using-etcdctl)
+
+## Introduction
+
+[etcd](https://etcd.io/) is the most crucial component of a Kubernetes cluster because it stores the cluster state.
+
+This document provides a step-by-step guide to upgrading etcd in Lokomotive.
+
+## Steps
+
+Repeat the following steps on every controller node, one node at a time.
+
+### Step 1: Find out the IP and SSH
+
+Find the IP address of the controller node in the cloud provider dashboard, then SSH into the node as the `core` user by appending that IP address to the command below.
+
+```bash
+ssh core@
+```
+
+### Step 2: Create necessary directories with correct permissions
+
+The latest etcd (`v3.4.10`) requires the data directory permissions to be `0700`. Change the permissions accordingly, then verify that they are now `rwx------`.
+
+```bash
+sudo chmod 0700 /var/lib/etcd/
+sudo ls -ld /var/lib/etcd/
+```
+
+If the node reboots, the right settings must be in place so that the `systemd-tmpfiles` service does not alter the permissions of the data directory. To make the change above persistent, run the following command:
+
+```bash
+echo "d /var/lib/etcd 0700 etcd etcd - -" | sudo tee /etc/tmpfiles.d/etcd-wrapper.conf
+```
+
+### Step 3: Upgrade etcd
+
+Run the following commands:
+
+> **NOTE**: Before running the other commands, set the `etcd_version` variable to the latest etcd version (for example, `v3.4.10`).
+
+```bash
+export etcd_version=
+
+sudo sed -i "s|$(grep ETCD_IMAGE_TAG /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf)|Environment=\"ETCD_IMAGE_TAG=${etcd_version}\"|g" /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf
+sudo systemctl daemon-reload
+sudo systemctl restart etcd-member
+```
+
+### Step 4: Verify upgrade
+
+Verify that the etcd service is in the `active (running)` state:
+
+```bash
+systemctl status --no-pager etcd-member
+```
+
+Run the following command to see the logs of the process since the last restart:
+
+```bash
+journalctl _SYSTEMD_INVOCATION_ID=$(systemctl show -p InvocationID --value etcd-member.service)
+```
+
+The following log line indicates that the etcd daemon has come up without errors:
+
+```log
+etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
+```
+
+The following log line indicates that etcd has rejoined the cluster without issues:
+
+```log
+embed: serving client requests on 10.88.81.1:2379
+```
+
+### Step 5: Verify using `etcdctl`
+
+Use the `etcdctl` client to verify the state of the etcd cluster.
+
+> **NOTE**: These commands can only be run on the `*controller-0` node of your cluster, because the required client certificates are readily available on that node.
+
+> **NOTE**: Before running the other commands, set the `no_of_controller_nodes` variable to the number of controller nodes in the cluster.
+
+```bash
+export no_of_controller_nodes=
+
+# Get the required certificates:
+cd /opt/bootkube/assets/tls/
+export ETCDCTL_API=3
+export cacert=etcd-client-ca.crt
+export cert=etcd-client.crt
+export key=etcd-client.key
+
+# Find the endpoint of etcd0:
+export endpoint=$(grep ETCD_ADVERTISE_CLIENT_URLS /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf | cut -d"=" -f3 | sed 's|"||g')
+export endpoints="${endpoint}"
+
+# Build the list of the other endpoints by replacing etcd0 with etcd1, etcd2, and so on:
+for ((n = 1; n < no_of_controller_nodes; n++)); do
+  np=$(sed "s|etcd0|etcd${n}|g" <<< "$endpoint")
+  endpoints="${endpoints},${np}"
+done
+
+# Verify:
+etcdctl member list --cacert="$cacert" --cert="$cert" --key="$key" --endpoints="$endpoints"
+etcdctl endpoint health --cacert="$cacert" --cert="$cert" --key="$key" --endpoints="$endpoints"
+```
+
+The last command should report each node as healthy.
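
A possible follow-up to Step 5: the final health check could be scripted rather than read by eye. The sketch below is not part of Lokomotive or of this patch. The helper names `count_healthy` and `all_healthy` are hypothetical, and it assumes that `etcdctl endpoint health` prints one line per endpoint and that each healthy line contains the text `is healthy`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Lokomotive): check that every etcd
# endpoint is reported as healthy by `etcdctl endpoint health`.

# Reads health output on stdin and prints the number of healthy endpoints.
# Assumption: a healthy endpoint line contains the text " is healthy".
count_healthy() {
  # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
  grep -c ' is healthy' || true
}

# Succeeds only if the number of healthy endpoints equals the expected count.
# $1: expected number of controller nodes; health output is read from stdin.
all_healthy() {
  local expected="$1"
  local healthy
  healthy="$(count_healthy)"
  [ "${healthy}" -eq "${expected}" ]
}

# Usage on the controller node, after the variables from Step 5 are set
# (stderr is merged in case the etcdctl version in use prints health there):
#   etcdctl endpoint health --cacert="$cacert" --cert="$cert" --key="$key" \
#     --endpoints="$endpoints" 2>&1 | all_healthy "${no_of_controller_nodes}" \
#     && echo "all etcd members healthy"
```

Comparing the count against `no_of_controller_nodes` also catches an endpoint that is missing from the output, not only one that is reported as unhealthy.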