This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

docs: Add "How to upgrade etcd"
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd committed Aug 14, 2020
1 parent 9172b20 commit 18f16d9
Showing 1 changed file with 119 additions and 0 deletions.
119 changes: 119 additions & 0 deletions docs/how-to-guides/upgrade-etcd.md
@@ -0,0 +1,119 @@
# Upgrading etcd

## Contents

- [Introduction](#introduction)
- [Steps](#steps)
- [Step 1: Find out the IP and SSH](#step-1-find-out-the-ip-and-ssh)
- [Step 2: Create necessary directories with correct permissions](#step-2-create-necessary-directories-with-correct-permissions)
- [Step 3: Upgrade Kubelet](#step-3-upgrade-kubelet)
- [Step 4: Verify upgrade](#step-4-verify-upgrade)
- [Step 5: Verify using `etcdctl`](#step-5-verify-using-etcdctl)

## Introduction

[etcd](https://etcd.io/) is a critical component of a Kubernetes cluster: it stores the cluster state.

This document provides a step-by-step guide to upgrading etcd in Lokomotive.

## Steps

Repeat the following steps on every controller node, one node at a time.

### Step 1: Find out the IP and SSH

Find the IP address of the controller node in the cloud provider dashboard and SSH into it.

```bash
ssh core@<IP Address>
```

### Step 2: Create necessary directories with correct permissions

The latest etcd (`v3.4.10`) requires the data directory permissions to be `0700`. Change the permissions accordingly, then verify that `ls` reports them as `drwx------`.

```bash
sudo chmod 0700 /var/lib/etcd/
sudo ls -ld /var/lib/etcd/
```

If the node reboots, the right settings must be in place so that `systemd-tmpfiles` does not alter the permissions of the data directory. To make the change above persistent, run the following command:

```bash
echo "d /var/lib/etcd 0700 etcd etcd - -" | sudo tee /etc/tmpfiles.d/etcd-wrapper.conf
```
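To double-check the result of this step, a small helper along the following lines may be useful. It is only a sketch: the function name `check_dir_mode` is hypothetical, and it assumes GNU `stat` (coreutils) is available on the node.

```bash
#!/usr/bin/env bash
# check_dir_mode is a hypothetical helper, not part of Lokomotive.
# It compares the octal mode of a directory with an expected value.
# Assumption: GNU stat (coreutils), which supports `-c '%a'`.
check_dir_mode() {
  local dir="$1" expected="$2" actual
  actual="$(stat -c '%a' "$dir")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $dir has mode $actual"
  else
    echo "MISMATCH: $dir has mode $actual, expected $expected" >&2
    return 1
  fi
}

# Example:
# check_dir_mode /var/lib/etcd 700
```

Running the same check again after a reboot would confirm that the `tmpfiles.d` entry keeps the mode in place.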

### Step 3: Upgrade Kubelet

Run the following commands to set the new etcd image tag in the `etcd-member` drop-in and restart the service:

> **NOTE**: Before running the other commands, set the `etcd_version` variable to the latest etcd version.

```bash
export etcd_version=<latest etcd version e.g. v3.4.10>

# Replace the existing ETCD_IMAGE_TAG line in the drop-in with the new version.
sudo sed -i "s|$(grep ETCD_IMAGE_TAG /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf)|Environment=\"ETCD_IMAGE_TAG=${etcd_version}\"|g" /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf
# Reload unit files and restart etcd so that the new image tag takes effect.
sudo systemctl daemon-reload
sudo systemctl restart etcd-member
```
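Because the `sed` command edits the drop-in in place, it can be reassuring to preview the substitution first. The sketch below applies the same substitution to a temporary copy and prints a diff. `preview_tag_change` is a hypothetical helper name, and it assumes, like the command above, that exactly one line in the file contains `ETCD_IMAGE_TAG`.

```bash
#!/usr/bin/env bash
# preview_tag_change is a hypothetical helper, not part of Lokomotive.
# It writes the substituted content to a temporary file and prints a diff,
# leaving the original drop-in untouched.
preview_tag_change() {
  local conf="$1" new_tag="$2" tmp old_line
  old_line="$(grep ETCD_IMAGE_TAG "$conf")" || {
    echo "ETCD_IMAGE_TAG not found in $conf" >&2
    return 1
  }
  tmp="$(mktemp)" || return 2
  sed "s|${old_line}|Environment=\"ETCD_IMAGE_TAG=${new_tag}\"|g" "$conf" > "$tmp"
  # diff exits non-zero when the files differ, which is the expected case here.
  diff "$conf" "$tmp" || true
  rm -f "$tmp"
}

# Example:
# preview_tag_change /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf "${etcd_version}"
```

If the diff shows only the image tag line changing, the in-place command should be safe to run.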

### Step 4: Verify upgrade

Verify that the etcd service is in `active (running)` state:

```bash
systemctl status --no-pager etcd-member
```

Run the following command to see logs of the process since the last restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID=$(systemctl show -p InvocationID --value etcd-member.service)
```

When the following log line appears, the etcd daemon has started without errors:

```log
etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
```

When the following log line appears, etcd has rejoined the cluster without issues:

```log
embed: serving client requests on 10.88.81.1:2379
```
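Scanning the journal by eye works, but the two checks can also be scripted. The following is a minimal sketch: `etcd_log_ok` is a hypothetical helper, and it matches only the message prefixes quoted above, since the version and IP address differ per cluster.

```bash
#!/usr/bin/env bash
# etcd_log_ok is a hypothetical helper, not part of Lokomotive or etcd.
# It reads log text on stdin and succeeds only if both messages quoted
# above appear: "starting server" and "serving client requests".
etcd_log_ok() {
  local logs
  logs="$(cat)"
  grep -q 'etcdserver: starting server' <<< "$logs" &&
    grep -q 'embed: serving client requests on' <<< "$logs"
}

# Example: feed it the journal of the current etcd-member invocation.
# journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show -p InvocationID --value etcd-member.service)" \
#   | etcd_log_ok && echo "etcd started and is serving clients"
```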

### Step 5: Verify using `etcdctl`

Use the `etcdctl` client to verify the state of the etcd cluster.

> **NOTE**: These commands can only be run on the `*controller-0` node of your cluster because the required client certificates are readily available on that node.
>
> **NOTE**: Before running the other commands, set the `no_of_controller_nodes` variable to the number of controller nodes in the cluster.

```bash
export no_of_controller_nodes=<no of controller nodes>

# Get the required certificates:
cd /opt/bootkube/assets/tls/
export ETCDCTL_API=3
export cacert=etcd-client-ca.crt
export cert=etcd-client.crt
export key=etcd-client.key

# Find the endpoint of etcd0:
export endpoint=$(grep ETCD_ADVERTISE_CLIENT_URLS /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf | cut -d"=" -f3 | sed 's|"||g')
export endpoints="${endpoint}"

# Create list of other endpoints by replacing etcd0 with etcd1, etcd2, ...:
for ((n = 1; n < no_of_controller_nodes; n++)); do
  np=$(sed "s|etcd0|etcd${n}|g" <<< "$endpoint")
  endpoints="${endpoints},${np}"
done

# Verify:
etcdctl member list --cacert="$cacert" --cert="$cert" --key="$key" --endpoints="$endpoints"
etcdctl endpoint health --cacert="$cacert" --cert="$cert" --key="$key" --endpoints="$endpoints"
```

The last command should report each node as healthy.
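The endpoint list construction can also be wrapped in a function, which makes it easy to inspect the list before passing it to `etcdctl`. This is a sketch only: `build_endpoints` is a hypothetical helper, the example URL is made up, and, like the loop above, it assumes the members are named `etcd0`, `etcd1`, and so on.

```bash
#!/usr/bin/env bash
# build_endpoints is a hypothetical helper that wraps the loop shown above.
# $1: client URL of etcd0, $2: number of controller nodes.
# Assumption: the other members differ only in the etcdN part of the URL.
build_endpoints() {
  local first="$1" count="$2" list="$1" n
  for ((n = 1; n < count; n++)); do
    # Swap etcd0 for etcdN using bash pattern substitution.
    list="${list},${first//etcd0/etcd${n}}"
  done
  echo "$list"
}

# Example (hypothetical URL):
# build_endpoints "https://cluster-etcd0.example.com:2379" 3
```

Printing the list first helps catch a wrong node count or an unexpected URL shape before `etcdctl` reports unreachable endpoints.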
