This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

docs: Add "How to upgrade etcd"
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
surajssd committed Aug 14, 2020
1 parent 9172b20 commit f5d24e0
117 changes: 117 additions & 0 deletions docs/how-to-guides/upgrade-etcd.md
# Upgrading etcd

## Contents

- [Introduction](#introduction)
- [Steps](#steps)
- [Step 1: Find out the IP and SSH](#step-1-find-out-the-ip-and-ssh)
- [Step 2: Create necessary directories with correct permissions](#step-2-create-necessary-directories-with-correct-permissions)
- [Step 3: Upgrade etcd](#step-3-upgrade-etcd)
- [Step 4: Verify upgrade](#step-4-verify-upgrade)
- [Step 5: Verify using `etcdctl`](#step-5-verify-using-etcdctl)

## Introduction

[etcd](https://etcd.io/) is a critical component of a Kubernetes cluster: it is the key-value store that holds the cluster state.

This document provides a step-by-step guide to upgrading etcd in a Lokomotive cluster.

## Steps

Repeat the following steps on each controller node, one node at a time.
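
The reason for upgrading one node at a time is quorum. etcd stays available only while a strict majority of its members is up. The following minimal sketch computes the quorum size and the number of members that can be down at once for a given number of controller nodes. `etcd_quorum` and `etcd_fault_tolerance` are hypothetical helpers for illustration, not part of Lokomotive or etcd.

```bash
# Hypothetical helpers for illustration only.
# Quorum is a strict majority of the members: floor(n / 2) + 1.
etcd_quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Number of members that can be down while the cluster still has quorum.
etcd_fault_tolerance() {
  local members=$1
  echo $(( members - $(etcd_quorum "$members") ))
}
```

For example, `etcd_fault_tolerance 3` prints `1`, so a three-node cluster tolerates exactly one member being restarted. With a single controller node the tolerance is `0`, which means etcd is unavailable while its only member restarts.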

### Step 1: Find out the IP and SSH

Find the IP address of the controller node (for example, from your cloud provider's dashboard) and SSH into it as the `core` user:

```bash
ssh core@<IP Address>
```
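
Before going further, you may want to confirm that you are logged in to a controller node. This is a minimal sketch. It assumes that the presence of the etcd drop-in file edited in Step 3 is a good enough signal. `check_controller_node` is a hypothetical helper.

```bash
# Hypothetical helper: succeeds if the etcd-member drop-in file used in
# the following steps exists, i.e. this looks like a controller node.
check_controller_node() {
  local conf="${1:-/etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf}"
  if [ -f "$conf" ]; then
    echo "Found $conf: this looks like a controller node"
  else
    echo "Missing $conf: this does not look like a controller node" >&2
    return 1
  fi
}
```

Run `check_controller_node` with no arguments on the node. It exits with a non-zero status if the drop-in file is missing.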

### Step 2: Create necessary directories with correct permissions

etcd `v3.4.10`, the latest version at the time of writing, requires the data directory to have `0700` permissions. Change the permissions accordingly, and verify that `ls` reports them as `drwx------`:

```bash
sudo chmod 0700 /var/lib/etcd/
sudo ls -ld /var/lib/etcd/
```
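
If you repeat this check on several nodes, a small function can make the result explicit. This is a minimal sketch that assumes GNU coreutils `stat`, whose `-c '%a'` format prints the octal mode without a leading zero. `check_data_dir_mode` is a hypothetical helper, not part of Lokomotive.

```bash
# Hypothetical helper: report whether a directory has mode 0700, which
# etcd v3.4.10 requires for its data directory (default: /var/lib/etcd).
check_data_dir_mode() {
  local dir="${1:-/var/lib/etcd}"
  local mode
  # GNU stat prints the permission bits in octal, e.g. "700".
  mode=$(stat -c '%a' "$dir") || return 1
  if [ "$mode" = "700" ]; then
    echo "OK: $dir has mode $mode"
  else
    echo "Unexpected mode $mode on $dir, expected 700" >&2
    return 1
  fi
}
```

Running `check_data_dir_mode` with no argument checks `/var/lib/etcd`.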

To keep these permissions after a reboot, the `systemd-tmpfiles` service must not reset the permissions of the data directory. Make the change persistent by running the following command:

```bash
echo "d /var/lib/etcd 0700 etcd etcd - -" | sudo tee /etc/tmpfiles.d/etcd-wrapper.conf
```
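
The entry follows the `tmpfiles.d` line format (type, path, mode, user, group, age, argument). The `d` type means the directory is created if it is missing, and its mode and ownership are adjusted to `0700` and `etcd:etcd`. The sketch below reads the mode back out of the file to confirm that the entry was written as intended. `tmpfiles_mode_for` is a hypothetical helper and only handles `d` lines.

```bash
# Hypothetical helper: print the mode that a tmpfiles.d file assigns to a
# directory path. Fields: Type Path Mode User Group Age Argument.
tmpfiles_mode_for() {
  local conf="$1" path="$2"
  # Only consider "d" (create directory) entries for the requested path.
  awk -v p="$path" '$1 == "d" && $2 == p { print $3 }' "$conf"
}
```

For example, `tmpfiles_mode_for /etc/tmpfiles.d/etcd-wrapper.conf /var/lib/etcd` should print `0700`.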

### Step 3: Upgrade etcd

Run the following commands:

> **NOTE**: Before running the commands, set the `etcd_version` variable to the latest etcd version.

```bash
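export etcd_version=<latest etcd version e.g. v3.4.10>

# Match the tag only up to a closing quote (if any), so that a line such as
# Environment="ETCD_IMAGE_TAG=v3.4.9" keeps its closing quote:
sudo sed -i "s,ETCD_IMAGE_TAG=[^\"]*,ETCD_IMAGE_TAG=${etcd_version}," \
  /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf
sudo systemctl daemon-reload
sudo systemctl restart etcd-member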
```
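
You can read the tag back from the drop-in file to check which version a node is configured to run, or to confirm that the `sed` edit took effect. This is a minimal sketch that assumes the tag appears on a single line as `ETCD_IMAGE_TAG=<tag>`, optionally inside `Environment="..."`. `configured_etcd_tag` and `is_etcd_tag` are hypothetical helpers, and `is_etcd_tag` only accepts `v3.x.y` tags.

```bash
# Hypothetical helper: print the ETCD_IMAGE_TAG value from the etcd-member
# drop-in, with or without surrounding Environment="..." quotes.
configured_etcd_tag() {
  local conf="${1:-/etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf}"
  sed -n 's/.*ETCD_IMAGE_TAG=\([^"]*\).*/\1/p' "$conf"
}

# Hypothetical helper: succeed only if the argument looks like an etcd v3
# release tag such as v3.4.10.
is_etcd_tag() {
  local re='^v3\.[0-9]+\.[0-9]+$'
  [[ "$1" =~ $re ]]
}
```

For example, run `is_etcd_tag "$etcd_version"` before editing the file to catch a mistyped version. Afterwards, `configured_etcd_tag` should print the same value as `etcd_version`.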

### Step 4: Verify upgrade

Verify that the etcd service is in the `active (running)` state:

```bash
sudo systemctl status --no-pager etcd-member
```

Run the following command to see the logs of the service since its last restart:

```bash
sudo journalctl _SYSTEMD_INVOCATION_ID=$(sudo systemctl \
  show -p InvocationID --value etcd-member.service)
```

The following log line indicates that the etcd daemon has started without errors. The reported version should match the version you upgraded to:

```log
etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
```

The following log line indicates that etcd has rejoined the cluster and is serving client requests. The address will differ on your node:

```log
embed: serving client requests on 10.88.81.1:2379
```
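
Instead of scanning the logs by eye, you can check for both lines programmatically. This is a minimal sketch that assumes the log lines look exactly as shown above; their wording may differ between etcd versions. `check_etcd_startup_logs` is a hypothetical helper that reads logs on standard input.

```bash
# Hypothetical helper: read etcd logs on stdin and check for the two log
# lines described above. The version may be given with or without the
# leading "v" (e.g. v3.4.10 or 3.4.10).
check_etcd_startup_logs() {
  local version="${1#v}"
  local logs
  logs=$(cat)
  # The trailing comma prevents e.g. 3.4.1 from matching 3.4.10.
  if ! grep -qF "starting server... [version: ${version}," <<< "$logs"; then
    echo "etcd ${version} has not reported starting yet" >&2
    return 1
  fi
  if ! grep -qF "serving client requests on" <<< "$logs"; then
    echo "etcd is not serving client requests yet" >&2
    return 1
  fi
  echo "etcd ${version} started and is serving client requests"
}
```

For example, pipe the `journalctl` command from this step into `check_etcd_startup_logs "$etcd_version"`. Piping also keeps `journalctl` from opening a pager.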

### Step 5: Verify using `etcdctl`

We can use the `etcdctl` client to verify the state of the etcd cluster.

> **NOTE**: Before running the commands, set the `no_of_controller_nodes` variable to the number of controller nodes in the cluster.

```bash
export no_of_controller_nodes=<number of controller nodes>

# Find the client endpoint of the etcd member running on this node:
export endpoint=$(grep ETCD_ADVERTISE_CLIENT_URLS /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf | cut -d"=" -f3 | tr -d '"')

# Build the list of all endpoints (etcd0, etcd1, ...) by substituting the
# member index in this node's endpoint, so this works on any controller node:
export endpoints=""
for ((n = 0; n < no_of_controller_nodes; n++)); do
  np=$(sed -E "s|etcd[0-9]+|etcd${n}|g" <<< "${endpoint}")
  endpoints="${endpoints:+${endpoints},}${np}"
done

export flags="--cacert=/etc/ssl/etcd/etcd-client-ca.crt \
--cert=/etc/ssl/etcd/etcd-client.crt \
--key=/etc/ssl/etcd/etcd-client.key \
--endpoints=${endpoints}"

# Verify:
sudo ETCDCTL_API=3 etcdctl member list $flags
sudo ETCDCTL_API=3 etcdctl endpoint health $flags
```

The `member list` output should include every controller node, and the `endpoint health` output should report each endpoint as healthy.
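
To turn the health check into a pass/fail result, you can count the healthy endpoints and compare the count with the number of controller nodes. This is a minimal sketch. It assumes that `etcdctl endpoint health` prints one `<endpoint> is healthy: ...` line per healthy endpoint, which is the format used by etcdctl v3.4. `check_healthy_endpoints` is a hypothetical helper that reads the command output on standard input.

```bash
# Hypothetical helper: read `etcdctl endpoint health` output on stdin and
# check that the expected number of endpoints reported "is healthy:".
check_healthy_endpoints() {
  local expected=$1
  local output healthy
  output=$(cat)
  # grep -c prints 0 (and exits 1) when nothing matches; ignore that status.
  healthy=$(grep -c ' is healthy:' <<< "$output" || true)
  if [ "$healthy" -eq "$expected" ]; then
    echo "All ${expected} endpoints are healthy"
  else
    echo "Only ${healthy} of ${expected} endpoints are healthy" >&2
    return 1
  fi
}
```

For example: `sudo ETCDCTL_API=3 etcdctl endpoint health $flags 2>&1 | check_healthy_endpoints "$no_of_controller_nodes"`. The `2>&1` also passes on any lines that etcdctl writes to standard error.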
