Refine backup and restore documentation #518

Merged 15 commits on Jun 2, 2019
2 changes: 1 addition & 1 deletion charts/tidb-backup/templates/backup-job.yaml
@@ -2,7 +2,7 @@
apiVersion: batch/v1
kind: Job
metadata:
name: {{ .Values.clusterName }}-{{ .Values.name }}
name: {{ .Values.clusterName }}-{{ tpl .Values.name . }}
labels:
app.kubernetes.io/name: {{ template "chart.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
3 changes: 2 additions & 1 deletion charts/tidb-backup/templates/backup-pvc.yaml
@@ -2,12 +2,13 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ .Values.name }}
name: {{ tpl .Values.name . }}
labels:
app.kubernetes.io/name: {{ template "chart.name" . }}
app.kubernetes.io/managed-by: tidb-operator
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: backup
pingcap.com/backup-cluster-name: {{ .Values.clusterName }}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
accessModes:
2 changes: 1 addition & 1 deletion charts/tidb-backup/values.yaml
@@ -7,7 +7,7 @@ clusterName: demo

mode: backup # backup | restore
# name is the backup name
name: fullbackup-20190306
name: fullbackup-{{ date "200601021504" .Release.Time }}
image:
pullPolicy: IfNotPresent
binlog: pingcap/tidb-binlog:v2.1.8
1 change: 1 addition & 0 deletions charts/tidb-cluster/templates/scheduled-backup-pvc.yaml
@@ -8,6 +8,7 @@ metadata:
app.kubernetes.io/managed-by: tidb-operator
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/component: scheduled-backup
pingcap.com/backup-cluster-name: {{ template "cluster.name" . }}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
spec:
accessModes:
109 changes: 109 additions & 0 deletions docs/backup-restore.md
@@ -0,0 +1,109 @@
# Backup and Restore TiDB Cluster

## Overview

TiDB Operator supports two kinds of backup:

* [Full backup](#full-backup) (scheduled or ad-hoc) via [`mydumper`](https://www.pingcap.com/docs/dev/reference/tools/mydumper/), which helps you make a logical backup of the TiDB cluster.
* [Incremental backup](#incremental-backup) via [`TiDB-Binlog`](https://www.pingcap.com/docs/dev/reference/tools/tidb-binlog/overview/), which helps you synchronize the data in the TiDB cluster to other databases or back up the data in real time.

Currently, TiDB Operator only supports the automatic [restore operation](#restore) for full backups taken by `mydumper`. Restoring the backup data captured by `TiDB-Binlog` requires human intervention.

## Full backup

Full backup uses `mydumper` to make a logical backup of a TiDB cluster. The backup job creates a PVC (PersistentVolumeClaim) to store the backup data.

By default, the backup uses a PV (Persistent Volume) to store the backup data. You can also store the backup data in a [Google Cloud Storage](https://cloud.google.com/storage/) bucket or [Ceph Object Storage](https://ceph.com/ceph-storage/object-storage/) by changing the configuration. In this case, the PV temporarily stores the backup data before it is uploaded to object storage. Refer to [TiDB cluster Backup configuration](./references/tidb-backup-configuration.md) for the full configuration guide of backup and restore.

You can either set up a scheduled full backup or take a full backup in an ad-hoc manner.

### Scheduled full backup

A scheduled full backup is created alongside the TiDB cluster, and it runs periodically like a crontab job.

To configure a scheduled full backup, modify the `scheduledBackup` section in the `charts/tidb-cluster/values.yaml` of the TiDB cluster:

* Set `scheduledBackup.create` to `true`
* Set `scheduledBackup.storageClassName` to the PV storage class name used for backup data

> **Note:** You must set the scheduled full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.

* Configure `scheduledBackup.schedule` in the [Cron](https://en.wikipedia.org/wiki/Cron) format to define the schedule
* Set `scheduledBackup.user` and `scheduledBackup.password` to a user that has the permission to read the databases to be backed up

Then, create a new cluster with the scheduled full backup configured by `helm install`, or enable scheduled full backup for an existing cluster by `helm upgrade`:

```shell
$ helm upgrade ${RELEASE_NAME} charts/tidb-cluster -f charts/tidb-cluster/values.yaml
```
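Putting the steps above together, the relevant part of `charts/tidb-cluster/values.yaml` might look like the following sketch. The field names come from the steps above; the concrete values (`local-storage`, the schedule, the user) are illustrative only:

```yaml
scheduledBackup:
  create: true
  # PV storage class used for the backup data; "local-storage" is an example
  storageClassName: local-storage
  # Cron format: this example runs the backup daily at midnight
  schedule: "0 0 * * *"
  # a user with the permission to read the databases to be backed up
  user: backup
  password: changeme
```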

### Ad-Hoc full backup

Ad-hoc backup runs once to completion. This functionality is encapsulated in another helm chart, `charts/tidb-backup`. Depending on the `mode` in `charts/tidb-backup/values.yaml`, this chart performs either a full backup or a restore. We will cover the restore operation in the [restore section](#restore) of this document.

To create an ad-hoc full backup job, modify `charts/tidb-backup/values.yaml`:

* Set `clusterName` to the target TiDB cluster name
* Set `mode` to `backup`
* Set `storage.className` to the PV storage class name used for backup data
* Adjust the `storage.size` according to your database size

> **Note:** You must set the ad-hoc full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.
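As a sketch, the corresponding `charts/tidb-backup/values.yaml` settings might look like this; the values shown are illustrative (`demo` and `local-storage` match the chart defaults):

```yaml
clusterName: demo
mode: backup
storage:
  className: local-storage
  # adjust according to your database size
  size: 100Gi
```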

Create a Kubernetes Secret containing the user and password that has the privilege to back up the database:

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then run the following command to create an ad-hoc backup job:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```

### View backups

For backups stored in PVs, you can list the backup PVCs with the following command:

```shell
$ kubectl get pvc -n ${namespace} -l app.kubernetes.io/component=backup,pingcap.com/backup-cluster-name=${cluster_name}
```

If you store your backup data in [Google Cloud Storage](https://cloud.google.com/storage/) or [Ceph Object Storage](https://ceph.com/ceph-storage/object-storage/), you can view the backups with the corresponding GUI or CLI tools.

## Restore

The helm chart `charts/tidb-backup` helps restore a TiDB cluster from backup data. To perform a restore operation, modify `charts/tidb-backup/values.yaml`:

* Set `clusterName` to the target TiDB cluster name
* Set the `mode` to `restore`
* Set `name` to the name of the backup you want to restore ([view backups](#view-backups) helps you see all available backups). If the backup is stored in `Google Cloud Storage` or `Ceph Object Storage`, you also have to configure the corresponding section (most likely, you will reuse the configuration you set in the [ad-hoc full backup](#ad-hoc-full-backup)).
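For example, restoring an earlier full backup might use values like the following sketch (the backup name shown is illustrative):

```yaml
clusterName: demo
mode: restore
# the name of the backup to restore, as shown by the view-backups command
name: fullbackup-201906020000
```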

Create a Kubernetes Secret containing the user and password that has the privilege to restore the database (skip this if you have already created one in the [ad-hoc full backup](#ad-hoc-full-backup) section):

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then, restore the backup:
```shell
$ helm install charts/tidb-backup --namespace=${namespace}
```

## Incremental backup

Incremental backup leverages the [`TiDB-Binlog`](https://www.pingcap.com/docs/dev/reference/tools/tidb-binlog/overview/) tool to collect binlog data from TiDB and provides real-time backup and synchronization to downstream platforms.

Incremental backup is disabled in the TiDB cluster by default. To create a TiDB cluster with incremental backup enabled, or to enable incremental backup in an existing TiDB cluster, modify `charts/tidb-cluster/values.yaml`:

* Set `binlog.pump.create` to `true`
* Set `binlog.drainer.create` to `true`
* Set `binlog.pump.storageClassName` and `binlog.drainer.storageClassName` to a proper `storageClass` available in your Kubernetes cluster
* Set `binlog.drainer.destDBType` to your desired downstream, explained in detail below
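The four settings above can be sketched in `charts/tidb-cluster/values.yaml` as follows (the storage class names are examples):

```yaml
binlog:
  pump:
    create: true
    storageClassName: local-storage
  drainer:
    create: true
    storageClassName: local-storage
    # pb (PersistentVolume, the default) | mysql | kafka
    destDBType: pb
```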

There are three types of downstream available for incremental backup:

* PersistentVolume: the default downstream. You may consider configuring a large PersistentVolume for `drainer` (the `binlog.drainer.storage` variable) in this case
* MySQL compatible database: enabled by setting `binlog.drainer.destDBType` to `mysql`. You also have to configure the target address and credentials in the `binlog.drainer.mysql` section.
* Kafka: enabled by setting `binlog.drainer.destDBType` to `kafka`. You also have to configure the ZooKeeper address and Kafka address in the `binlog.drainer.kafka` section.
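For instance, a MySQL-compatible downstream might be sketched as below. The keys under `binlog.drainer.mysql` and all values here are assumptions for illustration only; check the comments in `charts/tidb-cluster/values.yaml` for the authoritative key names:

```yaml
binlog:
  drainer:
    create: true
    destDBType: mysql
    # hypothetical target address and credentials
    mysql:
      host: mysql.example.com
      port: 3306
      user: drainer
      password: changeme
```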
66 changes: 3 additions & 63 deletions docs/operation-guide.md
@@ -222,68 +222,8 @@ To retrieve logs from multiple pods, [`stern`](https://github.com/wercker/stern)
$ stern -n ${namespace} tidb -c slowlog
```

## Backup
## Backup and Restore

Currently, TiDB Operator supports two kinds of backup: incremental backup via binlog and full backup(scheduled or ad-hoc) via [Mydumper](https://github.com/maxbube/mydumper).
TiDB Operator provides highly automated backup and recovery operations for TiDB clusters. You can easily take a full backup or set up incremental backup of a TiDB cluster, and restore the cluster when it fails.

### Incremental backup

To enable incremental backup, set `binlog.pump.create` and `binlog.drainer.create` to `true`. By default the incremental backup data is stored in protobuffer format in a PV. You can change `binlog.drainer.destDBType` from `pb` to `mysql` or `kafka` and configure the corresponding downstream.

### Full backup

Currently, full backup requires a PersistentVolume. The backup job will create a PVC to store backup data.

By default, the backup uses PV to store the backup data.
> **Note:** You must set the ad-hoc full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.

You can also store the backup data to [Google Cloud Storage](https://cloud.google.com/storage/) bucket or [Ceph object storage](https://ceph.com/ceph-storage/object-storage/) by configuring the corresponding section in `values.yaml`. This way the PV temporarily stores backup data before it is placed in object storage.

The comments in `values.yaml` is self-explanatory for both GCP backup and Ceph backup.

### Scheduled full backup

Scheduled full backup can be ran periodically just like crontab job.

To create a scheduled full backup job, modify `scheduledBackup` section in `values.yaml` file.

* `create` must be set to `true`
* Set `storageClassName` to the PV storage class name used for backup data
* `schedule` takes the [Cron](https://en.wikipedia.org/wiki/Cron) format
* `user` and `password` must be set to the correct user which has the permission to read the database to be backuped.

> **Note:** You must set the scheduled full backup PV's [reclaim policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy) to `Retain` to keep your backup data safe.


### Ad-Hoc full backup

> **Note:** The rest of the document will use `values.yaml` to reference `charts/tidb-backup/values.yaml`

Ad-Hoc full backup can be done once just like job.

To create an ad-hoc full backup job, modify `backup` section in `values.yaml` file.

* `mode` must be set to `backup`
* Set `storage.className` to the PV storage class name used for backup data

Create a secret containing the user and password that has the permission to backup the database:

```shell
$ kubectl create secret generic backup-secret -n ${namespace} --from-literal=user=<user> --from-literal=password=<password>
```

Then run the following command to create an ad-hoc backup job:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```

## Restore

Restore is similar to backup. See the `values.yaml` file for details.

Modified the variables in `values.yaml` and then create restore job using the following command:

```shell
$ helm install charts/tidb-backup --name=<backup-name> --namespace=${namespace}
```
For detailed operation guides of backup and restore, refer to [Backup and Restore TiDB Cluster](./backup-restore.md).
76 changes: 76 additions & 0 deletions docs/references/tidb-backup-configuration.md
@@ -0,0 +1,76 @@
# TiDB Backup Configuration Reference

`TiDB-Backup` is a helm chart designed for TiDB cluster backup and restore via [`mydumper`](https://www.pingcap.com/docs/dev/reference/tools/mydumper/) and [`loader`](https://www.pingcap.com/docs-cn/tools/loader/). This document explains the configurations of `TiDB-Backup`. You may refer to [Backup and Restore TiDB Cluster](./backup-restore.md) for a user guide with examples.

## Configurations

### `mode`

- The operation to perform, either `backup` or `restore` (required)
- Default: "backup"

### `clusterName`

- The name of the TiDB cluster to back up from or restore to (required)
- Default: "demo"

### `name`

- The backup name
- Default: "fullbackup-${date}", where `${date}` is the start time of the backup, accurate to the minute
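The chart builds this default with the Go reference-time layout `200601021504`, which renders as `YYYYMMDDHHMM`. An equivalent name can be sketched in shell (assuming a POSIX `date` command is available):

```shell
# Generate a backup name in the same YYYYMMDDHHMM form as the chart default,
# e.g. fullbackup-201906021504 (year, month, day, hour, minute).
backup_name="fullbackup-$(date +%Y%m%d%H%M)"
echo "${backup_name}"
```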

### `secretName`

- The name of the Kubernetes Secret which stores the user and password used for backup/restore
- Default: "backup-secret"
- You can create the secret by `kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password=<password>`

### `storage.className`

- The storageClass used to store the backup data
- Default: "local-storage"

### `storage.size`

- The storage size of the PersistentVolume
- Default: "100Gi"

### `backupOptions`

- The options passed to [`mydumper`](https://github.com/maxbube/mydumper/blob/master/docs/mydumper_usage.rst#options)
- Default: "--chunk-filesize=100"

### `restoreOptions`

- The options passed to [`loader`](https://www.pingcap.com/docs-cn/tools/loader/)
- Default: "-t 16"

### `gcp.bucket`

- The name of the GCP bucket used to store the backup data

> **Note**: Once you set any variable under the `gcp` section, the backup data will be uploaded to Google Cloud Storage; that is, you must keep the whole `gcp` configuration intact.

### `gcp.secretName`

- The name of the secret which stores the GCP service account credentials JSON file
- You can create the secret by `kubectl create secret generic gcp-backup-secret --from-file=./credentials.json`. To download the credentials JSON file, refer to the [Google Cloud Documentation](https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually)

### `ceph.endpoint`

- The endpoint of Ceph object storage

> **Note**: Once you set any variable under the `ceph` section, the backup data will be uploaded to Ceph object storage; that is, you must keep the whole `ceph` configuration intact.

### `ceph.bucket`

- The bucket name of Ceph object storage

### `ceph.secretName`

- The name of the secret which stores the Ceph object storage access key and secret key
- You can create the secret by:

```shell
$ kubectl create secret generic ceph-backup-secret --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
```
```