add user guide #187

Merged
merged 7 commits into from
Nov 23, 2018
14 changes: 12 additions & 2 deletions README.md
@@ -38,9 +38,19 @@ Read the [Roadmap](./ROADMAP.md).

## Quick start

Read the [Deploy TiDB using Kubernetes on Your Laptop for development and testing](./docs/local-dind-tutorial.md), or follow a [tutorial](./docs/google-kubernetes-tutorial.md) to launch in Google Kubernetes Engine:
Choose one of the following tutorials:

[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/pingcap/tidb-operator&tutorial=docs/google-kubernetes-tutorial.md)
* [Deploy TiDB using Kubernetes on Your Laptop for development and testing](./docs/local-dind-tutorial.md)

* [Deploy TiDB by launching a Google Kubernetes Engine cluster](./docs/google-kubernetes-tutorial.md):

[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/pingcap/tidb-operator&tutorial=docs/google-kubernetes-tutorial.md)

* [Deploy TiDB by launching an AWS EKS cluster](./docs/aws-eks-tutorial.md)

## User guide

Read the [user guide](./docs/user-guide.md).

## Contributing

139 changes: 139 additions & 0 deletions docs/operation-guide.md
@@ -0,0 +1,139 @@
# TiDB Cluster Operation Guide

TiDB Operator can manage multiple TiDB clusters in the same Kubernetes cluster. Each cluster is identified by the combination of `namespace` and `clusterName`: two clusters may share the same `namespace` or the same `clusterName`, but not both.

The default `clusterName` is `demo`, which is defined in `charts/tidb-cluster/values.yaml`. The following variables are used in the rest of the document:

```shell
$ releaseName="tidb-cluster"
$ namespace="tidb"
$ clusterName="demo" # Make sure this is the same as variable defined in charts/tidb-cluster/values.yaml
```

> **Note:** The rest of the document uses `values.yaml` to refer to `charts/tidb-cluster/values.yaml`.

## Deploy TiDB cluster

After TiDB Operator and Helm are deployed correctly, a TiDB cluster can be deployed using the following commands:

```shell
$ helm install charts/tidb-cluster --name=${releaseName} --namespace=${namespace}
$ kubectl get po -n ${namespace} -l app.kubernetes.io/name=tidb-operator
```

The default deployment doesn't set CPU and memory requests or limits for any of the pods, and the storage used is `local-storage` with minimal size. These settings allow a TiDB cluster to run on a small Kubernetes cluster such as DinD or the default GKE cluster for testing. For production deployments, you will likely want to adjust the CPU, memory and storage resources according to the [recommendations](https://github.com/pingcap/docs/blob/master/op-guide/recommendation.md).

Resource limits should be equal to or greater than resource requests. Setting limits equal to requests is suggested, so that the pods get the [`Guaranteed` QoS](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed) class.

For other settings, the variables in `values.yaml` are self-explanatory with comments. You can modify them according to your needs before installing the chart.

## Access TiDB cluster

By default, the TiDB service is exposed using [`NodePort`](https://kubernetes.io/docs/concepts/services-networking/service/#nodeport). You can change it to `ClusterIP`, which disables access from outside the cluster, or to [`LoadBalancer`](https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer) if the underlying Kubernetes platform supports this kind of service.

By default, the TiDB cluster is deployed with a randomly generated password. You can specify a password by setting `tidb.password` in `values.yaml` before deploying. Whether or not you specify the password, you can retrieve it from the `Secret`:

```shell
$ PASSWORD=$(kubectl get secret -n ${namespace} ${clusterName}-tidb -ojsonpath="{.data.password}" | base64 --decode | awk '{print $6}')
$ echo ${PASSWORD}
$ kubectl get svc -n ${namespace} # check the available services
```
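
The retrieval step reads the Secret value, base64-decodes it, and keeps a single whitespace-separated field. The sketch below reproduces only the decode-and-extract part on a made-up payload, so it runs without a cluster. The payload layout (six fields, password last) is an assumption for illustration; inspect the decoded Secret in your own cluster before relying on the field number.

```shell
# Hypothetical payload: six whitespace-separated fields, the last one being the password.
# The real Secret content may be laid out differently.
sample_payload='f1 f2 f3 f4 f5 s3cr3t'

# Kubernetes stores Secret values base64-encoded, so encode the sample the same way.
encoded=$(printf '%s' "${sample_payload}" | base64)

# Decode the value, then keep the sixth field.
decoded_password=$(printf '%s' "${encoded}" | base64 --decode | awk '{print $6}')
echo "${decoded_password}"   # prints s3cr3t
```

If the extracted value is empty, decode the Secret without the `awk` step and check how many fields it actually contains.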

* Access inside of the Kubernetes cluster

When your application is deployed in the same Kubernetes cluster, you can access TiDB via the domain name `demo-tidb.tidb.svc` on port `4000`. Here `demo` is the `clusterName`, which can be modified in `values.yaml`, and the `tidb` that follows is the namespace you specified when deploying the TiDB cluster with `helm install`.

* Access outside of the Kubernetes cluster

* Using `kubectl port-forward`

```shell
$ kubectl port-forward -n ${namespace} svc/${clusterName}-tidb 4000:4000 &>/tmp/portforward-tidb.log &
$ mysql -h 127.0.0.1 -P 4000 -u root -p
```

* Using LoadBalancer

When you set `tidb.service.type` to `LoadBalancer` and the underlying Kubernetes platform supports load balancers, a load balancer is created for the TiDB service. You can access it via the external IP on port `4000`. Some cloud platforms support internal load balancers via service annotations. For example, on GKE you can add the annotation `cloud.google.com/load-balancer-type: Internal` to `tidb.service.annotations` to create an internal load balancer for TiDB.

* Using NodePort

You can access TiDB via any node's IP and the node port of the TiDB service. In the `kubectl get svc` output, the node port is the number after `4000:` and is usually greater than `30000`.
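
The two addressing rules in this section can be sketched as small shell helpers. The function names are hypothetical, and the `PORT(S)` value passed in is sample text rather than real output; in practice you would copy it from the `kubectl get svc` listing.

```shell
# In-cluster address: <clusterName>-tidb.<namespace>.svc on port 4000.
tidb_in_cluster_addr() {
  printf '%s-tidb.%s.svc:4000\n' "$1" "$2"
}

# Extract the node port mapped to a service port from a PORT(S) value
# such as "4000:31234/TCP,10080:30080/TCP" (sample value, not real output).
node_port_for() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'[:/]' -v port="$2" '$1 == port { print $2 }'
}

tidb_in_cluster_addr demo tidb                          # prints demo-tidb.tidb.svc:4000
node_port_for '4000:31234/TCP,10080:30080/TCP' 4000     # prints 31234
```

With the node port in hand, a MySQL client outside the cluster would connect to any node's IP on that port instead of `4000`.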

## Scale TiDB cluster

TiDB Operator fully supports horizontal scaling. Vertical scaling is not recommended: if you're using local volumes for PD and TiKV, scaling up may leave pods pending when the node doesn't have enough resources.
Contributor

What about vertical scale-down?

Member Author

Although scaling down is doable, vertical scaling is not recommended: users have to adjust some TiDB configurations when scaling vertically to get better performance. Besides, scaling down kills the pods one by one instead of scaling them down in place, which might not be what users expect.

Contributor

What about TiDB SQL? I should be able to scale those vertically?

I created a GitHub issue focused on TiKV/PD here: #191

Member Author

I've added the recommendations for TiDB cluster deployment.

Contributor

A lot more thorough now, thanks! I still don't know from the guide if I can safely vertically scale TiDB SQL.

To scale the TiDB cluster in or out, modify the `replicas` of PD, TiKV and TiDB in the `values.yaml` file, and then run the following command:

```shell
$ helm upgrade ${releaseName} charts/tidb-cluster
```
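
As an alternative to editing the file, Helm can override values on the command line with `--set`. The sketch below only prints such a command so it can be reviewed before running. The value keys `pd.replicas`, `tikv.replicas` and `tidb.replicas` are assumptions based on the wording above; check the actual key names in `values.yaml` first.

```shell
# Print (not run) a helm upgrade command that overrides the replica counts.
# Assumption: the chart exposes pd.replicas, tikv.replicas and tidb.replicas.
# Arguments: <releaseName> <pd replicas> <tikv replicas> <tidb replicas>
scale_command() {
  printf 'helm upgrade %s charts/tidb-cluster --set pd.replicas=%s,tikv.replicas=%s,tidb.replicas=%s\n' \
    "$1" "$2" "$3" "$4"
}

scale_command tidb-cluster 3 5 2
```

Overrides passed with `--set` are not written back to `values.yaml`, so a later upgrade from the unmodified file would revert them. Editing the file keeps the desired state in one place.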

To scale the TiDB cluster up or down, modify the CPU/memory limits and requests of PD, TiKV and TiDB in the `values.yaml` file, and then run the same command as above. (Note: This may fail when using local volumes.)

## Upgrade TiDB cluster

Upgrading a TiDB cluster is similar to scaling it: change the `image` of PD, TiKV and TiDB to a different image version in `values.yaml`, and then run the following command:

```shell
$ helm upgrade ${releaseName} charts/tidb-cluster
```

## Destroy TiDB cluster

To destroy TiDB cluster, run the following command:

```shell
$ helm delete ${releaseName} --purge
```

The above command only deletes the running pods; the data is kept. If you no longer need the data, you can run the following commands to clean it up:

```shell
$ kubectl delete pvc -n ${namespace} -l app.kubernetes.io/instance=${releaseName},app.kubernetes.io/managed-by=tidb-operator
$ kubectl get pv -l app.kubernetes.io/namespace=${namespace},app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/instance=${releaseName} -o name | xargs -I {} kubectl patch {} -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
```

> **Note:** The above commands delete the data permanently. Think twice before executing them.

## Monitor

The TiDB cluster is monitored with Prometheus and Grafana. When a TiDB cluster is created, a Prometheus and Grafana pod is created and configured to scrape and visualize metrics.

By default, the monitoring data is not persistent: when the monitor pod is killed for some reason, the data is lost. This can be avoided by setting `monitor.persistent` to `true` in the `values.yaml` file.

You can view the dashboard using `kubectl port-forward`:

```shell
$ kubectl port-forward -n ${namespace} svc/${clusterName}-grafana 3000:3000 &>/tmp/portforward-grafana.log
```

Then open http://localhost:3000 in your browser. The default username and password are both `admin`.

The Grafana service is exposed as `NodePort` by default. You can change it to `LoadBalancer` if the underlying Kubernetes platform has load balancer support, and then view the dashboard via the load balancer endpoint.

## Backup

Currently, TiDB Operator supports two kinds of backup: full backup via [Mydumper](https://github.com/maxbube/mydumper) and incremental backup via binlog.

### Full backup

Full backups can be run periodically, like a crontab job. Currently, a full backup requires a PersistentVolume; the backup job creates a PVC to store the backup data.

To create a full backup job, modify the `fullbackup` section in the `values.yaml` file:

* `create` must be set to `true`
* Set `storageClassName` to the PV storage class name used for backup data
* `schedule` takes the [Cron](https://en.wikipedia.org/wiki/Cron) format
* `user` and `password` must be set to a user that has permission to read the database to be backed up
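
Since `schedule` takes the cron format, a quick sanity check before deploying can catch a malformed value. The helper below is a minimal sketch, assuming the standard five-field form (minute, hour, day of month, month, day of week); the function name is hypothetical, and it only counts fields without validating ranges.

```shell
# Return success when the expression has exactly five whitespace-separated fields
# (minute, hour, day of month, month, day of week). Ranges are not validated.
is_five_field_cron() {
  printf '%s\n' "$1" | awk '{ exit !(NF == 5) }'
}

# '0 2 * * *' means every day at 02:00.
if is_five_field_cron '0 2 * * *'; then
  echo "looks like a cron schedule"
fi
```

Quote the expression when passing it around in a shell; an unquoted `*` would be expanded to file names.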

If the TiDB cluster is running on GKE, the backup data can be uploaded to a GCS bucket. A bucket name and a base64-encoded service account credential with read/write access to the bucket must be provided. The comments in `values.yaml` are self-explanatory for GCP backup.

### Incremental backup

To enable incremental backup, set `binlog.pump.create` and `binlog.drainer.create` to `true`. By default, the incremental backup data is stored in protobuf format in a PV. You can change `binlog.drainer.destDBType` from `pb` to `mysql` or `kafka` and configure the corresponding downstream.

## Restore

Currently, TiDB Operator only supports restoring from a full backup in a GCS bucket. The comments in the `restore` section of `values.yaml` serve as its documentation.
81 changes: 81 additions & 0 deletions docs/setup.md
@@ -0,0 +1,81 @@
# TiDB Operator Setup

## Requirements

Before deploying the TiDB Operator, make sure the following requirements are satisfied:

* Kubernetes v1.10 or later
* [DNS addons](https://kubernetes.io/docs/tasks/access-application-cluster/configure-dns-cluster/)
* [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
* [RBAC](https://kubernetes.io/docs/admin/authorization/rbac) enabled (optional)
* [Helm](https://helm.sh) v2.8.2 or later
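
The Kubernetes version requirement is easy to get wrong with a plain string comparison, because `1.9` sorts after `1.10` as text. The helper below is a minimal sketch of a numeric check; the function name is hypothetical, and the version string would come from the server version that `kubectl version` reports.

```shell
# Succeed when a version such as "v1.12.3" or "1.10+" is 1.10 or later.
# Adding 0 forces a numeric comparison, so "1.9" is correctly older than "1.10".
k8s_version_ok() {
  printf '%s\n' "${1#v}" |
    awk -F. '{ exit !(($1 + 0) > 1 || (($1 + 0) == 1 && ($2 + 0) >= 10)) }'
}

for v in v1.9.6 v1.10.0 v1.12.3; do
  if k8s_version_ok "$v"; then echo "$v: ok"; else echo "$v: too old"; fi
done
```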

> **Note:** Though TiDB Operator can use network volumes to persist TiDB data, it is highly recommended to set up [local volumes](https://kubernetes.io/docs/concepts/storage/volumes/#local) for better performance. Because TiDB already replicates data, the extra replicas added by network volumes are redundant.

## Kubernetes

TiDB Operator runs on top of a Kubernetes cluster. You can use one of the methods listed [here](https://kubernetes.io/docs/setup/pick-right-solution/) to set up a Kubernetes cluster; just make sure the Kubernetes version is v1.10 or later. If you want to use AWS, GKE or a local machine, there are quick start tutorials:

* [Local DinD tutorial](./local-dind-tutorial.md)
* [Google GKE tutorial](./google-kubernetes-tutorial.md)
* [AWS EKS tutorial](./aws-eks-tutorial.md)

If you want to use a different environment, a proper DNS addon must be installed in the Kubernetes cluster. You can follow the [official documentation](https://kubernetes.io/docs/tasks/access-application-cluster/configure-dns-cluster/) to set up a DNS addon.

TiDB Operator uses [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) to persist TiDB cluster data (including the database, monitor data and backup data), so the Kubernetes cluster must provide at least one kind of persistent volume. For better performance, persistent volumes backed by local SSD disks are recommended. You can follow [this step](#local-persistent-volume) to automatically provision local persistent volumes.

Enabling [RBAC](https://kubernetes.io/docs/admin/authorization/rbac) in the Kubernetes cluster is suggested. Otherwise, you may want to set `rbac.create` to `false` in the `values.yaml` of both the tidb-operator and tidb-cluster charts.

Because TiDB by default uses at most 40960 file descriptors, the ulimit of the [worker node](https://access.redhat.com/solutions/61334) and its [Docker daemon](https://docs.docker.com/engine/reference/commandline/dockerd/#default-ulimit-settings) must be configured to be greater than 40960. Otherwise, you have to change TiKV's `max-open-files` to match your worker node's `ulimit -n` in the configuration file `charts/tidb-cluster/templates/config/_tikv-config.tpl`, but this will impact TiDB performance.
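
The file descriptor requirement can be checked from a shell on a worker node. This is a minimal sketch with a hypothetical helper name. It inspects only the limit of the current shell, so the Docker daemon's default ulimit still has to be verified separately.

```shell
# Succeed when an open-file limit is above 40960; "unlimited" also passes.
fd_limit_ok() {
  [ "$1" = "unlimited" ] || [ "$1" -gt 40960 ]
}

current=$(ulimit -n)
if fd_limit_ok "${current}"; then
  echo "open-file limit ${current} is sufficient"
else
  echo "open-file limit ${current} is too low; raise it above 40960"
fi
```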

## Helm

You can follow the official Helm [documentation](https://helm.sh) to install Helm in your Kubernetes cluster. The following instructions are listed here for quick reference:

1. Install helm client

```shell
$ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
```

Or if you use macOS, you can install Helm with Homebrew: `brew install kubernetes-helm`.

2. Install helm server

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/tiller-rbac.yaml
$ helm init --service-account=tiller --upgrade
$ kubectl get po -n kube-system -l name=tiller # make sure tiller pod is running
```

If `RBAC` is not enabled for the Kubernetes cluster, then `helm init --upgrade` should be enough.

## Local Persistent Volume

Formatting local disks with the ext4 filesystem is recommended.

Mount the local SSD disks of your Kubernetes nodes at subdirectories of `/mnt/disks`. For example, if your data disk is `/dev/nvme0n1`, you can format and mount it with the following commands:

```shell
$ sudo mkdir -p /mnt/disks/disk0
$ sudo mkfs.ext4 /dev/nvme0n1
$ sudo mount -t ext4 -o nodelalloc /dev/nvme0n1 /mnt/disks/disk0
```

To mount the disks automatically when the operating system boots, add the corresponding mount entries to `/etc/fstab`.
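
An `/etc/fstab` entry has six fields: device, mount point, filesystem type, options, dump flag and fsck pass. The sketch below prints an entry for the example disk without touching the file. The `defaults,nodelalloc` options and the trailing `0 2` are assumed values, and the function name is hypothetical; adjust them for your system.

```shell
# Print an /etc/fstab entry: <device> <mount point> <type> <options> <dump> <pass>.
# "defaults,nodelalloc" and "0 2" are assumed values, not taken from this guide.
fstab_entry() {
  printf '%s %s ext4 defaults,nodelalloc 0 2\n' "$1" "$2"
}

fstab_entry /dev/nvme0n1 /mnt/disks/disk0
# prints: /dev/nvme0n1 /mnt/disks/disk0 ext4 defaults,nodelalloc 0 2
```

After appending the printed line to `/etc/fstab` as root, `sudo mount -a` is a quick way to confirm that the entry parses and mounts. Device names such as `/dev/nvme0n1` can change between boots, so referring to the disk by UUID is often safer.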

After mounting all data disks on Kubernetes nodes, you can deploy [local-volume-provisioner](https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume) to automatically provision the mounted disks as Local PersistentVolumes.

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/local-dind/local-volume-provisioner.yaml
$ kubectl get po -n kube-system -l app=local-volume-provisioner
$ kubectl get pv | grep local-storage
```

## Install TiDB Operator

```shell
$ kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/crd.yaml
$ helm install charts/tidb-operator --name=tidb-operator --namespace=tidb-admin
$ kubectl get po -n tidb-admin -l app.kubernetes.io/name=tidb-operator
```
26 changes: 26 additions & 0 deletions docs/troubleshooting.md
@@ -0,0 +1,26 @@
# Troubleshooting

## Some pods are pending for a long time

A pending pod means that the resources it requires cannot be satisfied. The most common cases are:

* Insufficient CPU, memory or storage

Check the details of the pod with:

```shell
$ kubectl describe po -n <ns> <pod-name>
```

When this happens, reduce the resource requests of the TiDB cluster and then use `helm` to upgrade the cluster. If the storage request is larger than any of the available volumes, you have to delete the pod and the corresponding pending PVC.

* Storage class does not exist or no PV is available

You can check this by:

```shell
$ kubectl get pvc -n <ns>
$ kubectl get pv | grep <storage-class-name> | grep Available
```

When this happens, you can change the `storageClassName` and then use `helm` to upgrade the cluster. After that, delete the pending pods and the corresponding pending PVCs, and wait for new pods and PVCs to be created.
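
Before working through the cases above, it helps to list exactly which pods are pending. The filter below is a minimal sketch that reads a pod listing on standard input. It assumes the default column order of `kubectl get po` (NAME, READY, STATUS, RESTARTS, AGE); the function name and the pod names in the sample are hypothetical.

```shell
# Read a pod listing on stdin and print the names of pods whose STATUS is Pending.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Sample input for illustration only; prints demo-tikv-0.
pending_pods <<'EOF'
NAME          READY   STATUS    RESTARTS   AGE
demo-pd-0     1/1     Running   0          5m
demo-tikv-0   0/1     Pending   0          5m
EOF
```

Against a live cluster the same filter would be fed from `kubectl get po -n <ns> | pending_pods`.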
13 changes: 13 additions & 0 deletions docs/user-guide.md
@@ -0,0 +1,13 @@
# TiDB Operator User Guide

For a quick start, refer to one of the following tutorials:

* [Local DinD tutorial](./local-dind-tutorial.md)
* [Google GKE tutorial](./google-kubernetes-tutorial.md)
* [AWS EKS tutorial](./aws-eks-tutorial.md)

If you are already familiar with [Kubernetes](https://kubernetes.io) and [TiDB](https://pingcap.com/docs), the following docs can be helpful for managing TiDB clusters with TiDB Operator:

* [TiDB Operator Setup](./setup.md)
* [TiDB Cluster Operation Guide](./operation-guide.md)
* [Troubleshooting](./troubleshooting.md)