From 86173878a8ca56f3b4b6840690cb82be9eae20d7 Mon Sep 17 00:00:00 2001 From: Yilong Li Date: Mon, 7 Dec 2020 15:08:03 +0800 Subject: [PATCH 1/2] en, zh: fix tiflash deployment example (#904) * fix tidb-operator issue 3563 * fix pd.config.enable-placement-rules --- en/configure-a-tidb-cluster.md | 4 ++-- en/deploy-tiflash.md | 2 +- zh/configure-a-tidb-cluster.md | 4 ++-- zh/deploy-tiflash.md | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/en/configure-a-tidb-cluster.md b/en/configure-a-tidb-cluster.md index 44899f5bbb..986c1db116 100644 --- a/en/configure-a-tidb-cluster.md +++ b/en/configure-a-tidb-cluster.md @@ -167,14 +167,14 @@ The deployed cluster topology by default has three PD Pods, three TiKV Pods, and #### Enable TiFlash -If you want to enable TiFlash in the cluster, configure `spec.pd.config.replication.enable-placement-rules` to `true` and configure `spec.tiflash` in the `${cluster_name}/tidb-cluster.yaml` file as follows: +If you want to enable TiFlash in the cluster, configure `spec.pd.config.replication.enable-placement-rules: true` and configure `spec.tiflash` in the `${cluster_name}/tidb-cluster.yaml` file as follows: ```yaml pd: config: ... replication: - enable-placement-rules: "true" + enable-placement-rules: true ... tiflash: baseImage: pingcap/tiflash diff --git a/en/deploy-tiflash.md b/en/deploy-tiflash.md index c4f39fa666..a10ef8ea3e 100644 --- a/en/deploy-tiflash.md +++ b/en/deploy-tiflash.md @@ -75,7 +75,7 @@ TiFlash supports mounting multiple Persistent Volumes (PVs). If you want to conf > > Since TiDB Operator will mount PVs automatically in the **order** of the items in the `storageClaims` list, if you need to add more disks to TiFlash, make sure to append the new item only to the **end** of the original items, and **DO NOT** modify the order of the original items. -To add TiFlash component to an existing TiDB cluster, you need to set `replication.enable-placement-rules` to `true` in PD. After you add the TiFlash configuration in `TidbCluster` by taking the above steps, TiDB Operator automatically configures `replication.enable-placement-rules: "true"` in PD. +To add TiFlash component to an existing TiDB cluster, you need to set `replication.enable-placement-rules: true` in PD. After you add the TiFlash configuration in `TidbCluster` by taking the above steps, TiDB Operator automatically configures `replication.enable-placement-rules: true` in PD. If the server does not have an external network, refer to [deploy the TiDB cluster](deploy-on-general-kubernetes.md#deploy-the-tidb-cluster) to download the required Docker image on the machine with an external network and upload it to the server. diff --git a/zh/configure-a-tidb-cluster.md b/zh/configure-a-tidb-cluster.md index 68a8cd9fdf..484d25ff01 100644 --- a/zh/configure-a-tidb-cluster.md +++ b/zh/configure-a-tidb-cluster.md @@ -162,14 +162,14 @@ PD 和 TiKV 支持配置 `mountClusterClientSecret`,建议配置 `spec.pd.moun #### 部署 TiFlash -如果要在集群中开启 TiFlash,需要在 `${cluster_name}/tidb-cluster.yaml` 文件中配置 `spec.pd.config.replication.enable-placement-rules: "true"`,并配置 `spec.tiflash`: +如果要在集群中开启 TiFlash,需要在 `${cluster_name}/tidb-cluster.yaml` 文件中配置 `spec.pd.config.replication.enable-placement-rules: true`,并配置 `spec.tiflash`: ```yaml pd: config: ... replication: - enable-placement-rules: "true" + enable-placement-rules: true ... 
tiflash: baseImage: pingcap/tiflash diff --git a/zh/deploy-tiflash.md b/zh/deploy-tiflash.md index ab189e8504..d8ab19cdfe 100644 --- a/zh/deploy-tiflash.md +++ b/zh/deploy-tiflash.md @@ -75,7 +75,7 @@ TiFlash 支持挂载多个 PV,如果要为 TiFlash 配置多个 PV,可以在 > > 由于 TiDB Operator 会按照 `storageClaims` 列表中的配置**按顺序**自动挂载 PV,如果需要为 TiFlash 增加磁盘,请确保只在列表原有配置**最后添加**,并且**不能**修改列表中原有配置的顺序。 -新增部署 TiFlash 需要 PD 配置 `replication.enable-placement-rules: "true"`,通过上述步骤在 TidbCluster 中增加 TiFlash 配置后,TiDB Operator 会自动为 PD 配置 `replication.enable-placement-rules: "true"`。 +新增部署 TiFlash 需要 PD 配置 `replication.enable-placement-rules: true`,通过上述步骤在 TidbCluster 中增加 TiFlash 配置后,TiDB Operator 会自动为 PD 配置 `replication.enable-placement-rules: true`。 如果服务器没有外网,请参考[部署 TiDB 集群](deploy-on-general-kubernetes.md#部署-tidb-集群)在有外网的机器上将用到的 Docker 镜像下载下来并上传到服务器上。 From aa9570f5daae3f881003a4f523fb810bb2bdda6e Mon Sep 17 00:00:00 2001 From: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com> Date: Wed, 9 Dec 2020 13:43:05 +0800 Subject: [PATCH 2/2] en, zh: update aws deploy doc (#902) * update aws deploy doc * update english doc * Apply suggestions from code review Co-authored-by: Ran * update to use private network * update for tikv scaling out * Apply suggestions from code review Co-authored-by: Ran Co-authored-by: Ran --- en/deploy-on-aws-eks.md | 144 +++++++++++++++++++++++++++++++++++----- zh/deploy-on-aws-eks.md | 144 +++++++++++++++++++++++++++++++++++----- 2 files changed, 252 insertions(+), 36 deletions(-) diff --git a/en/deploy-on-aws-eks.md b/en/deploy-on-aws-eks.md index b2beb3ee7c..4a0493c7a4 100644 --- a/en/deploy-on-aws-eks.md +++ b/en/deploy-on-aws-eks.md @@ -27,6 +27,8 @@ Before deploying a TiDB cluster on AWS EKS, make sure the following requirements ## Create a EKS cluster and a node pool +According to AWS [Official Blog](https://aws.amazon.com/blogs/containers/amazon-eks-cluster-multi-zone-auto-scaling-groups/) recommendation and EKS [Best Practice Document](https://aws.github.io/aws-eks-best-practices/reliability/docs/dataplane/#ensure-capacity-in-each-az-when-using-ebs-volumes), since most of the TiDB cluster components use EBS volumes as storage, it is recommended to create a node pool in each availability zone (at least 3 in total) for each component when creating an EKS. + Save the following configuration as the `cluster.yaml` file. Replace `${clusterName}` with your desired cluster name. 
{{< copyable "shell-regular" >}} @@ -36,36 +38,93 @@ apiVersion: eksctl.io/v1alpha5 kind: ClusterConfig metadata: name: ${clusterName} - region: us-west-2 + region: ap-northeast-1 nodeGroups: - name: admin desiredCapacity: 1 + privateNetworking: true labels: dedicated: admin - - name: tidb - desiredCapacity: 2 + - name: tidb-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tidb + taints: + dedicated: tidb:NoSchedule + - name: tidb-1d + desiredCapacity: 0 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: tidb + taints: + dedicated: tidb:NoSchedule + - name: tidb-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: tidb taints: dedicated: tidb:NoSchedule - - name: pd - desiredCapacity: 3 + - name: pd-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: pd + taints: + dedicated: pd:NoSchedule + - name: pd-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: pd + taints: + dedicated: pd:NoSchedule + - name: pd-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: pd taints: dedicated: pd:NoSchedule - - name: tikv - desiredCapacity: 3 + - name: tikv-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tikv + taints: + dedicated: tikv:NoSchedule + - name: tikv-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: tikv + taints: + dedicated: tikv:NoSchedule + - name: tikv-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: tikv taints: dedicated: tikv:NoSchedule ``` +By default, only two TiDB nodes are required, so you can set the `desiredCapacity` of the `tidb-1d` node group to `0`. You can scale out this node group any time if necessary. + Execute the following command to create the cluster: {{< copyable "shell-regular" >}} @@ -74,9 +133,14 @@ Execute the following command to create the cluster: eksctl create cluster -f cluster.yaml ``` -> **Note:** +After executing the command above, you need to wait until the EKS cluster is successfully created and the node group is created and added in the EKS cluster. This process might take 5 to 10 minutes. For more cluster configuration, refer to [`eksctl` documentation](https://eksctl.io/usage/creating-and-managing-clusters/#using-config-files). + +> **Warning:** +> +> If the Regional Auto Scaling Group (ASG) is used: > -> After executing the command above, you need to wait until the EKS cluster is successfully created and the node group is created and added in the EKS cluster. This process might take 5 to 10 minutes. For more cluster configuration, refer to [`eksctl` documentation](https://eksctl.io/usage/creating-and-managing-clusters/#using-config-files). +> * [Enable the instance scale-in protection](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#instance-protection-instance) for all the EC2s that have been started. The instance scale-in protection for the ASG is not required. +> * [Set termination policy](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#custom-termination-policy) to `NewestInstance` for the ASG. 
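For reference, the two Auto Scaling Group adjustments described in the warning above can be applied with the AWS CLI. The following is a minimal sketch only: `asgName` is a placeholder for the ASG that `eksctl` creates behind a node group (it is not defined anywhere in this patch), and you can look it up with `aws autoscaling describe-auto-scaling-groups`.

```shell
# Placeholder: the ASG that backs one of the node groups created by eksctl.
asgName="<your-nodegroup-asg-name>"

# Set the termination policy of the ASG to NewestInstance.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name "${asgName}" \
  --termination-policies "NewestInstance"

# Enable scale-in protection for every EC2 instance currently running in the ASG.
instanceIds=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "${asgName}" \
  --query "AutoScalingGroups[0].Instances[].InstanceId" \
  --output text)
aws autoscaling set-instance-protection \
  --auto-scaling-group-name "${asgName}" \
  --instance-ids ${instanceIds} \
  --protected-from-scale-in
```

Repeat the commands for each regional ASG that backs a node group; single-AZ node groups created per the configuration above do not need them.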
## Deploy TiDB Operator @@ -262,12 +326,14 @@ This section describes how to scale out the EKS node group and TiDB components. ### Scale out EKS node group -The following example shows how to scale out the `tikv` group of the `${clusterName}` cluster to 4 nodes: +When scaling out TiKV, the node groups must be scaled out evenly among the different availability zones. The following example shows how to scale out the `tikv-1a`, `tikv-1c`, and `tikv-1d` groups of the `${clusterName}` cluster to 2 nodes: {{< copyable "shell-regular" >}} ```shell -eksctl scale nodegroup --cluster ${clusterName} --name tikv --nodes 4 --nodes-min 4 --nodes-max 4 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1a --nodes 2 --nodes-min 2 --nodes-max 2 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1c --nodes 2 --nodes-min 2 --nodes-max 2 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1d --nodes 2 --nodes-min 2 --nodes-max 2 ``` For more information on managing node groups, refer to [`eksctl` documentation](https://eksctl.io/usage/managing-nodegroups/). @@ -289,16 +355,53 @@ The two components are *not required* in the deployment. This section shows a qu In the configuration file of eksctl (`cluster.yaml`), add the following two items to add a node group for TiFlash/TiCDC respectively. `desiredCapacity` is the number of nodes you desire. ```yaml -- name: tiflash - desiredCapacity: 3 + - name: tiflash-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tiflash + taints: + dedicated: tiflash:NoSchedule + - name: tiflash-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] labels: - role: tiflash + dedicated: tiflash taints: dedicated: tiflash:NoSchedule - - name: ticdc + - name: tiflash-1c desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: - role: ticdc + dedicated: tiflash + taints: + dedicated: tiflash:NoSchedule + + - name: ticdc-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: ticdc + taints: + dedicated: ticdc:NoSchedule + - name: ticdc-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: ticdc + taints: + dedicated: ticdc:NoSchedule + - name: ticdc-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] + labels: + dedicated: ticdc taints: dedicated: ticdc:NoSchedule ``` @@ -418,6 +521,8 @@ Some AWS instance types provide additional [NVMe SSD local store volumes](https: > You cannot dynamically change the storage class of a running TiDB cluster. You can create a new cluster for testing. > > During the EKS upgrade, [data in the local storage will be lost](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-lifetime) due to the node reconstruction. When the node reconstruction occurs, you need to migrate data in TiKV. If you do not want to migrate data, it is recommended not to use the local disk in the production environment. +> +> As the node reconstruction will cause the data loss of local storage, refer to [AWS document](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html) to suspend the `ReplaceUnhealthy` process for the TiKV node group. For instance types that provide local volumes, see [AWS Instance Types](https://aws.amazon.com/ec2/instance-types/). 
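As a companion to the note above about node reconstruction, suspending the `ReplaceUnhealthy` process can also be done with the AWS CLI. This is a hedged sketch only: `asgName` is a placeholder for the ASG behind a TiKV node group and is not part of this patch.

```shell
# Placeholder: repeat for the ASG behind each TiKV node group (for example tikv-1a, tikv-1c, tikv-1d).
asgName="<your-tikv-nodegroup-asg-name>"

# Suspend ReplaceUnhealthy so that a failed health check does not recreate the node
# and discard the data on its local NVMe volumes.
aws autoscaling suspend-processes \
  --auto-scaling-group-name "${asgName}" \
  --scaling-processes ReplaceUnhealthy
```

The matching `aws autoscaling resume-processes` command reverts the change if you later move TiKV off local storage.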
Take `c5d.4xlarge` as an example: @@ -426,10 +531,13 @@ For instance types that provide local volumes, see [AWS Instance Types](https:// Modify the instance type of the TiKV node group in the `eksctl` configuration file to `c5d.4xlarge`: ```yaml - - name: tikv + - name: tikv-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] instanceType: c5d.4xlarge labels: - role: tikv + dedicated: tikv taints: dedicated: tikv:NoSchedule ... diff --git a/zh/deploy-on-aws-eks.md b/zh/deploy-on-aws-eks.md index edfac19aa1..6ee58db456 100644 --- a/zh/deploy-on-aws-eks.md +++ b/zh/deploy-on-aws-eks.md @@ -28,6 +28,8 @@ aliases: ['/docs-cn/tidb-in-kubernetes/dev/deploy-on-aws-eks/'] ## 创建 EKS 集群和节点池 +根据 AWS [官方博客](https://aws.amazon.com/cn/blogs/containers/amazon-eks-cluster-multi-zone-auto-scaling-groups/)推荐和 EKS [最佳实践文档](https://aws.github.io/aws-eks-best-practices/reliability/docs/dataplane/#ensure-capacity-in-each-az-when-using-ebs-volumes),由于 TiDB 集群大部分组件使用 EBS 卷作为存储,推荐在创建 EKS 的时候针对每个组件在每个可用区(至少 3 个可用区)创建一个节点池。 + 将以下配置存为 cluster.yaml 文件,并替换 `${clusterName}` 为自己想命名的集群名字: {{< copyable "shell-regular" >}} @@ -37,36 +39,93 @@ apiVersion: eksctl.io/v1alpha5 kind: ClusterConfig metadata: name: ${clusterName} - region: us-west-2 + region: ap-northeast-1 nodeGroups: - name: admin desiredCapacity: 1 + privateNetworking: true labels: dedicated: admin - - name: tidb - desiredCapacity: 2 + - name: tidb-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tidb + taints: + dedicated: tidb:NoSchedule + - name: tidb-1d + desiredCapacity: 0 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: tidb + taints: + dedicated: tidb:NoSchedule + - name: tidb-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: tidb taints: dedicated: tidb:NoSchedule - - name: pd - desiredCapacity: 3 + - name: pd-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: pd + taints: + dedicated: pd:NoSchedule + - name: pd-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: pd + taints: + dedicated: pd:NoSchedule + - name: pd-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: pd taints: dedicated: pd:NoSchedule - - name: tikv - desiredCapacity: 3 + - name: tikv-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tikv + taints: + dedicated: tikv:NoSchedule + - name: tikv-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: tikv + taints: + dedicated: tikv:NoSchedule + - name: tikv-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: dedicated: tikv taints: dedicated: tikv:NoSchedule ``` +默认只需要两个 TiDB 节点,因此可以设置 `tidb-1d` 节点组的 `desiredCapacity` 为 `0`,后面如果需要可以随时扩容这个节点组。 + 执行以下命令创建集群: {{< copyable "shell-regular" >}} @@ -75,9 +134,14 @@ nodeGroups: eksctl create cluster -f cluster.yaml ``` -> **注意:** +该命令需要等待 EKS 集群创建完成,以及节点组创建完成并加入进去,耗时约 5~10 分钟。可参考 [eksctl 文档](https://eksctl.io/usage/creating-and-managing-clusters/#using-config-files)了解更多集群配置选项。 + +> **警告:** +> +> 如果使用了 Regional Auto Scaling Group (ASG): > -> 该命令需要等待 EKS 集群创建完成,以及节点组创建完成并加入进去,耗时约 5~10 分钟。可参考 [eksctl 
文档](https://eksctl.io/usage/creating-and-managing-clusters/#using-config-files)了解更多集群配置选项。 +> * 为已经启动的 EC2 [开启实例缩减保护](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#instance-protection-instance),ASG 自身的实例缩减保护不需要打开。 +> * [设置 ASG 终止策略](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html#custom-termination-policy)为 `NewestInstance`。 ## 部署 TiDB Operator @@ -261,12 +325,14 @@ basic-grafana LoadBalancer 10.100.199.42 a806cfe84c12a4831aa3313e792e3eed- ### 扩容 EKS 节点组 -以下是将集群 `${clusterName}` 的 `tikv` 节点组扩容到 4 节点的示例: +TiKV 扩容需要保证在各可用区均匀扩容。以下是将集群 `${clusterName}` 的 `tikv-1a`、`tikv-1c`、`tikv-1d` 节点组扩容到 2 节点的示例: {{< copyable "shell-regular" >}} ```shell -eksctl scale nodegroup --cluster ${clusterName} --name tikv --nodes 4 --nodes-min 4 --nodes-max 4 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1a --nodes 2 --nodes-min 2 --nodes-max 2 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1c --nodes 2 --nodes-min 2 --nodes-max 2 +eksctl scale nodegroup --cluster ${clusterName} --name tikv-1d --nodes 2 --nodes-min 2 --nodes-max 2 ``` 更多节点组管理可参考 [eksctl 文档](https://eksctl.io/usage/managing-nodegroups/)。 @@ -288,16 +354,53 @@ eksctl scale nodegroup --cluster ${clusterName} --name tikv --nodes 4 --nodes-mi 在 eksctl 的配置文件 cluster.yaml 中新增以下两项,为 TiFlash/TiCDC 各自新增一个节点组。`desiredCapacity` 决定期望的节点数,根据实际需求而定。 ```yaml - - name: tiflash - desiredCapacity: 3 + - name: tiflash-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: tiflash + taints: + dedicated: tiflash:NoSchedule + - name: tiflash-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] labels: - role: tiflash + dedicated: tiflash taints: dedicated: tiflash:NoSchedule - - name: ticdc + - name: tiflash-1c desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] labels: - role: ticdc + dedicated: tiflash + taints: + dedicated: tiflash:NoSchedule + + - name: ticdc-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] + labels: + dedicated: ticdc + taints: + dedicated: ticdc:NoSchedule + - name: ticdc-1d + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1d"] + labels: + dedicated: ticdc + taints: + dedicated: ticdc:NoSchedule + - name: ticdc-1c + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1c"] + labels: + dedicated: ticdc taints: dedicated: ticdc:NoSchedule ``` @@ -413,6 +516,8 @@ AWS 部分实例类型提供额外的 [NVMe SSD 本地存储卷](https://docs.aw > 运行中的 TiDB 集群不能动态更换 storage class,可创建一个新的 TiDB 集群测试。 > > 由于 EKS 升级过程中节点重建,[本地盘数据会丢失](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-lifetime)。由于 EKS 升级或其他原因造成的节点重建,会导致需要迁移 TiKV 数据,如果无法接受这一点,则不建议在生产环境中使用本地盘。 +> +> 由于节点重建会导致本地存储数据丢失,请参考 [AWS 文档](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html)停止 TiKV 节点组的 `ReplaceUnhealthy` 功能。 了解哪些实例可提供本地存储卷,可以查看 [AWS 实例列表](https://aws.amazon.com/ec2/instance-types/)。以下以 `c5d.4xlarge` 为例: @@ -421,10 +526,13 @@ AWS 部分实例类型提供额外的 [NVMe SSD 本地存储卷](https://docs.aw 修改 `eksctl` 配置文件中 TiKV 节点组实例类型为 `c5d.4xlarge`: ```yaml - - name: tikv + - name: tikv-1a + desiredCapacity: 1 + privateNetworking: true + availabilityZones: ["ap-northeast-1a"] instanceType: c5d.4xlarge labels: - role: tikv + dedicated: tikv taints: dedicated: tikv:NoSchedule ...
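The hunk above ends where the TiKV node group in the zh document is switched to `c5d.4xlarge` instances with local NVMe volumes. For reference, a minimal sketch of how the TiKV spec in `tidb-cluster.yaml` might then consume those volumes, assuming local-volume-provisioner has already been deployed and exposes a StorageClass named `local-storage`; the class name and the size below are placeholder assumptions, not values taken from this patch.

```yaml
  tikv:
    baseImage: pingcap/tikv
    replicas: 3
    # Assumed StorageClass created by local-volume-provisioner; adjust it to your environment.
    storageClassName: local-storage
    requests:
      # Must fit within a single local NVMe volume of the chosen instance type.
      storage: "100Gi"
```

Because the data lives on instance store, scaling in or recreating such a node always triggers TiKV data migration, as the notes earlier in this patch point out.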