From 83c1a755d3d3a904df059b44b21bcad9c8e49f2e Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 7 Jan 2025 11:47:49 +0800 Subject: [PATCH 1/7] Add temp.md --- temp.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 temp.md diff --git a/temp.md b/temp.md new file mode 100644 index 0000000000000..af27ff4986a7b --- /dev/null +++ b/temp.md @@ -0,0 +1 @@ +This is a test file. \ No newline at end of file From 5423d2ea4239a1a4320487641cd92707fca0e61c Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 7 Jan 2025 11:47:53 +0800 Subject: [PATCH 2/7] Delete temp.md --- temp.md | 1 - 1 file changed, 1 deletion(-) delete mode 100644 temp.md diff --git a/temp.md b/temp.md deleted file mode 100644 index af27ff4986a7b..0000000000000 --- a/temp.md +++ /dev/null @@ -1 +0,0 @@ -This is a test file. \ No newline at end of file From 227622c64b5b454703b29fd1d51e384d182d6ed9 Mon Sep 17 00:00:00 2001 From: qiancai Date: Tue, 7 Jan 2025 14:20:39 +0800 Subject: [PATCH 3/7] add translation --- scale-tidb-using-tiup.md | 87 ++++++++------------------------- tiflash/tiflash-overview.md | 4 +- tiflash/troubleshoot-tiflash.md | 73 +++++++++++++++++++++++---- 3 files changed, 86 insertions(+), 78 deletions(-) diff --git a/scale-tidb-using-tiup.md b/scale-tidb-using-tiup.md index 03bc3561b5ed3..c836bbda33408 100644 --- a/scale-tidb-using-tiup.md +++ b/scale-tidb-using-tiup.md @@ -393,6 +393,8 @@ This section exemplifies how to remove a TiFlash node from the `10.0.1.4` host. ALTER TABLE . SET tiflash replica 'new_replica_num'; ``` + After executing this statement, TiDB modifies or deletes PD [placement rules](/configure-placement-rules.md) accordingly. Then, PD schedules data based on the updated placement rules. + 3. Perform step 1 again and make sure that there is no table with TiFlash replicas more than the number of TiFlash nodes after scale-in. ### 2. Perform the scale-in operation @@ -403,20 +405,28 @@ Perform the scale-in operation with one of the following solutions. 1. Confirm the name of the node to be taken down: - {{< copyable "shell-regular" >}} - ```shell tiup cluster display ``` 2. Remove the TiFlash node (assume that the node name is `10.0.1.4:9000` from Step 1): - {{< copyable "shell-regular" >}} - ```shell tiup cluster scale-in --node 10.0.1.4:9000 ``` +3. View the status of the removed TiFlash node: + + ```shell + tiup cluster display + ``` + +4. After the status of the removed TiFlash node becomes `Tombstone`, delete the information of the removed node from the TiUP topology (TiUP will automatically clean up the related data files of the `Tombstone` node): + + ```shell + tiup cluster prune + ``` + #### Solution 2. Manually remove a TiFlash node In special cases (such as when a node needs to be forcibly taken down), or if the TiUP scale-in operation fails, you can manually remove a TiFlash node with the following steps. @@ -427,8 +437,6 @@ In special cases (such as when a node needs to be forcibly taken down), or if th * If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v pd`: - {{< copyable "shell-regular" >}} - ```shell tiup ctl:v pd -u http://: store ``` @@ -443,8 +451,6 @@ In special cases (such as when a node needs to be forcibly taken down), or if th * If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v pd`: - {{< copyable "shell-regular" >}} - ```shell tiup ctl:v pd -u http://: store delete ``` @@ -455,70 +461,19 @@ In special cases (such as when a node needs to be forcibly taken down), or if th 3. 
Wait for the store of the TiFlash node to disappear or for the `state_name` to become `Tombstone` before you stop the TiFlash process. -4. Manually delete TiFlash data files (the location can be found in the `data_dir` directory under the TiFlash configuration of the cluster topology file). - -5. Delete information about the TiFlash node that goes down from the cluster topology using the following command: - - {{< copyable "shell-regular" >}} +4. Delete the information of the removed node from the TiUP topology (TiUP will automatically clean up the related data files of the `Tombstone` node): ```shell - tiup cluster scale-in --node : --force + tiup cluster prune ``` -> **Note:** -> -> Before all TiFlash nodes in the cluster stop running, if not all tables replicated to TiFlash are canceled, you need to manually clean up the replication rules in PD, or the TiFlash node cannot be taken down successfully. - -The steps to manually clean up the replication rules in PD are below: +### 3. View the cluster status -1. View all data replication rules related to TiFlash in the current PD instance: +```shell +tiup cluster display +``` - {{< copyable "shell-regular" >}} - - ```shell - curl http://:/pd/api/v1/config/rules/group/tiflash - ``` - - ``` - [ - { - "group_id": "tiflash", - "id": "table-45-r", - "override": true, - "start_key": "7480000000000000FF2D5F720000000000FA", - "end_key": "7480000000000000FF2E00000000000000F8", - "role": "learner", - "count": 1, - "label_constraints": [ - { - "key": "engine", - "op": "in", - "values": [ - "tiflash" - ] - } - ] - } - ] - ``` - -2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command: - - {{< copyable "shell-regular" >}} - - ```shell - curl -v -X DELETE http://:/pd/api/v1/config/rule/tiflash/table-45-r - ``` - -3. View the cluster status: - - {{< copyable "shell-regular" >}} - - ```shell - tiup cluster display - ``` - - Access the monitoring platform at using your browser, and view the status of the cluster and the new nodes. +Access the monitoring platform at using your browser, and view the status of the cluster and the new nodes. After the scale-out, the cluster topology is as follows: diff --git a/tiflash/tiflash-overview.md b/tiflash/tiflash-overview.md index d915441bf6f63..4e7a8d6dc8551 100644 --- a/tiflash/tiflash-overview.md +++ b/tiflash/tiflash-overview.md @@ -38,9 +38,9 @@ It is recommended that you deploy TiFlash in different nodes from TiKV to ensure Currently, data cannot be written directly into TiFlash. You need to write data in TiKV and then replicate it to TiFlash, because it connects to the TiDB cluster as a Learner role. TiFlash supports data replication in the unit of table, but no data is replicated by default after deployment. To replicate data of a specified table, see [Create TiFlash replicas for tables](/tiflash/create-tiflash-replicas.md#create-tiflash-replicas-for-tables). -TiFlash has three components: the columnar storage module, `tiflash proxy`, and `pd buddy`. `tiflash proxy` is responsible for the communication using the Multi-Raft consensus algorithm. `pd buddy` works with PD to replicate data from TiKV to TiFlash in the unit of table. +TiFlash consists of two main components: the columnar storage component, and the TiFlash proxy component. The TiFlash proxy component is responsible for the communication using the Multi-Raft consensus algorithm. 
-When TiDB receives the DDL command to create replicas in TiFlash, the `pd buddy` component acquires the information of the table to be replicated via the status port of TiDB, and sends the information to PD. Then PD performs the corresponding data scheduling according to the information provided by `pd buddy`. +After receiving a DDL command to create replicas for a table in TiFlash, TiDB automatically creates the corresponding [placement rules](/configure-placement-rules.md) in PD, and then PD performs the corresponding data scheduling based on these rules. ## Key features diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index e5737061b1600..13cd696a93c86 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -53,22 +53,16 @@ This is because TiFlash is in an abnormal state caused by configuration errors o 3. Check whether the TiFlash proxy status is normal through `pd-ctl`. - {{< copyable "shell-regular" >}} - ```shell - echo "store" | /path/to/pd-ctl -u http://${pd-ip}:${pd-port} + tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store ``` The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy. -4. Check whether `pd buddy` can correctly print the logs (the log path is the value of `log` in the [flash.flash_cluster] configuration item; the default log path is under the `tmp` directory configured in the TiFlash configuration file). - -5. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash: - - {{< copyable "shell-regular" >}} +4. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash: ```shell - echo 'config placement-rules show' | /path/to/pd-ctl -u http://${pd-ip}:${pd-port} + tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default ``` Reconfirm the value of `default: count`. @@ -78,7 +72,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o > - When [Placement Rules](/configure-placement-rules.md) are enabled and multiple rules exist, the previously configured [`max-replicas`](/pd-configuration-file.md#max-replicas), [`location-labels`](/pd-configuration-file.md#location-labels), and [`isolation-level`](/pd-configuration-file.md#isolation-level) no longer take effect. To adjust the replica policy, use the interface related to Placement Rules. > - When [Placement Rules](/configure-placement-rules.md) are enabled and only one default rule exists, TiDB will automatically update this default rule when `max-replicas`, `location-labels`, or `isolation-level` configurations are changed. -6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node. +5. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node. 
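For the disk-space check in the step above, the following is a minimal sketch (not part of the patch) of how you might list the capacity and remaining space of each TiFlash store through the PD API. It assumes `jq` is installed and that `${pd_ip}` and `${pd_port}` are placeholders for a reachable PD instance.

```shell
# Rough check: list TiFlash stores with their capacity and remaining space,
# so you can see which ones are approaching the low-space-ratio threshold.
# Assumes jq is installed and PD is reachable at ${pd_ip}:${pd_port}.
curl -s "http://${pd_ip}:${pd_port}/pd/api/v1/stores" \
  | jq -r '.stores[]
      | select(.store.labels[]? | (.key == "engine" and .value == "tiflash"))
      | "\(.store.address)  capacity=\(.status.capacity)  available=\(.status.available)"'
```

If `available` falls below roughly 20% of `capacity` for a store, that store has crossed the default `low-space-ratio` threshold and PD stops scheduling new data to it.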
## Some queries return the `Region Unavailable` error @@ -94,6 +88,65 @@ Take the following steps to handle the data file corruption: 2. Delete the related data of the TiFlash node. 3. Redeploy the TiFlash node in the cluster. +## Removing TiFlash nodes is slow + +Take the following steps to handle this issue: + +1. Check whether any table has more TiFlash replicas than the number of TiFlash nodes available after the cluster scale-in: + + ```sql + SELECT * FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 'tobe_left_nodes'; + ``` + + `tobe_left_nodes` is the number of TiFlash nodes after the scale-in. + + If the query result is not empty, you need to modify the number of TiFlash replicas for the corresponding tables. This is because when the number of TiFlash replicas is greater than the number of TiFlash nodes after scale-in, PD will not move Region peers away from the TiFlash nodes to be removed, which prevents the removal of these TiFlash nodes. + +2. In scenarios where all TiFlash nodes need to be removed from a cluster, if the `INFORMATION_SCHEMA.TIFLASH_REPLICA` table shows that there are no TiFlash replicas in the cluster but removing TiFlash nodes still fails, check whether you have recently executed `DROP TABLE .` or `DROP DATABASE ` operations. + + For tables or databases with TiFlash replicas, after executing `DROP TABLE .` or `DROP DATABASE `, TiDB does not immediately delete the TiFlash replication rules for the corresponding tables in PD. Instead, it waits until the corresponding tables meet the garbage collection (GC) conditions before deleting these replication rules. After GC is complete, the corresponding TiFlash nodes can be successfully removed. + + To remove data replication rules of TiFlash manually before the GC conditions are met, you can do the following: + + > **Note:** + > + > After manually removing TiFlash replication rules for a table, if you perform `RECOVER TABLE`, `FLASHBACK TABLE`, or `FLASHBACK DATABASE` operations on this table, the TiFlash replicas of this table will not be restored. + + 1. View all data replication rules related to TiFlash in the current PD instance: + + ```shell + curl http://:/pd/api/v1/config/rules/group/tiflash + ``` + + ``` + [ + { + "group_id": "tiflash", + "id": "table-45-r", + "override": true, + "start_key": "7480000000000000FF2D5F720000000000FA", + "end_key": "7480000000000000FF2E00000000000000F8", + "role": "learner", + "count": 1, + "label_constraints": [ + { + "key": "engine", + "op": "in", + "values": [ + "tiflash" + ] + } + ] + } + ] + ``` + + 2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command: + + ```shell + curl -v -X DELETE http://:/pd/api/v1/config/rule/tiflash/table-45-r + ``` + ## TiFlash analysis is slow If a statement contains operators or functions not supported in the MPP mode, TiDB does not select the MPP mode. Therefore, the analysis of the statement is slow. In this case, you can execute the `EXPLAIN` statement to check for operators or functions not supported in the MPP mode. 
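As a concrete illustration of the `EXPLAIN` check described above, here is a small sketch (not part of the patch) that runs `EXPLAIN` through the MySQL client. The host `${tidb_host}` and the table `test.t` (with a column `a` and an existing TiFlash replica) are placeholders for your own environment.

```shell
# Check whether a statement can run in the MPP mode.
mysql --comments -h "${tidb_host}" -P 4000 -u root \
  -e "EXPLAIN SELECT COUNT(*) FROM test.t GROUP BY a;"
# ExchangeSender/ExchangeReceiver operators with the task column showing
# mpp[tiflash] indicate that the statement runs in the MPP mode; a plan that
# only contains cop[tikv] or batchCop[tiflash] tasks suggests the statement
# uses operators or functions that the MPP mode does not support.
```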
From b8795561468905db7f320cbb69a5b29762ad6d6f Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 7 Jan 2025 14:27:41 +0800 Subject: [PATCH 4/7] Update scale-tidb-using-tiup.md --- scale-tidb-using-tiup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scale-tidb-using-tiup.md b/scale-tidb-using-tiup.md index c836bbda33408..26cddc5677b49 100644 --- a/scale-tidb-using-tiup.md +++ b/scale-tidb-using-tiup.md @@ -475,7 +475,7 @@ tiup cluster display Access the monitoring platform at using your browser, and view the status of the cluster and the new nodes. -After the scale-out, the cluster topology is as follows: +After the scaling, the cluster topology is as follows: | Host IP | Service | |:----|:----| From 9e49575ecfc622c56307bac426510fa7f3c42b02 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 7 Jan 2025 14:45:25 +0800 Subject: [PATCH 5/7] Update tiflash/troubleshoot-tiflash.md --- tiflash/troubleshoot-tiflash.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 13cd696a93c86..b6e57eb50813d 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -59,7 +59,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy. -4. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash: +4. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash. ```shell tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default From e5ba1d39d1b858673e8882cc9d993508a4a05461 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 7 Jan 2025 14:49:29 +0800 Subject: [PATCH 6/7] Update tiflash/tiflash-overview.md --- tiflash/tiflash-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tiflash/tiflash-overview.md b/tiflash/tiflash-overview.md index 4e7a8d6dc8551..ffce1d3373b38 100644 --- a/tiflash/tiflash-overview.md +++ b/tiflash/tiflash-overview.md @@ -40,7 +40,7 @@ Currently, data cannot be written directly into TiFlash. You need to write data TiFlash consists of two main components: the columnar storage component, and the TiFlash proxy component. The TiFlash proxy component is responsible for the communication using the Multi-Raft consensus algorithm. -After receiving a DDL command to create replicas for a table in TiFlash, TiDB automatically creates the corresponding [placement rules](/configure-placement-rules.md) in PD, and then PD performs the corresponding data scheduling based on these rules. +After receiving a DDL command to create replicas for a table in TiFlash, TiDB automatically creates the corresponding [placement rules](https://docs.pingcap.com/tidb/stable/configure-placement-rules) in PD, and then PD performs the corresponding data scheduling based on these rules. 
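The rewritten sentence above says that TiDB creates the placement rules in PD automatically. Here is a small sketch (not part of the patch) of how you could observe this, with `test.t`, `${tidb_host}`, `${pd_ip}`, and `${pd_port}` as placeholders for your own environment:

```shell
# Request a TiFlash replica for a table, then look at the rule that TiDB
# creates in PD under the "tiflash" rule group.
mysql --comments -h "${tidb_host}" -P 4000 -u root \
  -e "ALTER TABLE test.t SET TIFLASH REPLICA 1;"

curl -s "http://${pd_ip}:${pd_port}/pd/api/v1/config/rules/group/tiflash" \
  | jq '.[] | {id, role, count}'
```

The returned rules have `role` set to `learner` and `count` equal to the replica number you set, matching the JSON example shown earlier in this patch.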
## Key features From 5a0f83db3d013699fb4d9e80777747ca6dc56468 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 9 Jan 2025 09:20:31 +0800 Subject: [PATCH 7/7] Apply suggestions from code review Co-authored-by: xixirangrang --- tiflash/troubleshoot-tiflash.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index b6e57eb50813d..f8fc5a93d37df 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -72,7 +72,7 @@ This is because TiFlash is in an abnormal state caused by configuration errors o > - When [Placement Rules](/configure-placement-rules.md) are enabled and multiple rules exist, the previously configured [`max-replicas`](/pd-configuration-file.md#max-replicas), [`location-labels`](/pd-configuration-file.md#location-labels), and [`isolation-level`](/pd-configuration-file.md#isolation-level) no longer take effect. To adjust the replica policy, use the interface related to Placement Rules. > - When [Placement Rules](/configure-placement-rules.md) are enabled and only one default rule exists, TiDB will automatically update this default rule when `max-replicas`, `location-labels`, or `isolation-level` configurations are changed. -5. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node. +5. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) parameter), PD cannot schedule data to this TiFlash node. ## Some queries return the `Region Unavailable` error @@ -95,16 +95,16 @@ Take the following steps to handle this issue: 1. Check whether any table has more TiFlash replicas than the number of TiFlash nodes available after the cluster scale-in: ```sql - SELECT * FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 'tobe_left_nodes'; + SELECT * FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 'tobe_left_nodes'; ``` `tobe_left_nodes` is the number of TiFlash nodes after the scale-in. - If the query result is not empty, you need to modify the number of TiFlash replicas for the corresponding tables. This is because when the number of TiFlash replicas is greater than the number of TiFlash nodes after scale-in, PD will not move Region peers away from the TiFlash nodes to be removed, which prevents the removal of these TiFlash nodes. + If the query result is not empty, you need to modify the number of TiFlash replicas for the corresponding tables. This is because, when the number of TiFlash replicas exceeds the number of TiFlash nodes after the scale-in, PD will not move Region peers away from the TiFlash nodes to be removed, causing the removal of these TiFlash nodes to fail. -2. In scenarios where all TiFlash nodes need to be removed from a cluster, if the `INFORMATION_SCHEMA.TIFLASH_REPLICA` table shows that there are no TiFlash replicas in the cluster but removing TiFlash nodes still fails, check whether you have recently executed `DROP TABLE .` or `DROP DATABASE ` operations. +2. 
In scenarios where all TiFlash nodes need to be removed from a cluster, if the `INFORMATION_SCHEMA.TIFLASH_REPLICA` table shows that there are no TiFlash replicas in the cluster but removing TiFlash nodes still fails, check whether you have recently executed `DROP TABLE <db-name>.<table-name>` or `DROP DATABASE <db-name>` operations.

    For tables or databases with TiFlash replicas, after executing `DROP TABLE <db-name>.<table-name>` or `DROP DATABASE <db-name>`, TiDB does not immediately delete the TiFlash replication rules for the corresponding tables in PD. Instead, it waits until the corresponding tables meet the garbage collection (GC) conditions before deleting these replication rules. After GC is complete, the corresponding TiFlash nodes can be successfully removed.

    To remove data replication rules of TiFlash manually before the GC conditions are met, you can do the following: