Refine tiflash docs to remove deprecated description #19913

Merged
merged 7 commits on Jan 9, 2025
Changes from 3 commits
87 changes: 21 additions & 66 deletions scale-tidb-using-tiup.md
@@ -393,6 +393,8 @@ This section exemplifies how to remove a TiFlash node from the `10.0.1.4` host.
ALTER TABLE <db-name>.<table-name> SET tiflash replica 'new_replica_num';
```

After executing this statement, TiDB modifies or deletes PD [placement rules](/configure-placement-rules.md) accordingly. Then, PD schedules data based on the updated placement rules.

3. Perform step 1 again and make sure that no table has more TiFlash replicas than the number of TiFlash nodes remaining after the scale-in, as illustrated in the example below.
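
For example, if only one TiFlash node will remain after the scale-in and a hypothetical table `test.sbtest1` currently has two TiFlash replicas, you can reduce its replica count to 1 through any MySQL client. The following is a minimal sketch; the connection parameters and the table name are placeholders:

```shell
# A minimal sketch, assuming a TiDB server reachable at <tidb_ip>:4000 and a
# hypothetical table test.sbtest1; set the replica count to match the number
# of TiFlash nodes that remain after the scale-in.
mysql -h <tidb_ip> -P 4000 -u root -e "ALTER TABLE test.sbtest1 SET TIFLASH REPLICA 1;"
```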

### 2. Perform the scale-in operation
@@ -403,20 +405,28 @@ Perform the scale-in operation with one of the following solutions.

1. Confirm the name of the node to be taken down:

{{< copyable "shell-regular" >}}

```shell
tiup cluster display <cluster-name>
```

2. Remove the TiFlash node (assume that the node name is `10.0.1.4:9000` from Step 1):

{{< copyable "shell-regular" >}}

```shell
tiup cluster scale-in <cluster-name> --node 10.0.1.4:9000
```

3. View the status of the removed TiFlash node:

```shell
tiup cluster display <cluster-name>
```

4. After the status of the removed TiFlash node becomes `Tombstone`, delete the information of the removed node from the TiUP topology (TiUP will automatically clean up the related data files of the `Tombstone` node):

```shell
tiup cluster prune <cluster-name>
```
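
Put together, a run of Solution 1 looks like the following sketch, assuming a cluster named `tidb-test` (a placeholder) and the TiFlash node `10.0.1.4:9000` from the steps above:

```shell
# A minimal sketch of Solution 1; the cluster name tidb-test is a placeholder.
tiup cluster display tidb-test                        # step 1: confirm the node name
tiup cluster scale-in tidb-test --node 10.0.1.4:9000  # step 2: take the TiFlash node offline
tiup cluster display tidb-test                        # step 3: wait until the node status becomes Tombstone
tiup cluster prune tidb-test                          # step 4: remove the Tombstone node from the topology
```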

#### Solution 2. Manually remove a TiFlash node

In special cases (such as when a node needs to be forcibly taken down), or if the TiUP scale-in operation fails, you can manually remove a TiFlash node with the following steps.
@@ -427,8 +437,6 @@ In special cases (such as when a node needs to be forcibly taken down), or if th

* If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v<CLUSTER_VERSION> pd`:

{{< copyable "shell-regular" >}}

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store
```
@@ -443,8 +451,6 @@ In special cases (such as when a node needs to be forcibly taken down), or if th

* If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v<CLUSTER_VERSION> pd`:

{{< copyable "shell-regular" >}}

```shell
tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store delete <store_id>
```
@@ -455,70 +461,19 @@ In special cases (such as when a node needs to be forcibly taken down), or if th

3. Wait for the store of the TiFlash node to disappear or for the `state_name` to become `Tombstone` before you stop the TiFlash process.

4. Manually delete TiFlash data files (the location can be found in the `data_dir` directory under the TiFlash configuration of the cluster topology file).

5. Delete information about the TiFlash node that goes down from the cluster topology using the following command:

{{< copyable "shell-regular" >}}
4. Delete the information of the removed node from the TiUP topology (TiUP will automatically clean up the related data files of the `Tombstone` node):

```shell
tiup cluster scale-in <cluster-name> --node <pd_ip>:<pd_port> --force
tiup cluster prune <cluster-name>
```
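
As an illustration of steps 1 through 3 above, the following sketch finds the TiFlash store, requests its removal, and then polls its state until it disappears or becomes `Tombstone`. The cluster version `v8.1.0`, the PD address `10.0.1.6:2379`, and the store ID `45` are placeholders:

```shell
# A minimal sketch for a TiUP deployment; version, PD address, and store ID are placeholders.
tiup ctl:v8.1.0 pd -u http://10.0.1.6:2379 store             # step 1: find the store ID of the TiFlash node
tiup ctl:v8.1.0 pd -u http://10.0.1.6:2379 store delete 45   # step 2: ask PD to remove the store
tiup ctl:v8.1.0 pd -u http://10.0.1.6:2379 store 45          # step 3: poll until state_name becomes Tombstone
```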

> **Note:**
>
> Before all TiFlash nodes in the cluster stop running, if not all tables replicated to TiFlash are canceled, you need to manually clean up the replication rules in PD, or the TiFlash node cannot be taken down successfully.

The steps to manually clean up the replication rules in PD are below:
### 3. View the cluster status

1. View all data replication rules related to TiFlash in the current PD instance:
```shell
tiup cluster display <cluster-name>
```

{{< copyable "shell-regular" >}}

```shell
curl http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash
```

```
[
{
"group_id": "tiflash",
"id": "table-45-r",
"override": true,
"start_key": "7480000000000000FF2D5F720000000000FA",
"end_key": "7480000000000000FF2E00000000000000F8",
"role": "learner",
"count": 1,
"label_constraints": [
{
"key": "engine",
"op": "in",
"values": [
"tiflash"
]
}
]
}
]
```

2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command:

{{< copyable "shell-regular" >}}

```shell
curl -v -X DELETE http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/table-45-r
```

3. View the cluster status:

{{< copyable "shell-regular" >}}

```shell
tiup cluster display <cluster-name>
```

Access the monitoring platform at <http://10.0.1.5:3000> using your browser, and view the status of the cluster and the new nodes.

After the scale-out, the cluster topology is as follows:

4 changes: 2 additions & 2 deletions tiflash/tiflash-overview.md
@@ -38,9 +38,9 @@ It is recommended that you deploy TiFlash in different nodes from TiKV to ensure

Currently, data cannot be written directly into TiFlash. You need to write data in TiKV and then replicate it to TiFlash, because it connects to the TiDB cluster as a Learner role. TiFlash supports data replication in the unit of table, but no data is replicated by default after deployment. To replicate data of a specified table, see [Create TiFlash replicas for tables](/tiflash/create-tiflash-replicas.md#create-tiflash-replicas-for-tables).

TiFlash has three components: the columnar storage module, `tiflash proxy`, and `pd buddy`. `tiflash proxy` is responsible for the communication using the Multi-Raft consensus algorithm. `pd buddy` works with PD to replicate data from TiKV to TiFlash in the unit of table.
TiFlash consists of two main components: the columnar storage component and the TiFlash proxy component. The TiFlash proxy component is responsible for communication using the Multi-Raft consensus algorithm.

When TiDB receives the DDL command to create replicas in TiFlash, the `pd buddy` component acquires the information of the table to be replicated via the status port of TiDB, and sends the information to PD. Then PD performs the corresponding data scheduling according to the information provided by `pd buddy`.
After receiving a DDL command to create replicas for a table in TiFlash, TiDB automatically creates the corresponding [placement rules](/configure-placement-rules.md) in PD, and then PD performs the corresponding data scheduling based on these rules.
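
For example, creating a TiFlash replica for a hypothetical table `test.t` and then inspecting the placement rules that TiDB created in PD might look like the following sketch; the addresses and the table name are placeholders:

```shell
# A minimal sketch; <tidb_ip>, <pd_ip>, and <pd_port> are placeholders and
# test.t is a hypothetical table.
mysql -h <tidb_ip> -P 4000 -u root -e "ALTER TABLE test.t SET TIFLASH REPLICA 1;"
# List the placement rules of the tiflash rule group that TiDB created in PD.
curl http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash
```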

## Key features

73 changes: 63 additions & 10 deletions tiflash/troubleshoot-tiflash.md
@@ -53,22 +53,16 @@

3. Check whether the TiFlash proxy status is normal through `pd-ctl`.

{{< copyable "shell-regular" >}}

```shell
echo "store" | /path/to/pd-ctl -u http://${pd-ip}:${pd-port}
tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store
```

The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy.

4. Check whether `pd buddy` can correctly print the logs (the log path is the value of `log` in the [flash.flash_cluster] configuration item; the default log path is under the `tmp` directory configured in the TiFlash configuration file).

5. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash:

{{< copyable "shell-regular" >}}
4. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash:

```shell
echo 'config placement-rules show' | /path/to/pd-ctl -u http://${pd-ip}:${pd-port}
tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default
```

Reconfirm the value of `default: count`.
@@ -78,7 +72,7 @@
> - When [Placement Rules](/configure-placement-rules.md) are enabled and multiple rules exist, the previously configured [`max-replicas`](/pd-configuration-file.md#max-replicas), [`location-labels`](/pd-configuration-file.md#location-labels), and [`isolation-level`](/pd-configuration-file.md#isolation-level) no longer take effect. To adjust the replica policy, use the interface related to Placement Rules.
> - When [Placement Rules](/configure-placement-rules.md) are enabled and only one default rule exists, TiDB will automatically update this default rule when `max-replicas`, `location-labels`, or `isolation-level` configurations are changed.

6. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node.
5. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node.
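
To illustrate the proxy-status and replica-count checks above, the following sketch filters the PD store list for TiFlash stores and prints the replica count of the default placement rule. The PD address is a placeholder and `jq` is assumed to be installed:

```shell
# A minimal sketch; the PD address is a placeholder and jq is assumed to be available.
# List only the stores whose labels mark them as TiFlash.
tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store | jq '.stores[] | select(any(.store.labels[]?; .value == "tiflash"))'
# Print the replica count of the default placement rule.
tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | jq '.[] | select(.id == "default") | .count'
```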

## Some queries return the `Region Unavailable` error

@@ -94,6 +88,65 @@
2. Delete the related data of the TiFlash node.
3. Redeploy the TiFlash node in the cluster.

## Removing TiFlash nodes is slow

Take the following steps to handle this issue:

1. Check whether any table has more TiFlash replicas than the number of TiFlash nodes available after the cluster scale-in:

```sql
SELECT * FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 'tobe_left_nodes';
```

`tobe_left_nodes` is the number of TiFlash nodes after the scale-in.

If the query result is not empty, you need to modify the number of TiFlash replicas for the corresponding tables. This is because when the number of TiFlash replicas is greater than the number of TiFlash nodes after scale-in, PD will not move Region peers away from the TiFlash nodes to be removed, which prevents the removal of these TiFlash nodes.

2. In scenarios where all TiFlash nodes need to be removed from a cluster, if the `INFORMATION_SCHEMA.TIFLASH_REPLICA` table shows that there are no TiFlash replicas in the cluster but removing TiFlash nodes still fails, check whether you have recently executed `DROP TABLE <db-name>.<table-name>` or `DROP DATABASE <db-name>` operations.

For tables or databases with TiFlash replicas, after executing `DROP TABLE <db-name>.<table-name>` or `DROP DATABASE <db-name>`, TiDB does not immediately delete the TiFlash replication rules for the corresponding tables in PD. Instead, it waits until the corresponding tables meet the garbage collection (GC) conditions before deleting these replication rules. After GC is complete, the corresponding TiFlash nodes can be successfully removed.

To manually remove the TiFlash data replication rules before the GC conditions are met, you can do the following:

> **Note:**
>
> After manually removing TiFlash replication rules for a table, if you perform `RECOVER TABLE`, `FLASHBACK TABLE`, or `FLASHBACK DATABASE` operations on this table, the TiFlash replicas of this table will not be restored.

1. View all data replication rules related to TiFlash in the current PD instance:

```shell
curl http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash
```

```
[
{
"group_id": "tiflash",
"id": "table-45-r",
"override": true,
"start_key": "7480000000000000FF2D5F720000000000FA",
"end_key": "7480000000000000FF2E00000000000000F8",
"role": "learner",
"count": 1,
"label_constraints": [
{
"key": "engine",
"op": "in",
"values": [
"tiflash"
]
}
]
}
]
```

2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command:

```shell
curl -v -X DELETE http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/table-45-r
```
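
For example, if the cluster keeps two TiFlash nodes after the scale-in, the check in step 1 can be run through any MySQL client as shown in the first command of the sketch below; the second part deletes every TiFlash replication rule returned by PD in one pass. The connection parameters are placeholders and `jq` is assumed to be installed:

```shell
# A minimal sketch; <tidb_ip>, <pd_ip>, and <pd_port> are placeholders.
# Step 1 check: find tables whose TiFlash replica count exceeds the 2 nodes that remain.
mysql -h <tidb_ip> -P 4000 -u root -e "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 2;"
# Delete every remaining TiFlash replication rule listed by PD (requires jq).
curl -s http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash | jq -r '.[].id' | while read -r rule_id; do
    curl -X DELETE "http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/${rule_id}"
done
```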

## TiFlash analysis is slow

If a statement contains operators or functions not supported in the MPP mode, TiDB does not select the MPP mode. Therefore, the analysis of the statement is slow. In this case, you can execute the `EXPLAIN` statement to check for operators or functions not supported in the MPP mode.
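
For example, assuming a hypothetical table `test.sbtest1` that already has a TiFlash replica, you can check whether the optimizer selected the MPP mode by looking for `mpp[tiflash]` in the task column of the `EXPLAIN` output:

```shell
# A minimal sketch; the connection parameters and the table are placeholders.
# If the task column shows mpp[tiflash] operators, the query runs in the MPP mode;
# otherwise, look for operators or functions that the MPP mode does not support.
mysql -h <tidb_ip> -P 4000 -u root -e "EXPLAIN SELECT COUNT(*) FROM test.sbtest1;"
```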