From 08b18b945f22c945ddb1b0fc2c6ffc273cfdf843 Mon Sep 17 00:00:00 2001
From: Grace Cai
Date: Mon, 5 Sep 2022 11:08:15 +0800
Subject: [PATCH 01/15] Update TOC.md (#10319)

---
 TOC.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/TOC.md b/TOC.md
index 7bffd4cabbfcd..5d860799769df 100644
--- a/TOC.md
+++ b/TOC.md
@@ -518,7 +518,6 @@
         - [Integrate TiDB with Confluent and Snowflake](/ticdc/integrate-confluent-using-ticdc.md)
         - [FAQs](/ticdc/ticdc-faq.md)
         - [Glossary](/ticdc/ticdc-glossary.md)
-    - [Dumpling](/dumpling-overview.md)
   - sync-diff-inspector
     - [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md)
     - [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md)

From 4ae6a98734726d4d4e3a60bfb9472989922c2c6c Mon Sep 17 00:00:00 2001
From: shichun-0415 <89768198+shichun-0415@users.noreply.github.com>
Date: Mon, 5 Sep 2022 11:18:55 +0800
Subject: [PATCH 02/15] br: add documents for s3-multi-part-size (#10316)

---
 tikv-configuration-file.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 2ae014dce9e91..e59495180f319 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -1815,3 +1815,13 @@ To reduce write latency and avoid frequent access to PD, TiKV periodically fetch
 + TiKV adjusts the number of cached timestamps according to the timestamp consumption in the previous period. If the usage of locally cached timestamps is low, TiKV gradually reduces the number of cached timestamps until it reaches `renew-batch-min-size`. If large bursty write traffic often occurs in your application, you can set this parameter to a larger value as appropriate. Note that this parameter is the cache size for a single tikv-server. If you set the parameter to too large a value and the cluster contains many tikv-servers, the TSO consumption will be too fast.
 + In the **TiKV-RAW** \> **Causal timestamp** panel in Grafana, **TSO batch size** is the number of locally cached timestamps that has been dynamically adjusted according to the application workload. You can refer to this metric to adjust `renew-batch-min-size`.
 + Default value: `100`
+
+### `s3-multi-part-size` New in v5.3.2
+
+> **Note:**
+>
+> This configuration is introduced to address backup failures caused by S3 rate limiting. This problem has been fixed by [refining the backup data storage structure](/br/backup-and-restore-design.md#backup-file-structure). Therefore, this configuration is deprecated from v6.1.1 and is no longer recommended.
+
++ The part size used when you perform multipart upload to S3 during backup. You can adjust the value of this configuration to control the number of requests sent to S3.
++ If data is backed up to S3 and the backup file is larger than the value of this configuration item, [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html) is automatically enabled. Based on the compression ratio, the backup file generated by a 96-MiB Region is approximately 10 MiB to 30 MiB.
++ Default value: `5MiB`
\ No newline at end of file

From 631b1563b52d7ccd10428ca157154586520a5116 Mon Sep 17 00:00:00 2001
From: Benjamin2037
Date: Mon, 5 Sep 2022 17:56:55 +0800
Subject: [PATCH 03/15] docs:Add tidb_last_ddl_info system vars usage description in docs.
(#10117) --- system-variables.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/system-variables.md b/system-variables.md index 8ce833ac9f52b..c3d238bca12ec 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1819,6 +1819,16 @@ For a system upgraded to v5.0 from an earlier version, if you have not modified - Default value: `tikv,tiflash,tidb` - This variable is used to set the storage engine list that TiDB can use when reading data. +### tidb_last_ddl_info New in v6.0.0 + +- Scope: SESSION +- Persists to cluster: No +- Default value: "" +- Type: String +- This is a read-only variable. It is internally used in TiDB to get the information of the last DDL operation within the current session. + - "query": The last DDL query string. + - "seq_num": The sequence number for each DDL operation. It is used to identify the order of DDL operations. + ### tidb_last_query_info New in v4.0.14 - Scope: SESSION From aa726a07ad40f2b656abc168d8d70011e9e5f5ed Mon Sep 17 00:00:00 2001 From: Xiang Zhang Date: Tue, 6 Sep 2022 16:12:55 +0800 Subject: [PATCH 04/15] Add missing collations and unify code block style (#10345) --- character-set-and-collation.md | 134 +++++++++++++++++++++++++++------ 1 file changed, 113 insertions(+), 21 deletions(-) diff --git a/character-set-and-collation.md b/character-set-and-collation.md index 5bee561c6a5f0..2fce1ffa8d07d 100644 --- a/character-set-and-collation.md +++ b/character-set-and-collation.md @@ -24,18 +24,31 @@ SELECT 'A' = 'a'; ``` ```sql -mysql> SELECT 'A' = 'a'; +SELECT 'A' = 'a'; +``` + +```sql +-----------+ | 'A' = 'a' | +-----------+ | 0 | +-----------+ 1 row in set (0.00 sec) +``` -mysql> SET NAMES utf8mb4 COLLATE utf8mb4_general_ci; +```sql +SET NAMES utf8mb4 COLLATE utf8mb4_general_ci; +``` + +```sql Query OK, 0 rows affected (0.00 sec) +``` -mysql> SELECT 'A' = 'a'; +```sql +SELECT 'A' = 'a'; +``` + +```sql +-----------+ | 'A' = 'a' | +-----------+ @@ -73,18 +86,26 @@ SHOW CHARACTER SET; TiDB supports the following collations: ```sql -mysql> show collation; -+-------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+-------------+---------+------+---------+----------+---------+ -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | -| binary | binary | 63 | Yes | Yes | 1 | -| ascii_bin | ascii | 65 | Yes | Yes | 1 | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | -| gbk_bin | gbk | 87 | Yes | Yes | 1 | -+-------------+---------+------+---------+----------+---------+ -6 rows in set (0.00 sec) +SHOW COLLATION; +``` + +```sql ++--------------------+---------+------+---------+----------+---------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | ++--------------------+---------+------+---------+----------+---------+ +| ascii_bin | ascii | 65 | Yes | Yes | 1 | +| binary | binary | 63 | Yes | Yes | 1 | +| gbk_bin | gbk | 87 | | Yes | 1 | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | +| utf8_general_ci | utf8 | 33 | | Yes | 1 | +| utf8_unicode_ci | utf8 | 192 | | Yes | 1 | +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | +| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | +| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 | ++--------------------+---------+------+---------+----------+---------+ +11 rows in set (0.00 sec) ``` > **Warning:** @@ -125,25 +146,54 @@ By default, TiDB provides the same 3-byte limit on `utf8` to ensure that data cr The 
following demonstrates the default behavior when inserting a 4-byte emoji character into a table. The `INSERT` statement fails for the `utf8` character set, but succeeds for `utf8mb4`: ```sql -mysql> CREATE TABLE utf8_test ( +CREATE TABLE utf8_test ( -> c char(1) NOT NULL -> ) CHARACTER SET utf8; +``` + +```sql Query OK, 0 rows affected (0.09 sec) +``` -mysql> CREATE TABLE utf8m4_test ( +```sql +CREATE TABLE utf8m4_test ( -> c char(1) NOT NULL -> ) CHARACTER SET utf8mb4; +``` + +```sql Query OK, 0 rows affected (0.09 sec) +``` + +```sql +INSERT INTO utf8_test VALUES ('😉'); +``` -mysql> INSERT INTO utf8_test VALUES ('😉'); +```sql ERROR 1366 (HY000): incorrect utf8 value f09f9889(😉) for column c -mysql> INSERT INTO utf8m4_test VALUES ('😉'); +``` + +```sql +INSERT INTO utf8m4_test VALUES ('😉'); +``` + +```sql Query OK, 1 row affected (0.02 sec) +``` + +```sql +SELECT char_length(c), length(c), c FROM utf8_test; +``` -mysql> SELECT char_length(c), length(c), c FROM utf8_test; +```sql Empty set (0.01 sec) +``` -mysql> SELECT char_length(c), length(c), c FROM utf8m4_test; +```sql +SELECT char_length(c), length(c), c FROM utf8m4_test; +``` + +```sql +----------------+-----------+------+ | char_length(c) | length(c) | c | +----------------+-----------+------+ @@ -400,12 +450,33 @@ Before v4.0, you can specify most of the MySQL collations in TiDB, and these col ```sql CREATE TABLE t(a varchar(20) charset utf8mb4 collate utf8mb4_general_ci PRIMARY KEY); +``` + +```sql Query OK, 0 rows affected +``` + +```sql INSERT INTO t VALUES ('A'); +``` + +```sql Query OK, 1 row affected +``` + +```sql INSERT INTO t VALUES ('a'); +``` + +```sql Query OK, 1 row affected # In TiDB, it is successfully executed. In MySQL, because utf8mb4_general_ci is case-insensitive, the `Duplicate entry 'a'` error is reported. +``` + +```sql INSERT INTO t1 VALUES ('a '); +``` + +```sql Query OK, 1 row affected # In TiDB, it is successfully executed. In MySQL, because comparison is performed after the spaces are filled in, the `Duplicate entry 'a '` error is returned. ``` @@ -448,12 +519,33 @@ When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4 ```sql CREATE TABLE t(a varchar(20) charset utf8mb4 collate utf8mb4_general_ci PRIMARY KEY); +``` + +```sql Query OK, 0 rows affected (0.00 sec) +``` + +```sql INSERT INTO t VALUES ('A'); +``` + +```sql Query OK, 1 row affected (0.00 sec) +``` + +```sql INSERT INTO t VALUES ('a'); +``` + +```sql ERROR 1062 (23000): Duplicate entry 'a' for key 'PRIMARY' # TiDB is compatible with the case-insensitive collation of MySQL. +``` + +```sql INSERT INTO t VALUES ('a '); +``` + +```sql ERROR 1062 (23000): Duplicate entry 'a ' for key 'PRIMARY' # TiDB modifies the `PADDING` behavior to be compatible with MySQL. ``` From 058109d0cc9b4add4a2836107dd2421c20cc5680 Mon Sep 17 00:00:00 2001 From: Aolin Date: Tue, 6 Sep 2022 16:16:55 +0800 Subject: [PATCH 05/15] clustered-indexes.md: change the system variable default value to `ON` (#10311) --- clustered-indexes.md | 2 +- system-variables.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/clustered-indexes.md b/clustered-indexes.md index a75cf3650f95f..d29cbdd7b6b31 100644 --- a/clustered-indexes.md +++ b/clustered-indexes.md @@ -67,7 +67,7 @@ For statements that do not explicitly specify the keyword `CLUSTERED`/`NONCLUSTE - `ON` indicates that primary keys are created as clustered indexes by default. - `INT_ONLY` indicates that the behavior is controlled by the configuration item `alter-primary-key`. 
If `alter-primary-key` is set to `true`, primary keys are created as non-clustered indexes by default. If it is set to `false`, only the primary keys which consist of an integer column are created as clustered indexes. -The default value of `@@global.tidb_enable_clustered_index` is `INT_ONLY`. +The default value of `@@global.tidb_enable_clustered_index` is `ON`. ### Add or drop clustered indexes diff --git a/system-variables.md b/system-variables.md index c3d238bca12ec..8fd827e80805f 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1012,7 +1012,7 @@ Constraint checking is always performed in place for pessimistic transactions (d - Scope: SESSION | GLOBAL - Persists to cluster: Yes - Type: Enumeration -- Default value: `INT_ONLY` +- Default value: `ON` - Possible values: `OFF`, `ON`, `INT_ONLY` - This variable is used to control whether to create the primary key as a [clustered index](/clustered-indexes.md) by default. "By default" here means that the statement does not explicitly specify the keyword `CLUSTERED`/`NONCLUSTERED`. Supported values are `OFF`, `ON`, and `INT_ONLY`: - `OFF` indicates that primary keys are created as non-clustered indexes by default. From e599bbfb4a1642897ed314be7929ce1dda56ad61 Mon Sep 17 00:00:00 2001 From: Lucas Date: Tue, 6 Sep 2022 16:26:55 +0800 Subject: [PATCH 06/15] Add docs for raft-engine log recycling feature. (#10258) --- tikv-configuration-file.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md index e59495180f319..a840f03c7af90 100644 --- a/tikv-configuration-file.md +++ b/tikv-configuration-file.md @@ -1495,6 +1495,27 @@ Configuration items related to Raft Engine. + When this configuration value is not set, 15% of the available system memory is used. + Default value: `Total machine memory * 15%` +### `format-version` New in v6.3.0 + +> **Warning:** +> +> After `format-version` is set to `2`, you **cannot** downgrade the TiKV cluster to a version earlier than v6.3.0. Otherwise, data corruption might occur. + ++ Specifies the version of log files in Raft Engine. ++ Value Options: + + `1`: Default log file version for TiKV earlier than v6.3.0. Can be read by TiKV >= v6.1.0. + + `2`: Supports log recycling. Can be read by TiKV >= v6.3.0. ++ Default value: `2` + +### `enable-log-recycle` New in v6.3.0 + +> **Note:** +> +> This configuration item is only available when [`format-version`](#format-version-new-in-v630) >= 2. + ++ Determines whether to recycle stale log files in Raft Engine. When it is enabled, logically purged log files will be reserved for recycling. This reduces the long tail latency on write workloads. ++ Default value: `true` + ## security Configuration items related to security. From 313cf2158e5c37887f191845849ad775f109e281 Mon Sep 17 00:00:00 2001 From: Ran Date: Tue, 6 Sep 2022 16:32:56 +0800 Subject: [PATCH 07/15] support account lock/unlock in create/alter user (#10330) --- sql-statements/sql-statement-alter-user.md | 16 +++++++++++++++- sql-statements/sql-statement-create-user.md | 15 +++++++++++++-- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/sql-statements/sql-statement-alter-user.md b/sql-statements/sql-statement-alter-user.md index c8522eaf1d82d..ba1f4071b4847 100644 --- a/sql-statements/sql-statement-alter-user.md +++ b/sql-statements/sql-statement-alter-user.md @@ -12,7 +12,7 @@ This statement changes an existing user inside the TiDB privilege system. 
In the ```ebnf+diagram AlterUserStmt ::= - 'ALTER' 'USER' IfExists (UserSpecList RequireClauseOpt ConnectionOptions PasswordOrLockOptions | 'USER' '(' ')' 'IDENTIFIED' 'BY' AuthString) + 'ALTER' 'USER' IfExists (UserSpecList RequireClauseOpt ConnectionOptions LockOption | 'USER' '(' ')' 'IDENTIFIED' 'BY' AuthString) UserSpecList ::= UserSpec ( ',' UserSpec )* @@ -25,6 +25,8 @@ Username ::= AuthOption ::= ( 'IDENTIFIED' ( 'BY' ( AuthString | 'PASSWORD' HashString ) | 'WITH' StringName ( 'BY' AuthString | 'AS' HashString )? ) )? + +LockOption ::= ( 'ACCOUNT' 'LOCK' | 'ACCOUNT' 'UNLOCK' )? ``` ## Examples @@ -53,6 +55,18 @@ mysql> SHOW CREATE USER 'newuser'; 1 row in set (0.00 sec) ``` +```sql +ALTER USER 'newuser' ACCOUNT LOCK; +``` + +``` +Query OK, 0 rows affected (0.02 sec) +``` + +> **Note:** +> +> Do not use `ACCOUNT UNLOCK` to unlock a [role](/sql-statements/sql-statement-create-role.md). Otherwise, the unlocked role can be used to log in to TiDB without password. + ## MySQL compatibility * In MySQL this statement is used to change attributes such as to expire a password. This functionality is not yet supported by TiDB. diff --git a/sql-statements/sql-statement-create-user.md b/sql-statements/sql-statement-create-user.md index e0d6ab6066918..d77e194be49cd 100644 --- a/sql-statements/sql-statement-create-user.md +++ b/sql-statements/sql-statement-create-user.md @@ -12,7 +12,7 @@ This statement creates a new user, specified with a password. In the MySQL privi ```ebnf+diagram CreateUserStmt ::= - 'CREATE' 'USER' IfNotExists UserSpecList RequireClauseOpt ConnectionOptions PasswordOrLockOptions + 'CREATE' 'USER' IfNotExists UserSpecList RequireClauseOpt ConnectionOptions LockOption IfNotExists ::= ('IF' 'NOT' 'EXISTS')? @@ -29,6 +29,8 @@ AuthOption ::= StringName ::= stringLit | Identifier + +LockOption ::= ( 'ACCOUNT' 'LOCK' | 'ACCOUNT' 'UNLOCK' )? ``` ## Examples @@ -61,6 +63,16 @@ CREATE USER 'newuser4'@'%' REQUIRE ISSUER '/C=US/ST=California/L=San Francisco/O Query OK, 1 row affected (0.02 sec) ``` +Create a user who is locked upon creation. + +```sql +CREATE USER 'newuser5'@'%' ACCOUNT LOCK; +``` + +``` +Query OK, 1 row affected (0.02 sec) +``` + ## MySQL compatibility The following `CREATE USER` options are not yet supported by TiDB, and will be parsed but ignored: @@ -68,7 +80,6 @@ The following `CREATE USER` options are not yet supported by TiDB, and will be p * TiDB does not support `WITH MAX_QUERIES_PER_HOUR`, `WITH MAX_UPDATES_PER_HOUR`, and `WITH MAX_USER_CONNECTIONS` options. * TiDB does not support the `DEFAULT ROLE` option. * TiDB does not support `PASSWORD EXPIRE`, `PASSWORD HISTORY` or other options related to password. -* TiDB does not support the `ACCOUNT LOCK` and `ACCOUNT UNLOCK` options. ## See also From a11ce573053555d684579eacffa51f5d9b404dd5 Mon Sep 17 00:00:00 2001 From: shichun-0415 <89768198+shichun-0415@users.noreply.github.com> Date: Wed, 7 Sep 2022 13:18:56 +0800 Subject: [PATCH 08/15] move comment from code block to the normal text (#10357) --- character-set-and-collation.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/character-set-and-collation.md b/character-set-and-collation.md index 2fce1ffa8d07d..ce8c15f548820 100644 --- a/character-set-and-collation.md +++ b/character-set-and-collation.md @@ -469,17 +469,21 @@ INSERT INTO t VALUES ('a'); ``` ```sql -Query OK, 1 row affected # In TiDB, it is successfully executed. In MySQL, because utf8mb4_general_ci is case-insensitive, the `Duplicate entry 'a'` error is reported. 
+Query OK, 1 row affected ``` +In TiDB, the preceding statement is successfully executed. In MySQL, because `utf8mb4_general_ci` is case-insensitive, the `Duplicate entry 'a'` error is reported. + ```sql INSERT INTO t1 VALUES ('a '); ``` ```sql -Query OK, 1 row affected # In TiDB, it is successfully executed. In MySQL, because comparison is performed after the spaces are filled in, the `Duplicate entry 'a '` error is returned. +Query OK, 1 row affected ``` +In TiDB, the preceding statement is successfully executed. In MySQL, because comparison is performed after the spaces are filled in, the `Duplicate entry 'a '` error is returned. + ### New framework for collations Since TiDB v4.0, a complete framework for collations is introduced. From fd5b09c74d67e698505143209533668982dcc4e8 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Wed, 7 Sep 2022 13:36:56 +0800 Subject: [PATCH 09/15] placement rules: remove experimental note (#10360) --- schedule-replicas-by-topology-labels.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/schedule-replicas-by-topology-labels.md b/schedule-replicas-by-topology-labels.md index f4b41fb2fff2a..49f8e8f0626bd 100644 --- a/schedule-replicas-by-topology-labels.md +++ b/schedule-replicas-by-topology-labels.md @@ -8,7 +8,7 @@ aliases: ['/docs/dev/location-awareness/','/docs/dev/how-to/deploy/geographic-re > **Note:** > -> TiDB v5.3.0 introduces an experimental support for [Placement Rules in SQL](/placement-rules-in-sql.md). This offers a more convenient way to configure the placement of tables and partitions. Placement Rules in SQL might replace placement configuration with PD in future releases. +> TiDB v5.3.0 introduces [Placement Rules in SQL](/placement-rules-in-sql.md). This offers a more convenient way to configure the placement of tables and partitions. Placement Rules in SQL might replace placement configuration with PD in future releases. To improve the high availability and disaster recovery capability of TiDB clusters, it is recommended that TiKV nodes are physically scattered as much as possible. For example, TiKV nodes can be distributed on different racks or even in different data centers. According to the topology information of TiKV, the PD scheduler automatically performs scheduling at the background to isolate each replica of a Region as much as possible, which maximizes the capability of disaster recovery. From 6f4fa89f8149789597ee482472f52eb74576baeb Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Thu, 8 Sep 2022 11:26:55 +0800 Subject: [PATCH 10/15] best-practices: fix a wrong description (#10366) --- best-practices/pd-scheduling-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md index 0c5406677f1ee..0fb7313cfd787 100644 --- a/best-practices/pd-scheduling-best-practices.md +++ b/best-practices/pd-scheduling-best-practices.md @@ -102,7 +102,7 @@ The processes of scale-down and failure recovery are basically the same. `replic Region merge refers to the process of merging adjacent small regions. It serves to avoid unnecessary resource consumption by a large number of small or even empty regions after data deletion. 
Region merge is performed by `mergeChecker`, which processes in a similar way to `replicaChecker`: PD continuously scans all regions in the background, and generates an operator when contiguous small regions are found. -Specifically, when a newly split Region exists for more than the value of [`split-merge-interval`](/pd-configuration-file.md#split-merge-interval) (`1h` by default), if any of the following conditions occurs, this Region triggers the Region merge scheduling: +Specifically, when a newly split Region exists for more than the value of [`split-merge-interval`](/pd-configuration-file.md#split-merge-interval) (`1h` by default), if the following conditions occur at the same time, this Region triggers the Region merge scheduling: - The size of this Region is smaller than the value of the [`max-merge-region-size`](/pd-configuration-file.md#max-merge-region-size) (20 MiB by default) From 8723111f691265e69828e563e356a2855011a2f1 Mon Sep 17 00:00:00 2001 From: Cheese Date: Thu, 8 Sep 2022 12:36:56 +0800 Subject: [PATCH 11/15] [Update] supported third lib (#10260) --- TOC.md | 2 +- develop/dev-guide-third-party-support.md | 13 +++++++++---- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/TOC.md b/TOC.md index 5d860799769df..a8abc775b7c88 100644 --- a/TOC.md +++ b/TOC.md @@ -86,7 +86,7 @@ - Cloud Native Development Environment - [Gitpod](/develop/dev-guide-playground-gitpod.md) - Third-party Support - - [Third-Party Libraries Support](/develop/dev-guide-third-party-support.md) + - [Third-Party Tools Supported by TiDB](/develop/dev-guide-third-party-support.md) - [Integrate with ProxySQL](/develop/dev-guide-proxysql-integration.md) - Deploy - [Software and Hardware Requirements](/hardware-and-software-requirements.md) diff --git a/develop/dev-guide-third-party-support.md b/develop/dev-guide-third-party-support.md index c5c6966aa75c0..620f6373be323 100644 --- a/develop/dev-guide-third-party-support.md +++ b/develop/dev-guide-third-party-support.md @@ -1,9 +1,13 @@ --- -title: Third-Party Libraries Support Maintained by PingCAP -summary: Learn about third-party libraries support maintained by PingCAP. +title: Third-Party Tools Supported by TiDB +summary: Learn about third-party tools supported by TiDB. --- -# Third-Party Libraries Support Maintained by PingCAP +# Third-Party Tools Supported by TiDB + +> **Note:** +> +> This document only lists common third-party tools supported by TiDB. Some other third-party tools are not listed, not because they are not supported, but because PingCAP is not sure whether they use features that are incompatible with TiDB. TiDB is highly compatible with the MySQL 5.7 protocol, so most of the MySQL drivers, ORM frameworks, and other tools that adapt to MySQL are compatible with TiDB. This document focuses on these tools and their support levels for TiDB. @@ -14,7 +18,7 @@ PingCAP works with the community and provides the following support levels for t - **_Full_**: Indicates that TiDB is already compatible with most functionalities of the corresponding third-party tool, and maintains compatibility with its newer versions. PingCAP will periodically conduct compatibility tests with the latest version of the tool. - **_Compatible_**: Indicates that because the corresponding third-party tool is adapted to MySQL and TiDB is highly compatible with the MySQL protocol, so TiDB can use most features of the tool. However, PingCAP has not completed a full test on all features of the tool, which might lead to some unexpected behaviors. 
-> **Warning:** +> **Note:** > > Unless specified, support for [Application retry and error handling](/develop/dev-guide-transaction-troubleshoot.md#application-retry-and-error-handling) is not included for **Driver** or **ORM frameworks**. @@ -45,6 +49,7 @@ If you encounter problems when connecting to TiDB using the tools listed in this | Java | [MyBatis](https://mybatis.org/mybatis-3/) | v3.5.10 | Full | N/A | [Build a Simple CRUD App with TiDB and Java](/develop/dev-guide-sample-application-java.md) | | Java | [Spring Data JPA](https://spring.io/projects/spring-data-jpa/) | 2.7.2 | Full | N/A | [Build a TiDB Application Using Spring Boot](/develop/dev-guide-sample-application-spring-boot.md) | | Java | [jOOQ](https://github.com/jOOQ/jOOQ) | v3.16.7 (Open Source) | Full | N/A | N/A | +| Ruby | [Active Record](https://guides.rubyonrails.org/active_record_basics.html) | v7.0 | Full | N/A | N/A | | JavaScript/TypeScript | [sequelize](https://www.npmjs.com/package/sequelize) | v6.20.1 | Compatible | N/A | N/A | | JavaScript/TypeScript | [Knex.js](https://knexjs.org/) | v1.0.7 | Compatible | N/A | N/A | | JavaScript/TypeScript | [Prisma Client](https://www.prisma.io/) | 3.15.1 | Compatible | N/A | N/A | From f2cdccd471037e8496945b79acbe44835c2cb670 Mon Sep 17 00:00:00 2001 From: shichun-0415 <89768198+shichun-0415@users.noreply.github.com> Date: Thu, 8 Sep 2022 15:04:57 +0800 Subject: [PATCH 12/15] support downloading server and toolkit packages under arm (#10263) --- binary-package.md | 94 ++++++++++++++++++++----------------- download-ecosystem-tools.md | 26 +++++----- 2 files changed, 66 insertions(+), 54 deletions(-) diff --git a/binary-package.md b/binary-package.md index 0169b85e10d40..637ca16295b15 100644 --- a/binary-package.md +++ b/binary-package.md @@ -7,69 +7,77 @@ summary: Learn about TiDB installation packages and the specific components incl Before [deploying TiUP offline](/production-deployment-using-tiup.md#deploy-tiup-offline), you need to download the binary packages of TiDB at the [official download page](https://en.pingcap.com/download/). -TiDB provides two binary packages: `TiDB-community-server` and `TiDB-community-toolkit` +TiDB binary packages are available in amd64 and arm64 architectures. In either architecture, TiDB provides two binary packages: `TiDB-community-server` and `TiDB-community-toolkit`. The `TiDB-community-server` package contains the following contents. 
| Content | Change history | |---|---| -| tidb-{version}-linux-amd64.tar.gz | | -| tikv-{version}-linux-amd64.tar.gz | | -| tiflash-{version}-linux-amd64.tar.gz | | -| pd-{version}-linux-amd64.tar.gz | | -| ctl-{version}-linux-amd64.tar.gz | | -| grafana-{version}-linux-amd64.tar.gz | | -| alertmanager-{version}-linux-amd64.tar.gz | | -| blackbox_exporter-{version}-linux-amd64.tar.gz | | -| prometheus-{version}-linux-amd64.tar.gz | | -| node_exporter-{version}-linux-amd64.tar.gz | | -| tiup-linux-amd64.tar.gz | | -| tiup-{version}-linux-amd64.tar.gz | | +| tidb-{version}-linux-{arch}.tar.gz | | +| tikv-{version}-linux-{arch}.tar.gz | | +| tiflash-{version}-linux-{arch}.tar.gz | | +| pd-{version}-linux-{arch}.tar.gz | | +| ctl-{version}-linux-{arch}.tar.gz | | +| grafana-{version}-linux-{arch}.tar.gz | | +| alertmanager-{version}-linux-{arch}.tar.gz | | +| blackbox_exporter-{version}-linux-{arch}.tar.gz | | +| prometheus-{version}-linux-{arch}.tar.gz | | +| node_exporter-{version}-linux-{arch}.tar.gz | | +| tiup-linux-{arch}.tar.gz | | +| tiup-{version}-linux-{arch}.tar.gz | | | local_install.sh | | -| cluster-{version}-linux-amd64.tar.gz | | -| insight-{version}-linux-amd64.tar.gz | | -| diag-{version}-linux-amd64.tar.gz | New in v6.0.0 | -| influxdb-{version}-linux-amd64.tar.gz | | -| playground-{version}-linux-amd64.tar.gz | | +| cluster-{version}-linux-{arch}.tar.gz | | +| insight-{version}-linux-{arch}.tar.gz | | +| diag-{version}-linux-{arch}.tar.gz | New in v6.0.0 | +| influxdb-{version}-linux-{arch}.tar.gz | | +| playground-{version}-linux-{arch}.tar.gz | | + +> **Note**: +> +> `{version}` depends on the version of the component or server you are installing. `{arch}` depends on the architecture of the system, which can be `amd64` or `arm64`. The `TiDB-community-toolkit` package contains the following contents. 
| Content | Change history | |---|---| -| tikv-importer-{version}-linux-amd64.tar.gz | | -| pd-recover-{version}-linux-amd64.tar.gz | | +| tikv-importer-{version}-linux-{arch}.tar.gz | | +| pd-recover-{version}-linux-{arch}.tar.gz | | | etcdctl | New in v6.0.0 | -| tiup-linux-amd64.tar.gz | | -| tiup-{version}-linux-amd64.tar.gz | | -| tidb-lightning-{version}-linux-amd64.tar.gz | | +| tiup-linux-{arch}.tar.gz | | +| tiup-{version}-linux-{arch}.tar.gz | | +| tidb-lightning-{version}-linux-{arch}.tar.gz | | | tidb-lightning-ctl | | -| dumpling-{version}-linux-amd64.tar.gz | | -| cdc-{version}-linux-amd64.tar.gz | | -| dm-{version}-linux-amd64.tar.gz | | -| dm-worker-{version}-linux-amd64.tar.gz | | -| dm-master-{version}-linux-amd64.tar.gz | | -| dmctl-{version}-linux-amd64.tar.gz | | -| br-{version}-linux-amd64.tar.gz | | +| dumpling-{version}-linux-{arch}.tar.gz | | +| cdc-{version}-linux-{arch}.tar.gz | | +| dm-{version}-linux-{arch}.tar.gz | | +| dm-worker-{version}-linux-{arch}.tar.gz | | +| dm-master-{version}-linux-{arch}.tar.gz | | +| dmctl-{version}-linux-{arch}.tar.gz | | +| br-{version}-linux-{arch}.tar.gz | | | spark-{version}-any-any.tar.gz | | | tispark-{version}-any-any.tar.gz | | -| package-{version}-linux-amd64.tar.gz | | -| bench-{version}-linux-amd64.tar.gz | | -| errdoc-{version}-linux-amd64.tar.gz | | -| dba-{version}-linux-amd64.tar.gz | | -| PCC-{version}-linux-amd64.tar.gz | | -| pump-{version}-linux-amd64.tar.gz | | -| drainer-{version}-linux-amd64.tar.gz | | +| package-{version}-linux-{arch}.tar.gz | | +| bench-{version}-linux-{arch}.tar.gz | | +| errdoc-{version}-linux-{arch}.tar.gz | | +| dba-{version}-linux-{arch}.tar.gz | | +| PCC-{version}-linux-{arch}.tar.gz | | +| pump-{version}-linux-{arch}.tar.gz | | +| drainer-{version}-linux-{arch}.tar.gz | | | binlogctl | New in v6.0.0 | | sync_diff_inspector | | | reparo | | | arbiter | | | mydumper | New in v6.0.0 | -| server-{version}-linux-amd64.tar.gz | New in v6.2.0 | -| grafana-{version}-linux-amd64.tar.gz | New in v6.2.0 | -| alertmanager-{version}-linux-amd64.tar.gz | New in v6.2.0 | -| prometheus-{version}-linux-amd64.tar.gz | New in v6.2.0 | -| blackbox_exporter-{version}-linux-amd64.tar.gz | New in v6.2.0 | -| node_exporter-{version}-linux-amd64.tar.gz | New in v6.2.0 | +| server-{version}-linux-{arch}.tar.gz | New in v6.2.0 | +| grafana-{version}-linux-{arch}.tar.gz | New in v6.2.0 | +| alertmanager-{version}-linux-{arch}.tar.gz | New in v6.2.0 | +| prometheus-{version}-linux-{arch}.tar.gz | New in v6.2.0 | +| blackbox_exporter-{version}-linux-{arch}.tar.gz | New in v6.2.0 | +| node_exporter-{version}-linux-{arch}.tar.gz | New in v6.2.0 | + +> **Note**: +> +> `{version}` depends on the version of the tool you are installing. `{arch}` depends on the architecture of the system, which can be `amd64` or `arm64`. ## See also diff --git a/download-ecosystem-tools.md b/download-ecosystem-tools.md index 2ce203c3fd826..92653b3d4c26f 100644 --- a/download-ecosystem-tools.md +++ b/download-ecosystem-tools.md @@ -18,17 +18,17 @@ TiDB Toolkit contains frequently used TiDB tools, such as data export tool Dumpl ## Environment requirements - Operating system: Linux -- Architecture: amd64 +- Architecture: amd64 or arm64 ## Download link You can download TiDB Toolkit from the following link: ``` -https://download.pingcap.org/tidb-community-toolkit-{version}-linux-amd64.tar.gz +https://download.pingcap.org/tidb-community-toolkit-{version}-linux-{arch}.tar.gz ``` -`{version}` in the link indicates the version number of TiDB. 
For example, the download link for `v6.2.0` is `https://download.pingcap.org/tidb-community-toolkit-v6.2.0-linux-amd64.tar.gz`. +`{version}` in the link indicates the version number of TiDB and `{arch}` indicates the architecture of the system, which can be `amd64` or `arm64`. For example, the download link for `v6.2.0` in the `amd64` architecture is `https://download.pingcap.org/tidb-community-toolkit-v6.2.0-linux-amd64.tar.gz`. ## TiDB Toolkit description @@ -36,14 +36,18 @@ Depending on which tools you want to use, you can install the corresponding offl | Tool | Offline package name | |:------|:----------| -| [TiUP](/tiup/tiup-overview.md) | `tiup-linux-amd64.tar.gz`
`tiup-{tiup-version}-linux-amd64.tar.gz`
`dm-{tiup-version}-linux-amd64.tar.gz`
`server-{version}-linux-amd64.tar.gz` | -| [Dumpling](/dumpling-overview.md) | `dumpling-{version}-linux-amd64.tar.gz` | -| [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) | `tidb-lightning-ctl`
`tidb-lightning-{version}-linux-amd64.tar.gz` | -| [TiDB Data Migration (DM)](/dm/dm-overview.md) | `dm-worker-{version}-linux-amd64.tar.gz`
`dm-master-{version}-linux-amd64.tar.gz`
`dmctl-{version}-linux-amd64.tar.gz` | -| [TiCDC](/ticdc/ticdc-overview.md) | `cdc-{version}-linux-amd64.tar.gz` | -| [TiDB Binlog](/tidb-binlog/tidb-binlog-overview.md) | `pump-{version}-linux-amd64.tar.gz`
`drainer-{version}-linux-amd64.tar.gz`
`binlogctl`
`reparo` | -| [Backup & Restore (BR)](/br/backup-and-restore-overview.md) | `br-{version}-linux-amd64.tar.gz` | +| [TiUP](/tiup/tiup-overview.md) | `tiup-linux-{arch}.tar.gz`
`tiup-{tiup-version}-linux-{arch}.tar.gz`
`dm-{tiup-version}-linux-{arch}.tar.gz`
`server-{version}-linux-{arch}.tar.gz` | +| [Dumpling](/dumpling-overview.md) | `dumpling-{version}-linux-{arch}.tar.gz` | +| [TiDB Lightning](/tidb-lightning/tidb-lightning-overview.md) | `tidb-lightning-ctl`
`tidb-lightning-{version}-linux-{arch}.tar.gz` | +| [TiDB Data Migration (DM)](/dm/dm-overview.md) | `dm-worker-{version}-linux-{arch}.tar.gz`
`dm-master-{version}-linux-{arch}.tar.gz`
`dmctl-{version}-linux-{arch}.tar.gz` | +| [TiCDC](/ticdc/ticdc-overview.md) | `cdc-{version}-linux-{arch}.tar.gz` | +| [TiDB Binlog](/tidb-binlog/tidb-binlog-overview.md) | `pump-{version}-linux-{arch}.tar.gz`
`drainer-{version}-linux-{arch}.tar.gz`
`binlogctl`
`reparo` | +| [Backup & Restore (BR)](/br/backup-and-restore-overview.md) | `br-{version}-linux-{arch}.tar.gz` | | [sync-diff-inspector](/sync-diff-inspector/sync-diff-inspector-overview.md) | `sync_diff_inspector` | | [TiSpark](/tispark-overview.md) | `tispark-{tispark-version}-any-any.tar.gz`
`spark-{spark-version}-any-any.tar.gz` | -| [PD Control](/pd-control.md) | `pd-recover-{version}-linux-amd64.tar` | +| [PD Control](/pd-control.md) | `pd-recover-{version}-linux-{arch}.tar` | | [PD Recover](/pd-recover.md) | `etcdctl` | + +> **Note**: +> +> `{version}` depends on the version of the tool you are installing. `{arch}` depends on the architecture of the system, which can be `amd64` or `arm64`. \ No newline at end of file From 1e5d44413fee29909e3286f90a7f27618acd2263 Mon Sep 17 00:00:00 2001 From: Aolin Date: Tue, 13 Sep 2022 10:46:58 +0800 Subject: [PATCH 13/15] Add docs for regexp functions (#10352) --- functions-and-operators/string-functions.md | 25 ++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index c9d2539fb8a58..69ae038d0b697 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -6,7 +6,7 @@ aliases: ['/docs/dev/functions-and-operators/string-functions/','/docs/dev/refer # String Functions -TiDB supports most of the [string functions](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html) available in MySQL 5.7 and some of the [functions](https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlqr/SQL-Functions.html#GUID-93EC62F8-415D-4A7E-B050-5D5B2C127009) available in Oracle 21. +TiDB supports most of the [string functions](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html) available in MySQL 5.7, some of the [string functions](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html) available in MySQL 8.0, and some of the [functions](https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlqr/SQL-Functions.html#GUID-93EC62F8-415D-4A7E-B050-5D5B2C127009) available in Oracle 21. ## Supported functions @@ -47,6 +47,10 @@ TiDB supports most of the [string functions](https://dev.mysql.com/doc/refman/5. | [`POSITION()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_position) | Synonym for `LOCATE()` | | [`QUOTE()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_quote) | Escape the argument for use in an SQL statement | | [`REGEXP`](https://dev.mysql.com/doc/refman/5.7/en/regexp.html#operator_regexp) | Pattern matching using regular expressions | +| [`REGEXP_INSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-instr) | Return the starting index of the substring that matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)) | +| [`REGEXP_LIKE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like) | Whether the string matches the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)) | +| [`REGEXP_REPLACE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace) | Replace substrings that match the regular expression (Partly compatible with MySQL. For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)) | +| [`REGEXP_SUBSTR()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr) | Return the substring that matches the regular expression (Partly compatible with MySQL. 
For more details, see [Regular expression compatibility with MySQL](#regular-expression-compatibility-with-mysql)) | | [`REPEAT()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_repeat) | Repeat a string the specified number of times | | [`REPLACE()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_replace) | Replace occurrences of a specified string | | [`REVERSE()`](https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_reverse) | Reverse the characters in a string | @@ -73,3 +77,22 @@ TiDB supports most of the [string functions](https://dev.mysql.com/doc/refman/5. * `SOUNDEX()` * `SOUNDS LIKE` * `WEIGHT_STRING()` + +## Regular expression compatibility with MySQL + +The following sections describe the regular expression compatibility with MySQL. + +### Syntax compatibility + +MySQL implements regular expression using International Components for Unicode (ICU), and TiDB uses RE2. To learn the syntax differences between the two libraries, you can refer to the [ICU documentation](https://unicode-org.github.io/icu/userguide/) and [RE2 Syntax](https://github.com/google/re2/wiki/Syntax). + +### `match_type` compatibility + +The value options of `match_type` between TiDB and MySQL are: + +- Value options in TiDB are `"c"`, `"i"`, `"m"`, and `"s"`, and value options in MySQL are `"c"`, `"i"`, `"m"`, `"n"`, and `"u"`. +- The `"s"` in TiDB corresponds to `"n"` in MySQL. When `"s"` is set in TiDB, the `.` character also matches line terminators (`\n`). + + For example, the `SELECT REGEXP_LIKE(a, b, "n") FROM t1` in MySQL is the same as the `SELECT REGEXP_LIKE(a, b, "s") FROM t1` in TiDB. + +- TiDB does not support `"u"`, which means Unix-only line endings in MySQL. From 00b23c124d7783a663ce1397734223cc65c265c1 Mon Sep 17 00:00:00 2001 From: Ran Date: Tue, 13 Sep 2022 14:40:59 +0800 Subject: [PATCH 14/15] lightning: split troubleshoot from faq (#10309) --- TOC.md | 1 + tidb-lightning/tidb-lightning-faq.md | 204 +----------------- tidb-lightning/tidb-lightning-glossary.md | 4 +- tidb-lightning/troubleshoot-tidb-lightning.md | 203 +++++++++++++++++ tidb-troubleshooting-map.md | 10 +- 5 files changed, 218 insertions(+), 204 deletions(-) create mode 100644 tidb-lightning/troubleshoot-tidb-lightning.md diff --git a/TOC.md b/TOC.md index a8abc775b7c88..0a52638e8d85c 100644 --- a/TOC.md +++ b/TOC.md @@ -371,6 +371,7 @@ - [Configure](/tidb-lightning/tidb-lightning-configuration.md) - [Monitor](/tidb-lightning/monitor-tidb-lightning.md) - [FAQ](/tidb-lightning/tidb-lightning-faq.md) + - [Troubleshooting](/tidb-lightning/troubleshoot-tidb-lightning.md) - [Glossary](/tidb-lightning/tidb-lightning-glossary.md) - TiDB Data Migration - [About TiDB Data Migration](/dm/dm-overview.md) diff --git a/tidb-lightning/tidb-lightning-faq.md b/tidb-lightning/tidb-lightning-faq.md index 99412ff71f490..2aabb468c302b 100644 --- a/tidb-lightning/tidb-lightning-faq.md +++ b/tidb-lightning/tidb-lightning-faq.md @@ -1,11 +1,13 @@ --- title: TiDB Lightning FAQs summary: Learn about the frequently asked questions (FAQs) and answers about TiDB Lightning. 
-aliases: ['/docs/dev/tidb-lightning/tidb-lightning-faq/','/docs/dev/faq/tidb-lightning/','/docs/dev/troubleshoot-tidb-lightning/','/docs/dev/how-to/troubleshoot/tidb-lightning/','/docs/dev/tidb-lightning/tidb-lightning-misuse-handling/','/docs/dev/reference/tools/error-case-handling/lightning-misuse-handling/','/tidb/dev/tidb-lightning-misuse-handling','/tidb/dev/troubleshoot-tidb-lightning'] +aliases: ['/docs/dev/tidb-lightning/tidb-lightning-faq/','/docs/dev/faq/tidb-lightning/'] --- # TiDB Lightning FAQs +This document lists the frequently asked questions (FAQs) and answers about TiDB Lightning. + ## What is the minimum TiDB/TiKV/PD cluster version supported by TiDB Lightning? The version of TiDB Lightning should be the same as the cluster. If you use the Local-backend mode, the earliest available version is 4.0.0. If you use the Importer-backend mode or the TiDB-backend mode, the earliest available version is 2.0.9, but it is recommended to use the 3.0 stable version. @@ -41,7 +43,7 @@ If `tikv-importer` needs to be restarted: 4. Start `tikv-importer`. 5. Start `tidb-lightning` *and wait until the program fails with CHECKSUM error, if any*. * Restarting `tikv-importer` would destroy all engine files still being written, but `tidb-lightning` did not know about it. As of v3.0 the simplest way is to let `tidb-lightning` go on and retry. -6. [Destroy the failed tables and checkpoints](#checkpoint-for--has-invalid-status-error-code) +6. [Destroy the failed tables and checkpoints](/tidb-lightning/troubleshoot-tidb-lightning.md#checkpoint-for--has-invalid-status-error-code) 7. Start `tidb-lightning` again. If you are using Local-backend or TiDB-backend, the operations are the same as those of using Importer-backend when the `tikv-importer` is still running. @@ -103,41 +105,11 @@ To stop the `tidb-lightning` process, you can choose the corresponding operation - For manual deployment: if `tidb-lightning` is running in foreground, press Ctrl+C to exit. Otherwise, obtain the process ID using the `ps aux | grep tidb-lighting` command and then terminate the process using the `kill -2 ${PID}` command. -## Why the `tidb-lightning` process suddenly quits while running in background? - -It is potentially caused by starting `tidb-lightning` incorrectly, which causes the system to send a SIGHUP signal to stop the `tidb-lightning` process. In this situation, `tidb-lightning.log` usually outputs the following log: - -``` -[2018/08/10 07:29:08.310 +08:00] [INFO] [main.go:41] ["got signal to exit"] [signal=hangup] -``` - -It is not recommended to directly use `nohup` in the command line to start `tidb-lightning`. You can [start `tidb-lightning`](/tidb-lightning/deploy-tidb-lightning.md#step-3-start-tidb-lightning) by executing a script. - -In addition, if the last log of TiDB Lightning shows that the error is "Context canceled", you need to search for the first "ERROR" level log. This "ERROR" level log is usually followed by "got signal to exit", which indicates that TiDB Lightning received an interrupt signal and then exited. - -## Why my TiDB cluster is using lots of CPU resources and running very slowly after using TiDB Lightning? - -If `tidb-lightning` abnormally exited, the cluster might be stuck in the "import mode", which is not suitable for production. 
The current mode can be retrieved using the following command: - -{{< copyable "shell-regular" >}} - -```sh -tidb-lightning-ctl --config tidb-lightning.toml --fetch-mode -``` - -You can force the cluster back to "normal mode" using the following command: - -{{< copyable "shell-regular" >}} - -```sh -tidb-lightning-ctl --config tidb-lightning.toml --fetch-mode -``` - ## Can TiDB Lightning be used with 1-Gigabit network card? -The TiDB Lightning toolset is best used with a 10-Gigabit network card. 1-Gigabit network cards are *not recommended*, especially for `tikv-importer`. +TiDB Lightning is best used with a 10-Gigabit network card. -1-Gigabit network cards can only provide a total bandwidth of 120 MB/s, which has to be shared among all target TiKV stores. TiDB Lightning can easily saturate all bandwidth of the 1-Gigabit network and bring down the cluster because PD is unable to be contacted anymore. +1-Gigabit network cards can only provide a total bandwidth of 120 MB/s, which has to be shared among all target TiKV stores. TiDB Lightning can easily saturate all bandwidth of the 1-Gigabit network in physical import mode and bring down the cluster because PD is unable to be contacted anymore. ## Why TiDB Lightning requires so much free space in the target TiKV cluster? @@ -181,168 +153,6 @@ See also [How to properly restart TiDB Lightning?](#how-to-properly-restart-tidb DROP DATABASE IF EXISTS `lightning_metadata`; ``` -## Why does TiDB Lightning report the `could not find first pair, this shouldn't happen` error? - -This error occurs possibly because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. In the Linux system, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during the import. - -## Import speed is too slow - -Normally it takes TiDB Lightning 2 minutes per thread to import a 256 MB data file. If the speed is much slower than this, there is an error. You can check the time taken for each data file from the log mentioning `restore chunk … takes`. This can also be observed from metrics on Grafana. - -There are several reasons why TiDB Lightning becomes slow: - -**Cause 1**: `region-concurrency` is set too high, which causes thread contention and reduces performance. - -1. The setting can be found from the start of the log by searching `region-concurrency`. -2. If TiDB Lightning shares the same machine with other services (for example, TiKV Importer), `region-concurrency` must be **manually** set to 75% of the total number of CPU cores. -3. If there is a quota on CPU (for example, limited by Kubernetes settings), TiDB Lightning may not be able to read this out. In this case, `region-concurrency` must also be **manually** reduced. - -**Cause 2**: The table schema is too complex. - -Every additional index introduces a new KV pair for each row. If there are N indices, the actual size to be imported would be approximately (N+1) times the size of the Dumpling output. If the indices are negligible, you may first remove them from the schema, and add them back using `CREATE INDEX` after the import is complete. - -**Cause 3**: Each file is too large. - -TiDB Lightning works the best when the data source is broken down into multiple files of size around 256 MB so that the data can be processed in parallel. If each file is too large, TiDB Lightning might not respond. 
- -If the data source is CSV, and all CSV files have no fields containing newline control characters (U+000A and U+000D), you can turn on "strict format" to let TiDB Lightning automatically split the large files. - -```toml -[mydumper] -strict-format = true -``` - -**Cause 4**: TiDB Lightning is too old. - -Try the latest version! Maybe there is new speed improvement. - -## `checksum failed: checksum mismatched remote vs local` - -**Cause**: The checksum of a table in the local data source and the remote imported database differ. This error has several deeper reasons. You can further locate the reason by checking the log that contains `checksum mismatched`. - -The lines that contain `checksum mismatched` provide the information `total_kvs: x vs y`, where `x` indicates the number of key-value pairs (KV pairs) calculated by the target cluster after the import is completed, and `y` indicates the number of key-value pairs generated by the local data source. - -- If `x` is greater, it means that there are more KV pairs in the target cluster. - - It is possible that this table is not empty before the import and therefore affects the data checksum. It is also possible that TiDB Lightning has previously failed and shut down, but did not restart correctly. -- If `y` is greater, it means that there are more KV pairs in the local data source. - - If the checksum of the target database is all 0, it means that no import has occurred. It is possible that the cluster is too busy to receive any data. - - It is possible that the exported data contains duplicate data, such as the UNIQUE and PRIMARY KEYs with duplicate values, or that the downstream table structure is case-insensitive while the data is case-sensitive. -- Other possible reasons - - If the data source is machine-generated and not backed up by Dumpling, make sure the data conforms to the table limits. For example, the AUTO_INCREMENT column needs to be positive and not 0. - -**Solutions**: - -1. Delete the corrupted data using `tidb-lightning-ctl`, check the table structure and the data, and restart TiDB Lightning to import the affected tables again. - - {{< copyable "shell-regular" >}} - - ```sh - tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all - ``` - -2. Consider using an external database to store the checkpoints (change `[checkpoint] dsn`) to reduce the target database's load. - -3. If TiDB Lightning was improperly restarted, see also the "[How to properly restart TiDB Lightning](#how-to-properly-restart-tidb-lightning)" section in the FAQ. - -## `Checkpoint for … has invalid status:` (error code) - -**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously abnormally exited. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed. - -The error code is an integer smaller than 25, with possible values of 0, 3, 6, 9, 12, 14, 15, 17, 18, 20, and 21. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later step the exit occurs at. - -**Solutions**: - -If the error was caused by invalid data source, delete the imported data using `tidb-lightning-ctl` and start Lightning again. - -```sh -tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all -``` - -See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#checkpoints-control) section for other options. 
- -## `ResourceTemporarilyUnavailable("Too many open engines …: …")` - -**Cause**: The number of concurrent engine files exceeds the limit specified by `tikv-importer`. This could be caused by misconfiguration. Additionally, if `tidb-lightning` exited abnormally, an engine file might be left at a dangling open state, which could cause this error as well. - -**Solutions**: - -1. Increase the value of `max-open-engines` setting in `tikv-importer.toml`. This value is typically dictated by the available memory. This could be calculated by using: - - Max Memory Usage ≈ `max-open-engines` × `write-buffer-size` × `max-write-buffer-number` - -2. Decrease the value of `table-concurrency` + `index-concurrency` so it is less than `max-open-engines`. - -3. Restart `tikv-importer` to forcefully remove all engine files (default to `./data.import/`). This also removes all partially imported tables, which requires TiDB Lightning to clear the outdated checkpoints. - - ```sh - tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all - ``` - -## `cannot guess encoding for input file, please convert to UTF-8 manually` - -**Cause**: TiDB Lightning only recognizes the UTF-8 and GB-18030 encodings for the table schemas. This error is emitted if the file isn't in any of these encodings. It is also possible that the file has mixed encoding, such as containing a string in UTF-8 and another string in GB-18030, due to historical `ALTER TABLE` executions. - -**Solutions**: - -1. Fix the schema so that the file is entirely in either UTF-8 or GB-18030. - -2. Manually `CREATE` the affected tables in the target database. - -3. Set `[mydumper] character-set = "binary"` to skip the check. Note that this might introduce mojibake into the target database. - -## `[sql2kv] sql encode error = [types:1292]invalid time format: '{1970 1 1 …}'` - -**Cause**: A table contains a column with the `timestamp` type, but the time value itself does not exist. This is either because of DST changes or the time value has exceeded the supported range (Jan 1, 1970 to Jan 19, 2038). - -**Solutions**: - -1. Ensure TiDB Lightning and the source database are using the same time zone. - - When executing TiDB Lightning directly, the time zone can be forced using the `$TZ` environment variable. - - ```sh - # Manual deployment, and force Asia/Shanghai. - TZ='Asia/Shanghai' bin/tidb-lightning -config tidb-lightning.toml - ``` - -2. When exporting data using Mydumper, make sure to include the `--skip-tz-utc` flag. - -3. Ensure the entire cluster is using the same and latest version of `tzdata` (version 2018i or above). - - On CentOS, run `yum info tzdata` to check the installed version and whether there is an update. Run `yum upgrade tzdata` to upgrade the package. - -## `[Error 8025: entry too large, the max entry size is 6291456]` - -**Cause**: A single row of key-value pairs generated by TiDB Lightning exceeds the limit set by TiDB. - -**Solution**: - -Currently, the limitation of TiDB cannot be bypassed. You can only ignore this table to ensure the successful import of other tables. - -## Encounter `rpc error: code = Unimplemented ...` when TiDB Lightning switches the mode - -**Cause**: Some node(s) in the cluster does not support `switch-mode`. For example, if the TiFlash version is earlier than `v4.0.0-rc.2`, [`switch-mode` is not supported](https://github.com/pingcap/tidb-lightning/issues/273). - -**Solutions**: - -- If there are TiFlash nodes in the cluster, you can update the cluster to `v4.0.0-rc.2` or higher versions. 
- Temporarily disable TiFlash if you do not want to upgrade the cluster.

## `tidb lightning encountered error: TiDB version too old, expected '>=4.0.0', found '3.0.18'`

TiDB Lightning Local-backend only supports importing data to TiDB clusters of v4.0.0 and later versions. If you try to use Local-backend to import data to a v2.x or v3.x cluster, the above error is reported. At this time, you can modify the configuration to use Importer-backend or TiDB-backend for data import.

Some `nightly` versions might be similar to v4.0.0-beta.2. These `nightly` versions of TiDB Lightning actually support Local-backend. If you encounter this error when using a `nightly` version, you can skip the version check by setting the configuration `check-requirements = false`. Before setting this parameter, make sure that the configuration of TiDB Lightning supports the corresponding version; otherwise, the import might fail.

## `restore table test.district failed: unknown columns in header [...]`

This error occurs usually because the CSV data file does not contain a header (the first row is not column names but data). Therefore, you need to add the following configuration to the TiDB Lightning configuration file:

```
[mydumper.csv]
header = false
```

## How to get the runtime goroutine information of TiDB Lightning

1. If [`status-port`](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-configuration) has been specified in the configuration file of TiDB Lightning, skip this step. Otherwise, you need to send the USR1 signal to TiDB Lightning to enable `status-port`.

    {{< copyable "shell-regular" >}}

    ```sh
    kill -USR1 <tidb-lightning-pid>
    ```

    Check the log of TiDB Lightning. The log of `starting HTTP server` / `start HTTP server` / `started HTTP server` shows the newly enabled `status-port`.

2. Access `http://<lightning-ip>:<status-port>/debug/pprof/goroutine?debug=2` to get the goroutine information.

diff --git a/tidb-lightning/tidb-lightning-glossary.md b/tidb-lightning/tidb-lightning-glossary.md
index 80c423c24551d..420d2d7327871 100644
--- a/tidb-lightning/tidb-lightning-glossary.md
+++ b/tidb-lightning/tidb-lightning-glossary.md
@@ -52,7 +52,7 @@ In TiDB Lightning, the checksum of a table is a set of 3 numbers calculated from

TiDB Lightning [validates the imported data](/tidb-lightning/tidb-lightning-faq.md#how-to-ensure-the-integrity-of-the-imported-data) by comparing the [local](/tidb-lightning/tidb-lightning-glossary.md#local-checksum) and [remote checksums](/tidb-lightning/tidb-lightning-glossary.md#remote-checksum) of every table. The program would stop if any pair does not match. You can skip this check by setting the `post-restore.checksum` configuration to `false`.

See also the [FAQs](/tidb-lightning/troubleshoot-tidb-lightning.md#checksum-failed-checksum-mismatched-remote-vs-local) for how to properly handle checksum mismatch.

### Chunk

### Import mode

A configuration that optimizes TiKV for writing at the cost of degraded read speed and space usage.

TiDB Lightning automatically switches to and off the import mode while running. 
However, if TiKV gets stuck in import mode, you can use `tidb-lightning-ctl` to [force revert](/tidb-lightning/tidb-lightning-faq.md#why-my-tidb-cluster-is-using-lots-of-cpu-resources-and-running-very-slowly-after-using-tidb-lightning) to [normal mode](/tidb-lightning/tidb-lightning-glossary.md#normal-mode).
+TiDB Lightning automatically switches the import mode on and off while running. However, if TiKV gets stuck in import mode, you can use `tidb-lightning-ctl` to [force revert](/tidb-lightning/troubleshoot-tidb-lightning.md#the-tidb-cluster-uses-lots-of-cpu-resources-and-runs-very-slowly-after-using-tidb-lightning) to [normal mode](/tidb-lightning/tidb-lightning-glossary.md#normal-mode).
 
 ### Index engine
 
diff --git a/tidb-lightning/troubleshoot-tidb-lightning.md b/tidb-lightning/troubleshoot-tidb-lightning.md
new file mode 100644
index 0000000000000..10a9581e54466
--- /dev/null
+++ b/tidb-lightning/troubleshoot-tidb-lightning.md
@@ -0,0 +1,203 @@
+---
+title: Troubleshoot TiDB Lightning
+summary: Learn the common problems you might encounter when you use TiDB Lightning and their solutions.
+aliases: ['/docs/dev/troubleshoot-tidb-lightning/','/docs/dev/how-to/troubleshoot/tidb-lightning/','/docs/dev/tidb-lightning/tidb-lightning-misuse-handling/','/docs/dev/reference/tools/error-case-handling/lightning-misuse-handling/','/tidb/dev/tidb-lightning-misuse-handling','/tidb/dev/troubleshoot-tidb-lightning']
+---
+
+# Troubleshoot TiDB Lightning
+
+This document summarizes the common problems you might encounter when you use TiDB Lightning and their solutions.
+
+## Import speed is too slow
+
+Normally, it takes TiDB Lightning 2 minutes per thread to import a 256 MB data file. If the speed is much slower than this, something is wrong. You can check the time taken for each data file from the log lines that mention `restore chunk … takes`. This can also be observed from metrics on Grafana.
+
+There are several reasons why TiDB Lightning becomes slow:
+
+**Cause 1**: `region-concurrency` is set too high, which causes thread contention and reduces performance.
+
+1. The setting can be found at the start of the log by searching for `region-concurrency`.
+2. If TiDB Lightning shares the same machine with other services (for example, TiKV Importer), `region-concurrency` must be **manually** set to 75% of the total number of CPU cores.
+3. If there is a quota on CPU (for example, limited by Kubernetes settings), TiDB Lightning might not be able to detect this limit. In this case, `region-concurrency` must also be **manually** reduced.
+
+**Cause 2**: The table schema is too complex.
+
+Every additional index introduces a new KV pair for each row. If there are N indices, the actual size to be imported is approximately (N+1) times the size of the Dumpling output. If the indices are not immediately needed, you can remove them from the schema first and add them back using `CREATE INDEX` after the import is complete.
+
+**Cause 3**: Each file is too large.
+
+TiDB Lightning works best when the data source is broken down into multiple files of around 256 MB each so that the data can be processed in parallel. If each file is too large, TiDB Lightning might not respond.
+
+If the data source is CSV, and all CSV files have no fields containing newline control characters (U+000A and U+000D), you can turn on "strict format" to let TiDB Lightning automatically split the large files:
+
+```toml
+[mydumper]
+strict-format = true
+```
+
+**Cause 4**: TiDB Lightning is too old.
+
+Try the latest version, which might include speed improvements.
+
+## The `tidb-lightning` process suddenly quits while running in background
+
+This is potentially caused by starting `tidb-lightning` incorrectly, which causes the system to send a SIGHUP signal to stop the `tidb-lightning` process. In this situation, `tidb-lightning.log` usually outputs the following log:
+
+```
+[2018/08/10 07:29:08.310 +08:00] [INFO] [main.go:41] ["got signal to exit"] [signal=hangup]
+```
+
+It is not recommended to directly use `nohup` in the command line to start `tidb-lightning`. Instead, you can [start `tidb-lightning`](/tidb-lightning/deploy-tidb-lightning.md#step-3-start-tidb-lightning) by executing a script.
+
+In addition, if the last log of TiDB Lightning shows the error "Context canceled", you need to search for the first "ERROR" level log. This "ERROR" level log is usually followed by "got signal to exit", which indicates that TiDB Lightning received an interrupt signal and then exited.
+
+## The TiDB cluster uses lots of CPU resources and runs very slowly after using TiDB Lightning
+
+If `tidb-lightning` exited abnormally, the cluster might be stuck in the "import mode", which is not suitable for production. The current mode can be retrieved using the following command:
+
+{{< copyable "shell-regular" >}}
+
+```sh
+tidb-lightning-ctl --config tidb-lightning.toml --fetch-mode
+```
+
+You can force the cluster back to "normal mode" using the following command:
+
+{{< copyable "shell-regular" >}}
+
+```sh
+tidb-lightning-ctl --config tidb-lightning.toml --switch-mode=normal
+```
+
+## TiDB Lightning reports an error
+
+### `could not find first pair, this shouldn't happen`
+
+This error possibly occurs because the number of files opened by TiDB Lightning exceeds the system limit when TiDB Lightning reads the sorted local files. On Linux, you can use the `ulimit -n` command to confirm whether the value of this system limit is too small. It is recommended that you adjust this value to `1000000` (`ulimit -n 1000000`) during the import.
+
+### `checksum failed: checksum mismatched remote vs local`
+
+**Cause**: The checksums of a table in the local data source and in the remote imported database differ. This error can have several deeper causes. You can further locate the cause by checking the log lines that contain `checksum mismatched`.
+
+The lines that contain `checksum mismatched` provide the information `total_kvs: x vs y`, where `x` indicates the number of key-value pairs (KV pairs) calculated by the target cluster after the import is completed, and `y` indicates the number of key-value pairs generated by the local data source.
+
+- If `x` is greater, it means that there are more KV pairs in the target cluster.
+    - It is possible that this table was not empty before the import and therefore affected the data checksum. It is also possible that TiDB Lightning has previously failed and shut down, but did not restart correctly.
+- If `y` is greater, it means that there are more KV pairs in the local data source.
+    - If the checksum of the target database is all 0, it means that no import has occurred. It is possible that the cluster is too busy to receive any data.
+    - It is possible that the exported data contains duplicate data, such as UNIQUE and PRIMARY KEYs with duplicate values, or that the downstream table structure is case-insensitive while the data is case-sensitive.
+- Other possible reasons
+    - If the data source is machine-generated and not backed up by Dumpling, make sure the data conforms to the table limits. For example, the AUTO_INCREMENT column needs to be positive and not 0.
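+
+If you need to compare both sides manually, you can recompute the remote checksum of the table in TiDB and check it against the numbers in the log. The following is a minimal sketch; `test`.`district` is a placeholder for your own table:
+
+```sql
+-- Recompute the checksum of the imported table on the TiDB side.
+-- Compare the Total_kvs column in the output with the `total_kvs: x vs y`
+-- values in the TiDB Lightning log.
+ADMIN CHECKSUM TABLE `test`.`district`;
+```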
+
+**Solutions**:
+
+1. Delete the corrupted data using `tidb-lightning-ctl`, check the table structure and the data, and restart TiDB Lightning to import the affected tables again.
+
+    {{< copyable "shell-regular" >}}
+
+    ```sh
+    tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
+    ```
+
+2. Consider using an external database to store the checkpoints (change `[checkpoint] dsn`) to reduce the target database's load.
+
+3. If TiDB Lightning was improperly restarted, see also the "[How to properly restart TiDB Lightning](/tidb-lightning/tidb-lightning-faq.md#how-to-properly-restart-tidb-lightning)" section in the FAQ.
+
+### `Checkpoint for … has invalid status:` (error code)
+
+**Cause**: [Checkpoint](/tidb-lightning/tidb-lightning-checkpoints.md) is enabled, and TiDB Lightning or TiKV Importer has previously exited abnormally. To prevent accidental data corruption, TiDB Lightning does not start until the error is addressed.
+
+The error code is an integer smaller than 25, with possible values of 0, 3, 6, 9, 12, 14, 15, 17, 18, 20, and 21. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs.
+
+**Solutions**:
+
+If the error was caused by an invalid data source, delete the imported data using `tidb-lightning-ctl` and start TiDB Lightning again:
+
+```sh
+tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
+```
+
+See the [Checkpoints control](/tidb-lightning/tidb-lightning-checkpoints.md#checkpoints-control) section for other options.
+
+### `ResourceTemporarilyUnavailable("Too many open engines …: …")`
+
+**Cause**: The number of concurrent engine files exceeds the limit specified by `tikv-importer`. This could be caused by misconfiguration. Additionally, if `tidb-lightning` exited abnormally, an engine file might be left in a dangling open state, which could cause this error as well.
+
+**Solutions**:
+
+1. Increase the value of the `max-open-engines` setting in `tikv-importer.toml`. This value is typically dictated by the available memory and can be estimated using:
+
+    Max Memory Usage ≈ `max-open-engines` × `write-buffer-size` × `max-write-buffer-number`
+
+2. Decrease the value of `table-concurrency` + `index-concurrency` so it is less than `max-open-engines`.
+
+3. Restart `tikv-importer` to forcefully remove all engine files (located in `./data.import/` by default). This also removes all partially imported tables, which requires TiDB Lightning to clear the outdated checkpoints:
+
+    ```sh
+    tidb-lightning-ctl --config conf/tidb-lightning.toml --checkpoint-error-destroy=all
+    ```
+
+### `cannot guess encoding for input file, please convert to UTF-8 manually`
+
+**Cause**: TiDB Lightning only recognizes the UTF-8 and GB-18030 encodings for the table schemas. This error is emitted if the file is not in any of these encodings. It is also possible that the file has mixed encoding, such as containing a string in UTF-8 and another string in GB-18030, due to historical `ALTER TABLE` executions.
+
+**Solutions**:
+
+1. Fix the schema so that the file is entirely in either UTF-8 or GB-18030 (see the sketch after this list).
+
+2. Manually `CREATE` the affected tables in the target database.
+
+3. Set `[mydumper] character-set = "binary"` to skip the check. Note that this might introduce mojibake into the target database.
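+
+For the first solution, you can usually re-encode a schema file with the standard `iconv` utility. The following is a sketch that assumes the file is encoded in GBK; the encoding and file name are placeholders for your actual case:
+
+```sh
+# Convert the schema file to UTF-8, then replace the original file.
+iconv -f GBK -t UTF-8 my_db.my_table-schema.sql > my_db.my_table-schema.utf8.sql
+mv my_db.my_table-schema.utf8.sql my_db.my_table-schema.sql
+```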
+ +### `[sql2kv] sql encode error = [types:1292]invalid time format: '{1970 1 1 …}'` + +**Cause**: A table contains a column with the `timestamp` type, but the time value itself does not exist. This is either because of DST changes or the time value has exceeded the supported range (Jan 1, 1970 to Jan 19, 2038). + +**Solutions**: + +1. Ensure TiDB Lightning and the source database are using the same time zone. + + When executing TiDB Lightning directly, the time zone can be forced using the `$TZ` environment variable. + + ```sh + # Manual deployment, and force Asia/Shanghai. + TZ='Asia/Shanghai' bin/tidb-lightning -config tidb-lightning.toml + ``` + +2. When exporting data using Mydumper, make sure to include the `--skip-tz-utc` flag. + +3. Ensure the entire cluster is using the same and latest version of `tzdata` (version 2018i or above). + + On CentOS, run `yum info tzdata` to check the installed version and whether there is an update. Run `yum upgrade tzdata` to upgrade the package. + +### `[Error 8025: entry too large, the max entry size is 6291456]` + +**Cause**: A single row of key-value pairs generated by TiDB Lightning exceeds the limit set by TiDB. + +**Solution**: + +Currently, the limitation of TiDB cannot be bypassed. You can only ignore this table to ensure the successful import of other tables. + +### Encounter `rpc error: code = Unimplemented ...` when TiDB Lightning switches the mode + +**Cause**: Some node(s) in the cluster does not support `switch-mode`. For example, if the TiFlash version is earlier than `v4.0.0-rc.2`, [`switch-mode` is not supported](https://github.com/pingcap/tidb-lightning/issues/273). + +**Solutions**: + +- If there are TiFlash nodes in the cluster, you can update the cluster to `v4.0.0-rc.2` or higher versions. +- Temporarily disable TiFlash if you do not want to upgrade the cluster. + +### `tidb lightning encountered error: TiDB version too old, expected '>=4.0.0', found '3.0.18'` + +TiDB Lightning Local-backend only supports importing data to TiDB clusters of v4.0.0 and later versions. If you try to use Local-backend to import data to a v2.x or v3.x cluster, the above error is reported. At this time, you can modify the configuration to use Importer-backend or TiDB-backend for data import. + +Some `nightly` versions might be similar to v4.0.0-beta.2. These `nightly` versions of TiDB Lightning actually support Local-backend. If you encounter this error when using a `nightly` version, you can skip the version check by setting the configuration `check-requirements = false`. Before setting this parameter, make sure that the configuration of TiDB Lightning supports the corresponding version; otherwise, the import might fail. + +### `restore table test.district failed: unknown columns in header [...]` + +This error occurs usually because the CSV data file does not contain a header (the first row is not column names but data). Therefore, you need to add the following configuration to the TiDB Lightning configuration file: + +``` +[mydumper.csv] +header = false +``` diff --git a/tidb-troubleshooting-map.md b/tidb-troubleshooting-map.md index 534fc6df5f3e5..3467ea263d072 100644 --- a/tidb-troubleshooting-map.md +++ b/tidb-troubleshooting-map.md @@ -519,30 +519,30 @@ Check the specific cause for busy by viewing the monitor **Grafana** -> **TiKV** - `AUTO_INCREMENT` columns need to be positive, and do not contain the value "0". - UNIQUE and PRIMARY KEYs must not have duplicate entries. 
- - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#checksum-failed-checksum-mismatched-remote-vs-local). + - Solution: See [Troubleshooting Solution](/tidb-lightning/troubleshoot-tidb-lightning.md#checksum-failed-checksum-mismatched-remote-vs-local). - 6.3.4 `Checkpoint for … has invalid status:(error code)` - Cause: Checkpoint is enabled, and Lightning/Importer has previously abnormally exited. To prevent accidental data corruption, TiDB Lightning will not start until the error is addressed. The error code is an integer less than 25, with possible values as `0, 3, 6, 9, 12, 14, 15, 17, 18, 20 and 21`. The integer indicates the step where the unexpected exit occurs in the import process. The larger the integer is, the later the exit occurs. - - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#checkpoint-for--has-invalid-status-error-code). + - Solution: See [Troubleshooting Solution](/tidb-lightning/troubleshoot-tidb-lightning.md#checkpoint-for--has-invalid-status-error-code). - 6.3.5 `ResourceTemporarilyUnavailable("Too many open engines …: 8")` - Cause: The number of concurrent engine files exceeds the limit specified by tikv-importer. This could be caused by misconfiguration. In addition, even when the configuration is correct, if tidb-lightning has exited abnormally before, an engine file might be left at a dangling open state, which could cause this error as well. - - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#resourcetemporarilyunavailabletoo-many-open-engines--). + - Solution: See [Troubleshooting Solution](/tidb-lightning/troubleshoot-tidb-lightning.md#resourcetemporarilyunavailabletoo-many-open-engines--). - 6.3.6 `cannot guess encoding for input file, please convert to UTF-8 manually` - Cause: TiDB Lightning only supports the UTF-8 and GB-18030 encodings. This error means the file is not in any of these encodings. It is also possible that the file has mixed encoding, such as containing a string in UTF-8 and another string in GB-18030, due to historical ALTER TABLE executions. - - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#cannot-guess-encoding-for-input-file-please-convert-to-utf-8-manually). + - Solution: See [Troubleshooting Solution](/tidb-lightning/troubleshoot-tidb-lightning.md#cannot-guess-encoding-for-input-file-please-convert-to-utf-8-manually). - 6.3.7 `[sql2kv] sql encode error = [types:1292]invalid time format: '{1970 1 1 0 45 0 0}'` - Cause: A timestamp type entry has a time value that does not exist. This is either because of DST changes or because the time value has exceeded the supported range (from Jan 1, 1970 to Jan 19, 2038). - - Solution: See [Troubleshooting Solution](/tidb-lightning/tidb-lightning-faq.md#sql2kv-sql-encode-error--types1292invalid-time-format-1970-1-1-). + - Solution: See [Troubleshooting Solution](/tidb-lightning/troubleshoot-tidb-lightning.md#sql2kv-sql-encode-error--types1292invalid-time-format-1970-1-1-). ## 7. 
Common log analysis From 198d38521eb05e682d1fb9ae7f32136e277145bc Mon Sep 17 00:00:00 2001 From: Ran Date: Tue, 13 Sep 2022 15:14:59 +0800 Subject: [PATCH 15/15] lightning: add lightning data source (#10261) --- TOC.md | 7 +- tidb-lightning/tidb-lightning-data-source.md | 326 +++++++++++++++++++ 2 files changed, 332 insertions(+), 1 deletion(-) create mode 100644 tidb-lightning/tidb-lightning-data-source.md diff --git a/TOC.md b/TOC.md index 0a52638e8d85c..be7d6a013f936 100644 --- a/TOC.md +++ b/TOC.md @@ -359,13 +359,18 @@ - Key Features - [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md) - [Table Filter](/table-filter.md) - - [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md) - [Backends](/tidb-lightning/tidb-lightning-backends.md) - [Physical Import Mode](/tidb-lightning/tidb-lightning-physical-import-mode.md) - [Physical Import Mode Usage](/tidb-lightning/tidb-lightning-physical-import-mode-usage.md) - [Import Data in Parallel](/tidb-lightning/tidb-lightning-distributed-import.md) - [Error Resolution](/tidb-lightning/tidb-lightning-error-resolution.md) - [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md) + - Data Sources + - [Data Match Rules](/tidb-lightning/tidb-lightning-data-source.md) + - [CSV](/tidb-lightning/tidb-lightning-data-source.md#csv) + - [SQL](/tidb-lightning/tidb-lightning-data-source.md#sql) + - [Parquet](/tidb-lightning/tidb-lightning-data-source.md#parquet) + - [Customized File](/tidb-lightning/tidb-lightning-data-source.md#match-customized-files) - [Tutorial](/get-started-with-tidb-lightning.md) - [Deploy](/tidb-lightning/deploy-tidb-lightning.md) - [Configure](/tidb-lightning/tidb-lightning-configuration.md) diff --git a/tidb-lightning/tidb-lightning-data-source.md b/tidb-lightning/tidb-lightning-data-source.md new file mode 100644 index 0000000000000..93945e85dc145 --- /dev/null +++ b/tidb-lightning/tidb-lightning-data-source.md @@ -0,0 +1,326 @@ +--- +title: TiDB Lightning Data Sources +summary: Learn all the data sources supported by TiDB Lightning. +aliases: ['/docs/dev/tidb-lightning/migrate-from-csv-using-tidb-lightning/','/docs/dev/reference/tools/tidb-lightning/csv/','/tidb/dev/migrate-from-csv-using-tidb-lightning/'] +--- + +# TiDB Lightning Data Sources + +TiDB Lightning supports importing data from multiple data sources to TiDB clusters, including CSV, SQL, and Parquet files. + +To specify the data source for TiDB Lightning, use the following configuration: + +```toml +[mydumper] +# Local source data directory or the URL of the external storage such as S3. +data-source-dir = "/data/my_database" +``` + +When TiDB Lightning is running, it looks for all files that match the pattern of `data-source-dir`. + +| File | Type | Pattern | +| --------- | -------- | ------- | +| Schema file | Contains the `CREATE TABLE` DDL statement | `${db_name}.${table_name}-schema.sql` | +| Schema file | Contains the `CREATE DATABASE` DDL statement| `${db_name}-schema-create.sql` | +| Data file | If the data file contains data for a whole table, the file is imported into a table named `${db_name}.${table_name}` | \${db_name}.\${table_name}.\${csv\|sql\|parquet} | +| Data file | If the data for a table is split into multiple data files, each data file must be suffixed with a number in its filename | \${db_name}.\${table_name}.001.\${csv\|sql\|parquet} | + +TiDB Lightning processes data in parallel as much as possible. 
Because files must be read in sequence, the data processing concurrency is at the file level (controlled by `region-concurrency`). Therefore, when the imported file is large, the import performance is poor. It is recommended to limit the size of the imported file to no greater than 256 MiB to achieve the best performance. + +## CSV + +### Schema + +CSV files are schema-less. To import CSV files into TiDB, you must provide a table schema. You can provide schema by either of the following methods: + +* Create files named `${db_name}.${table_name}-schema.sql` and `${db_name}-schema-create.sql` that contain DDL statements. +* Manually create the table schema in TiDB. + +### Configuration + +You can configure the CSV format in the `[mydumper.csv]` section in the `tidb-lightning.toml` file. Most settings have a corresponding option in the [`LOAD DATA`](https://dev.mysql.com/doc/refman/8.0/en/load-data.html) statement of MySQL. + +```toml +[mydumper.csv] +# The field separator. Can be one or multiple characters. The default is ','. +# If the data might contain commas, it is recommended to use '|+|' or other uncommon +# character combinations as a separator. +separator = ',' +# Quoting delimiter. Empty value means no quoting. +delimiter = '"' +# Line terminator. Can be one or multiple characters. Empty value (default) means +# both "\n" (LF) and "\r\n" (CRLF) are line terminators. +terminator = '' +# Whether the CSV file contains a header. +# If `header` is true, the first line is skipped and mapped +# to the table columns. +header = true +# Whether the CSV file contains any NULL value. +# If `not-null` is true, all columns from CSV cannot be parsed as NULL. +not-null = false +# When `not-null` is false (that is, CSV can contain NULL), +# fields equal to this value will be treated as NULL. +null = '\N' +# Whether to parse backslash as escape character. +backslash-escape = true +# Whether to treat `separator` as the line terminator and trim all trailing separators. +trim-last-separator = false +``` + +If the input of a string field such as `separator`, `delimiter`, or `terminator` involves special characters, you can use a backslash to escape the special characters. The escape sequence must be a *double-quoted* string (`"…"`). For example, `separator = "\u001f"` means using the ASCII character `0X1F` as the separator. + +You can use *single-quoted* strings (`'…'`) to suppress backslash escaping. For example, `terminator = '\n'` means using the two-character string, a backslash (`\`) followed by the letter `n`, as the terminator, rather than the LF `\n`. + +For more details, see the [TOML v1.0.0 specification](https://toml.io/en/v1.0.0#string). + +#### `separator` + +- Defines the field separator. +- Can be one or multiple characters, but must not be empty. +- Common values: + + * `','` for CSV (comma-separated values). + * `"\t"` for TSV (tab-separated values). + * `"\u0001"` to use the ASCII character `0x01`. + +- Corresponds to the `FIELDS TERMINATED BY` option in the LOAD DATA statement. + +#### `delimiter` + +- Defines the delimiter used for quoting. +- If `delimiter` is empty, all fields are unquoted. +- Common values: + + * `'"'` quotes fields with double-quote. The same as [RFC 4180](https://tools.ietf.org/html/rfc4180). + * `''` disables quoting. + +- Corresponds to the `FIELDS ENCLOSED BY` option in the `LOAD DATA` statement. + +#### `terminator` + +- Defines the line terminator. 
+- If `terminator` is empty, both `"\n"` (Line Feed) and `"\r\n"` (Carriage Return + Line Feed) are used as the line terminator.
+- Corresponds to the `LINES TERMINATED BY` option in the `LOAD DATA` statement.
+
+#### `header`
+
+- Whether *all* CSV files contain a header row.
+- If `header` is `true`, the first row is used as the *column names*. If `header` is `false`, the first row is treated as an ordinary data row.
+
+#### `not-null` and `null`
+
+- The `not-null` setting controls whether all fields are non-nullable.
+- If `not-null` is `false`, the string specified by `null` is transformed to the SQL NULL instead of a specific value.
+- Quoting does not affect whether a field is null.
+
+    For example, in the following CSV file:
+
+    ```csv
+    A,B,C
+    \N,"\N",
+    ```
+
+    In the default settings (`not-null = false; null = '\N'`), the columns `A` and `B` are both converted to NULL after being imported to TiDB. The column `C` is an empty string `''` but not NULL.
+
+#### `backslash-escape`
+
+- Whether to parse backslash inside fields as escape characters.
+- If `backslash-escape` is `true`, the following sequences are recognized and converted:
+
+    | Sequence | Converted to |
+    |----------|--------------------------|
+    | `\0` | Null character (`U+0000`) |
+    | `\b` | Backspace (`U+0008`) |
+    | `\n` | Line feed (`U+000A`) |
+    | `\r` | Carriage return (`U+000D`) |
+    | `\t` | Tab (`U+0009`) |
+    | `\Z` | Windows EOF (`U+001A`) |
+
+    In all other cases (for example, `\"`), the backslash is stripped, leaving the next character (`"`) in the field. The remaining character has no special role (for example, as a delimiter) and is just an ordinary character.
+
+- Quoting does not affect whether backslash is parsed as an escape character.
+
+- Corresponds to the `FIELDS ESCAPED BY '\'` option in the `LOAD DATA` statement.
+
+#### `trim-last-separator`
+
+- Whether to treat `separator` as the line terminator and trim all trailing separators.
+
+    For example, in the following CSV file:
+
+    ```csv
+    A,,B,,
+    ```
+
+    - When `trim-last-separator = false`, this is interpreted as a row of 5 fields `('A', '', 'B', '', '')`.
+    - When `trim-last-separator = true`, this is interpreted as a row of 3 fields `('A', '', 'B')`.
+
+- This option is deprecated. Use the `terminator` option instead.
+
+    If your existing configuration is:
+
+    ```toml
+    separator = ','
+    trim-last-separator = true
+    ```
+
+    It is recommended to change the configuration to:
+
+    ```toml
+    separator = ','
+    terminator = ",\n" # Use ",\n" or ",\r\n" according to your actual file.
+    ```
+
+#### Non-configurable options
+
+TiDB Lightning does not support every option supported by the `LOAD DATA` statement. For example:
+
+* There cannot be line prefixes (`LINES STARTING BY`).
+* The header cannot be skipped (`IGNORE n LINES`) and must be valid column names.
+
+### Strict format
+
+TiDB Lightning works best when the input files have a uniform size of around 256 MiB. When the input is a single huge CSV file, TiDB Lightning can only process the file in one thread, which slows down the import speed.
+
+This can be fixed by splitting the CSV into multiple files first. For the generic CSV format, there is no way to quickly identify where a row starts or ends without reading the whole file. Therefore, TiDB Lightning by default does *not* automatically split a CSV file.
However, if you are certain that the CSV input adheres to certain restrictions, you can enable the `strict-format` setting to allow TiDB Lightning to split the file into multiple 256 MiB-sized chunks for parallel processing. + +```toml +[mydumper] +strict-format = true +``` + +In a strict CSV file, every field occupies only a single line. In other words, one of the following must be true: + +* Delimiter is empty. +* Every field does not contain the terminator itself. In the default configuration, this means every field does not contain CR (`\r`) or LF (`\n`). + +If a CSV file is not strict, but `strict-format` is wrongly set to `true`, a field spanning multiple lines may be cut in half into two chunks, causing parse failure, or even quietly importing corrupted data. + +### Common configuration examples + +#### CSV + +The default setting is already tuned for CSV following RFC 4180. + +```toml +[mydumper.csv] +separator = ',' # If the data might contain a comma (','), it is recommended to use '|+|' or other uncommon character combinations as the separator. +delimiter = '"' +header = true +not-null = false +null = '\N' +backslash-escape = true +``` + +Example content: + +``` +ID,Region,Count +1,"East",32 +2,"South",\N +3,"West",10 +4,"North",39 +``` + +#### TSV + +```toml +[mydumper.csv] +separator = "\t" +delimiter = '' +header = true +not-null = false +null = 'NULL' +backslash-escape = false +``` + +Example content: + +``` +ID Region Count +1 East 32 +2 South NULL +3 West 10 +4 North 39 +``` + +#### TPC-H DBGEN + +```toml +[mydumper.csv] +separator = '|' +delimiter = '' +terminator = "|\n" +header = false +not-null = true +backslash-escape = false +``` + +Example content: + +``` +1|East|32| +2|South|0| +3|West|10| +4|North|39| +``` + +## SQL + +When TiDB Lightning processes a SQL file, because TiDB Lightning cannot quickly split a single SQL file, it cannot improve the import speed of a single file by increasing concurrency. Therefore, when you import data from SQL files, avoid a single huge SQL file. TiDB Lightning works best when the input files have a uniform size of around 256 MiB. + +## Parquet + +TiDB Lightning currently only supports Parquet files generated by Amazon Aurora. To identify the file structure in S3, use the following configuration to match all data files: + +``` +[[mydumper.files]] +# The expression needed for parsing Amazon Aurora parquet files +pattern = '(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$' +schema = '$1' +table = '$2' +type = '$3' +``` + +Note that this configuration only shows how to match the parquet files exported by Aurora snapshot. You need to export and process the schema file separately. + +For more information on `mydumper.files`, refer to [Match customized file](#match-customized-files). + +## Match customized files + +TiDB Lightning only recognizes data files that follow the naming pattern. In some cases, your data file might not follow the naming pattern, and thus data import is completed in a short time without importing any file. + +To resolve this issue, you can use `[[mydumper.files]]` to match data files in your customized expression. + +Take the Aurora snapshot exported to S3 as an example. The complete path of the Parquet file is `S3://some-bucket/some-subdir/some-database/some-database.some-table/part-00000-c5a881bb-58ff-4ee6-1111-b41ecff340a3-c000.gz.parquet`. + +Usually, `data-source-dir` is set to `S3://some-bucket/some-subdir/some-database/` to import the `some-database` database. 
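+
+Before writing a match rule, you can list the objects under this prefix to confirm the actual file layout. The following is a sketch using the AWS CLI; it assumes your credentials are already configured, and the bucket and prefix are placeholders:
+
+```sh
+# List all files under the data source prefix to verify the directory layout.
+aws s3 ls s3://some-bucket/some-subdir/some-database/ --recursive
+```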
+ +Based on the preceding Parquet file path, you can write a regular expression like `(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$` to match the files. In the match group, `index=1` is `some-database`, `index=2` is `some-table`, and `index=3` is `parquet`. + +You can write the configuration file according to the regular expression and the corresponding index so that TiDB Lightning can recognize the data files that do not follow the default naming convention. For example: + +```toml +[[mydumper.files]] +# The expression needed for parsing the Amazon Aurora parquet file +pattern = '(?i)^(?:[^/]*/)*([a-z0-9_]+)\.([a-z0-9_]+)/(?:[^/]*/)*(?:[a-z0-9\-_.]+\.(parquet))$' +schema = '$1' +table = '$2' +type = '$3' +``` + +- **schema**: The name of the target database. The value can be: + - The group index obtained by using a regular expression, such as `$1`. + - The name of the database that you want to import, such as `db1`. All matched files are imported into `db1`. +- **table**: The name of the target table. The value can be: + - The group index obtained by using a regular expression, such as `$2`. + - The name of the table that you want to import, such as `table1`. All matched files are imported into `table1`. +- **type**: The file type. Supports `sql`, `parquet`, and `csv`. The value can be: + - The group index obtained by using a regular expression, such as `$3`. +- **key**: The file number, such as `001` in `${db_name}.${table_name}.001.csv`. + - The group index obtained by using a regular expression, such as `$4`. + +## More resources + +- [Export to CSV files Using Dumpling](/dumpling-overview.md#export-to-csv-files) +- [`LOAD DATA`](https://dev.mysql.com/doc/refman/8.0/en/load-data.html)