From bb7a81ce7c190065b67d9d79e15370bfdbc05fe7 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Fri, 4 Dec 2020 16:53:07 +0800 Subject: [PATCH 1/6] ticdc: doc updates for 4.0.9 --- ticdc/manage-ticdc.md | 9 +++++++++ ticdc/troubleshoot-ticdc.md | 13 ++++++++----- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index d755cdbb0c39b..8e0bf82bb36d6 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -98,6 +98,10 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time": - `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time. - `--target-ts`: Specifies the ending TSO of the `changefeed`. To this TSO, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data. +- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must output sorted data changes. This item supports `memory`/`unified`/`file`. + - `memory`: Sorts data changes in memory. It is recommended to `memory` in a production environment. + - `unified`: An experimental feature introduced in v4.0.9. When `unified` is used, TiCDC prioritizes data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is **NOT** recommended to use it in a production environment unless `memory` cannot be used due to insufficient memory. + - `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is not recommended to use it. - `--config`: Specifies the configuration file of the `changefeed`. 
#### Configure sink URI with `mysql`/`tidb` @@ -148,10 +152,15 @@ The following are descriptions of parameters and parameter values that can be co | `max-message-bytes` | The maximum size of data that is sent to Kafka broker each time (optional, `64MB` by default) | | `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default) | | `protocol` | The protocol with which messages are output to Kafka. The value options are `default`, `canal`, `avro`, and `maxwell` (`default` by default) | +| `max-batch-size` | New in v4.0.9. If the message protocol supports outputting multiple data changes to one Kafka message, this parameter specifies the maximum number of allowable data changes in one Kafka message and currently takes effect only when Kafka's `protocol` is `default`. (optional, `4096` by default) | | `ca` | The path of the CA certificate file needed to connect to the downstream Kafka instance (optional) | | `cert` | The path of the certificate file needed to connect to the downstream Kafka instance (optional) | | `key` | The path of the certificate key file needed to connect to the downstream Kafka instance (optional) | +> **Note:** +> +> When `protocol` is `default`, TiCDC tries to avoid generating messages that exceed `max-message-bytes` in length. However, if a data change needs to exceed `max-message-bytes`, to avoid silent failure, TiCDC tries to output this message and prints a warning in the log. + #### Integrate TiCDC with Kafka Connect (Confluent Platform) > **Note:** diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 249a3e3b964ff..62394bd01379f 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -45,24 +45,27 @@ A replication task might be interrupted in the following known scenarios: - Execute `cdc cli changefeed list` and `cdc cli changefeed query` to check the status of the replication task. 
`stopped` means the task has stopped and the `error` item provides the detailed error information. After the error occurs, you can search `error on running processor` in the TiCDC server log to see the error stack for troubleshooting. - In some extreme cases, the TiCDC service is restarted. You can search the `FATAL` level log in the TiCDC server log for troubleshooting. -## What is `gc-ttl` and file sorting in TiCDC? +## What is `gc-ttl` in TiCDC? Since v4.0.0-rc.1, PD supports external services in setting the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data smaller than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted. When starting the TiCDC server, you can specify the Time To Live (TTL) duration of GC safepoint through `gc-ttl`, which means the longest time that data is retained within the GC safepoint. This value is set by TiCDC in PD, which is 86,400 seconds by default. -If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable the file sorting feature of TiCDC that uses system files for sorting. To enable this feature, pass `--sort-engine=file` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example: +## How do I handle the OOM that occurs after TiCDC is restarted after a task interruption? + +If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable unified sorter, TiCDC's experimental sorting engine. This engine sorts data in the disk when the memory is insufficient. 
To enable this feature, pass `--sort-engine=unified` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="file" --sort-dir="/data/cdc/sort" +cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="unified" --sort-dir="/data/cdc/sort" ``` > **Note:** > -> + TiCDC (the 4.0 version) does not support dynamically modifying the file sorting and memory sorting yet. -> + Currently, the file sorting feature only has limited processing capacity. If the data size of a single table is too large and causes the file sorting to fail, you can modify the task configuration of TiCDC to filter out this table and use other backup and restore tools (such as [BR](/br/backup-and-restore-tool.md)) to restore the table before you resume replicating the table. +> + Since v4.0.9, TiCDC supports the unified sorter engine. +> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. +> + Currently, unified sorter is an experimental feature. When many tables exist (>=100), unified sorter might cause performance issues and affect the replication speed. Therefore, it is not recommended to use it in a production environment. Before you enable unified sorter, make sure that the machine of each TiCDC node have enough disk capacity. If the total data size to be accumulated might exceed 1 TB, it is not recommend to use TiCDC for replication. ## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? 
From 9c5c29ce3bf7daea73479ddefa573c8a5f4bffe3 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Mon, 7 Dec 2020 13:50:11 +0800 Subject: [PATCH 2/6] Apply suggestions from code review Co-authored-by: Ran --- ticdc/manage-ticdc.md | 6 +++--- ticdc/troubleshoot-ticdc.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 8e0bf82bb36d6..9220d35801ac2 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -98,8 +98,8 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time": - `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time. - `--target-ts`: Specifies the ending TSO of the `changefeed`. To this TSO, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data. -- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must output sorted data changes. This item supports `memory`/`unified`/`file`. - - `memory`: Sorts data changes in memory. It is recommended to `memory` in a production environment. +- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must output sorted data changes. This option supports `memory`/`unified`/`file`. + - `memory`: Sorts data changes in memory. It is recommended to use `memory` in a production environment. - `unified`: An experimental feature introduced in v4.0.9. When `unified` is used, TiCDC prioritizes data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is **NOT** recommended to use it in a production environment unless `memory` cannot be used due to insufficient memory. 
- `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is not recommended to use it. - `--config`: Specifies the configuration file of the `changefeed`. @@ -152,7 +152,7 @@ The following are descriptions of parameters and parameter values that can be co | `max-message-bytes` | The maximum size of data that is sent to Kafka broker each time (optional, `64MB` by default) | | `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default) | | `protocol` | The protocol with which messages are output to Kafka. The value options are `default`, `canal`, `avro`, and `maxwell` (`default` by default) | -| `max-batch-size` | New in v4.0.9. If the message protocol supports outputting multiple data changes to one Kafka message, this parameter specifies the maximum number of allowable data changes in one Kafka message and currently takes effect only when Kafka's `protocol` is `default`. (optional, `4096` by default) | +| `max-batch-size` | New in v4.0.9. If the message protocol supports outputting multiple data changes to one Kafka message, this parameter specifies the maximum number of data changes in one Kafka message. It currently takes effect only when Kafka's `protocol` is `default`. (optional, `4096` by default) | | `ca` | The path of the CA certificate file needed to connect to the downstream Kafka instance (optional) | | `cert` | The path of the certificate file needed to connect to the downstream Kafka instance (optional) | | `key` | The path of the certificate key file needed to connect to the downstream Kafka instance (optional) | diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 62394bd01379f..9542d7b3334c8 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -65,7 +65,7 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235 > > + Since v4.0.9, TiCDC supports the unified sorter engine. 
> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. -> + Currently, unified sorter is an experimental feature. When many tables exist (>=100), unified sorter might cause performance issues and affect the replication speed. Therefore, it is not recommended to use it in a production environment. Before you enable unified sorter, make sure that the machine of each TiCDC node have enough disk capacity. If the total data size to be accumulated might exceed 1 TB, it is not recommend to use TiCDC for replication. +> + Currently, the unified sorter is an experimental feature. When many tables exist (>=100), the unified sorter might cause performance issues and affect the replication speed. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total data size to be accumulated might exceed 1 TB, it is not recommend to use TiCDC for replication. ## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? From b3f7f0321d85eaad12576570e39f42d7f8ef7549 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Mon, 14 Dec 2020 16:50:08 +0800 Subject: [PATCH 3/6] Apply suggestions from code review Co-authored-by: Zixiong Liu --- ticdc/manage-ticdc.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 9220d35801ac2..8015158fb01e7 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -98,9 +98,9 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time": - `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time. - `--target-ts`: Specifies the ending TSO of the `changefeed`. 
To this TSO, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data. -- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must output sorted data changes. This option supports `memory`/`unified`/`file`. +- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before writing them to the sink. This option supports `memory`/`unified`/`file`. - `memory`: Sorts data changes in memory. It is recommended to use `memory` in a production environment. - - `unified`: An experimental feature introduced in v4.0.9. When `unified` is used, TiCDC prioritizes data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is **NOT** recommended to use it in a production environment unless `memory` cannot be used due to insufficient memory. + - `unified`: An experimental feature introduced in v4.0.9. When `unified` is used, TiCDC prefers data sorting in memory. If the memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is **NOT** recommended to use it in a production environment unless `memory` cannot be used due to insufficient memory. - `file`: Entirely uses the disk to store the temporary data. This feature is **deprecated**. It is not recommended to use it. - `--config`: Specifies the configuration file of the `changefeed`. 
From 9200f71d2dce49a88593549987a86dbc4f6b1e2f Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Mon, 14 Dec 2020 16:59:13 +0800 Subject: [PATCH 4/6] Apply suggestions from code review Co-authored-by: Zixiong Liu --- ticdc/manage-ticdc.md | 2 +- ticdc/troubleshoot-ticdc.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/ticdc/manage-ticdc.md b/ticdc/manage-ticdc.md index 8015158fb01e7..241e1ec6c72f8 100644 --- a/ticdc/manage-ticdc.md +++ b/ticdc/manage-ticdc.md @@ -159,7 +159,7 @@ The following are descriptions of parameters and parameter values that can be co > **Note:** > -> When `protocol` is `default`, TiCDC tries to avoid generating messages that exceed `max-message-bytes` in length. However, if a data change needs to exceed `max-message-bytes`, to avoid silent failure, TiCDC tries to output this message and prints a warning in the log. +> When `protocol` is `default`, TiCDC tries to avoid generating messages that exceed `max-message-bytes` in length. However, if a row is so large that a single change alone exceeds `max-message-bytes` in length, to avoid silent failure, TiCDC tries to output this message and prints a warning in the log. #### Integrate TiCDC with Kafka Connect (Confluent Platform) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 9542d7b3334c8..1285623c85963 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -53,7 +53,7 @@ When starting the TiCDC server, you can specify the Time To Live (TTL) duration ## How do I handle the OOM that occurs after TiCDC is restarted after a task interruption? -If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable unified sorter, TiCDC's experimental sorting engine. This engine sorts data in the disk when the memory is insufficient. 
To enable this feature, pass `--sort-engine=unified` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example: +If the replication task is interrupted for a long time and a large volume of new data has been written to TiDB, Out of Memory (OOM) might occur when TiCDC is restarted. In this situation, you can enable unified sorter, TiCDC's experimental sorting engine. This engine sorts data in the disk when the memory is insufficient. To enable this feature, pass `--sort-engine=unified` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example: {{< copyable "shell-regular" >}} @@ -65,7 +65,7 @@ cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235 > > + Since v4.0.9, TiCDC supports the unified sorter engine. > + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. -> + Currently, the unified sorter is an experimental feature. When many tables exist (>=100), the unified sorter might cause performance issues and affect the replication speed. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total data size to be accumulated might exceed 1 TB, it is not recommend to use TiCDC for replication. +> + Currently, the unified sorter is an experimental feature. When the number of tables is too large (>=100), the unified sorter might cause performance issues and affect replication throughput. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total size of unprocessed data changes might exceed 1 TB, it is not recommend to use TiCDC for replication. 
## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? From e675a37b76637828a77c1df5e6acb0803231d620 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:33:21 +0800 Subject: [PATCH 5/6] Apply suggestions from code review Co-authored-by: Zixiong Liu --- ticdc/troubleshoot-ticdc.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index 1d816d78593b8..ee1f21622856e 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -58,13 +58,13 @@ If the replication task is interrupted for a long time and a large volume of new {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="unified" --sort-dir="/data/cdc/sort" +cdc cli changefeed update -c [changefeed-id] --sort-engine="unified" --sort-dir="/data/cdc/sort" ``` > **Note:** > > + Since v4.0.9, TiCDC supports the unified sorter engine. -> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. +> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. Make sure that the changefeed has paused before modifying the sorter settings. > + Currently, the unified sorter is an experimental feature. When the number of tables is too large (>=100), the unified sorter might cause performance issues and affect replication throughput. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total size of unprocessed data changes might exceed 1 TB, it is not recommend to use TiCDC for replication. 
## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL? From 1ca11bbd856483ae21c97047bbf3dd187d1607d7 Mon Sep 17 00:00:00 2001 From: TomShawn <41534398+TomShawn@users.noreply.github.com> Date: Fri, 18 Dec 2020 20:45:21 +0800 Subject: [PATCH 6/6] Update ticdc/troubleshoot-ticdc.md --- ticdc/troubleshoot-ticdc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index ee1f21622856e..393864eb7c0fe 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -64,7 +64,7 @@ cdc cli changefeed update -c [changefeed-id] --sort-engine="unified" --sort-dir= > **Note:** > > + Since v4.0.9, TiCDC supports the unified sorter engine. -> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. Make sure that the changefeed has paused before modifying the sorter settings. +> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet. Make sure that the changefeed has stopped before modifying the sorter settings. > + Currently, the unified sorter is an experimental feature. When the number of tables is too large (>=100), the unified sorter might cause performance issues and affect replication throughput. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total size of unprocessed data changes might exceed 1 TB, it is not recommend to use TiCDC for replication. ## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL?
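The final revision of the troubleshooting note asks that the changefeed be stopped before the sorter settings are changed with `cdc cli changefeed update`. A minimal sketch of that sequence follows. It assumes that `cdc cli changefeed pause` is how the changefeed is brought to the stopped state and that `cdc cli changefeed resume` restarts it; neither subcommand appears in this patch series. The `run` helper, the `switch_to_unified_sorter` function, the `DRY_RUN` switch, and the `my-changefeed` ID are hypothetical names used only for illustration. By default the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to execute them against a real TiCDC cluster.
set -eu

PD_ADDR="${PD_ADDR:-http://10.0.10.25:2379}"      # PD endpoint used in the patch 1 example
SORT_DIR="${SORT_DIR:-/data/cdc/sort}"            # directory for temporary sort files
CHANGEFEED_ID="${CHANGEFEED_ID:-my-changefeed}"   # hypothetical changefeed ID
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the changefeed, switch its sorting engine, then resume it.
# Assumption: pause/resume are the subcommands that stop and restart a changefeed.
switch_to_unified_sorter() {
  id="$1"
  run cdc cli changefeed pause --pd="$PD_ADDR" --changefeed-id "$id"
  run cdc cli changefeed update --pd="$PD_ADDR" --changefeed-id "$id" \
      --sort-engine="unified" --sort-dir="$SORT_DIR"
  run cdc cli changefeed resume --pd="$PD_ADDR" --changefeed-id "$id"
}

switch_to_unified_sorter "$CHANGEFEED_ID"
```

Keeping the three steps in one function makes the ordering explicit: with `set -e`, a failed `pause` aborts the script before `update` runs, which matches the requirement that the sorter is not modified while the changefeed is still running. Before running with `DRY_RUN=0`, check that each TiCDC node has enough free disk space under the sort directory, as the note in the patch recommends.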