ticdc: doc updates for 4.0.9 #4334

Merged
merged 9 commits into from
Dec 21, 2020
Changes from 2 commits
9 changes: 9 additions & 0 deletions ticdc/manage-ticdc.md
@@ -98,6 +98,10 @@ Info: {"sink-uri":"mysql://root:123456@127.0.0.1:3306/","opts":{},"create-time":

- `--start-ts`: Specifies the starting TSO of the `changefeed`. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.
- `--target-ts`: Specifies the ending TSO of the `changefeed`. The TiCDC cluster stops pulling data at this TSO. The default value is empty, which means that TiCDC does not automatically stop pulling data.
- `--sort-engine`: Specifies the sorting engine for the `changefeed`. Because TiDB and TiKV adopt distributed architectures, TiCDC must sort the data changes before outputting them. This option supports `memory`, `unified`, and `file`:
    - `memory`: Sorts data changes in memory. It is recommended to use `memory` in a production environment.
    - `unified`: An experimental feature introduced in v4.0.9. When `unified` is used, TiCDC prefers to sort data in memory. If memory is insufficient, TiCDC automatically uses the disk to store the temporary data. It is **NOT** recommended to use it in a production environment unless `memory` cannot be used due to insufficient memory.
    - `file`: Uses the disk exclusively to store the temporary data. This engine is **deprecated** and is not recommended.
- `--config`: Specifies the configuration file of the `changefeed`.
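
The engine choice described above can be sketched as a small wrapper that validates the `--sort-engine` value before a changefeed is created. `build_create_cmd` is a hypothetical helper, not part of the `cdc` CLI; it only prints the command instead of running it, and it assumes that no sort directory is passed for in-memory sorting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the cdc CLI): checks the sort engine name
# and prints the matching `cdc cli changefeed create` command without running it.
PD_ADDR="http://10.0.10.25:2379"
SINK_URI="mysql://root:123456@127.0.0.1:3306/"

build_create_cmd() {
    local engine="$1" sort_dir="${2:-}"
    local base
    base="cdc cli changefeed create --pd=${PD_ADDR} --sink-uri=\"${SINK_URI}\""
    case "$engine" in
        memory)
            # Assumption: no sort directory is passed for in-memory sorting.
            echo "${base} --sort-engine=\"memory\""
            ;;
        unified)
            # The unified sorter may spill to disk, so it takes a sort directory.
            echo "${base} --sort-engine=\"unified\" --sort-dir=\"${sort_dir}\""
            ;;
        file)
            # The file engine is deprecated, so this sketch refuses it.
            echo "sort engine 'file' is deprecated" >&2
            return 1
            ;;
        *)
            echo "unknown sort engine: ${engine}" >&2
            return 1
            ;;
    esac
}

build_create_cmd unified /data/cdc/sort
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without a TiCDC cluster; pipe the output to a shell only after reviewing it.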

#### Configure sink URI with `mysql`/`tidb`
@@ -148,10 +152,15 @@ The following are descriptions of parameters and parameter values that can be co
| `max-message-bytes` | The maximum size of data that is sent to Kafka broker each time (optional, `64MB` by default) |
| `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default) |
| `protocol` | The protocol with which messages are output to Kafka. The value options are `default`, `canal`, `avro`, and `maxwell` (`default` by default) |
| `max-batch-size` | New in v4.0.9. If the message protocol supports outputting multiple data changes to one Kafka message, this parameter specifies the maximum number of data changes in one Kafka message. It currently takes effect only when Kafka's `protocol` is `default`. (optional, `4096` by default) |
| `ca` | The path of the CA certificate file needed to connect to the downstream Kafka instance (optional) |
| `cert` | The path of the certificate file needed to connect to the downstream Kafka instance (optional) |
| `key` | The path of the certificate key file needed to connect to the downstream Kafka instance (optional) |

> **Note:**
>
> When `protocol` is `default`, TiCDC tries to avoid generating messages that exceed `max-message-bytes` in length. However, if a single data change is by itself larger than `max-message-bytes`, TiCDC still tries to output the message and prints a warning in the log, to avoid silent failure.
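
As an illustration of how the new `max-batch-size` parameter sits next to `max-message-bytes`, the sketch below assembles a Kafka sink URI. It assumes the URI takes the form `kafka://<broker>/<topic>?<param>=<value>&...` and that `max-message-bytes` is given in bytes; `kafka_sink_uri`, the broker address, and the topic name are placeholders introduced for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a Kafka sink URI from its parts.
# Assumption: parameters are passed as a query string after the topic name.
kafka_sink_uri() {
    local broker="$1" topic="$2" max_bytes="$3" max_batch="$4"
    echo "kafka://${broker}/${topic}?protocol=default&max-message-bytes=${max_bytes}&max-batch-size=${max_batch}"
}

# 64 MB expressed in bytes, combined with the default batch size of 4096.
kafka_sink_uri 127.0.0.1:9092 cdc-test $((64 * 1024 * 1024)) 4096
# prints kafka://127.0.0.1:9092/cdc-test?protocol=default&max-message-bytes=67108864&max-batch-size=4096
```

When the URI is passed on a command line through `--sink-uri`, quote it so that the shell does not interpret the `&` characters.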

#### Integrate TiCDC with Kafka Connect (Confluent Platform)

> **Note:**
13 changes: 8 additions & 5 deletions ticdc/troubleshoot-ticdc.md
@@ -45,24 +45,27 @@ A replication task might be interrupted in the following known scenarios:
- Execute `cdc cli changefeed list` and `cdc cli changefeed query` to check the status of the replication task. `stopped` means the task has stopped, and the `error` item provides the detailed error information. After the error occurs, you can search for `error on running processor` in the TiCDC server log to see the error stack for troubleshooting.
- In some extreme cases, the TiCDC service is restarted. You can search for `FATAL` level entries in the TiCDC server log for troubleshooting.
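
The two log searches above can be combined into one pass over the server log. This is a minimal sketch: `scan_cdc_log` is a hypothetical helper, the log location is supplied through a `CDC_LOG` environment variable invented for this example, and matching the plain string `FATAL` is an assumption about how the level appears in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (with line numbers) every log line that contains
# the processor error message or the string FATAL.
scan_cdc_log() {
    local log_file="$1"
    grep -n -e 'error on running processor' -e 'FATAL' "$log_file"
}

# CDC_LOG is an assumed variable; set it to the path of the TiCDC server log.
if [ -f "${CDC_LOG:-}" ]; then
    scan_cdc_log "$CDC_LOG"
fi
```

`grep` exits with a non-zero status when nothing matches, which makes the helper usable in a condition such as `if scan_cdc_log "$CDC_LOG"; then ...`.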

## What is `gc-ttl` and file sorting in TiCDC?
## What is `gc-ttl` in TiCDC?

Since v4.0.0-rc.1, PD supports external services in setting the service-level GC safepoint. Any service can register and update its GC safepoint. PD ensures that the key-value data later than this GC safepoint is not cleaned by GC. Enabling this feature in TiCDC ensures that the data to be consumed by TiCDC is retained in TiKV without being cleaned by GC when the replication task is unavailable or interrupted.

When starting the TiCDC server, you can specify the Time To Live (TTL) duration of the GC safepoint through `gc-ttl`, which is the longest time that data is retained by the GC safepoint. TiCDC sets this value in PD. The default value is 86,400 seconds (24 hours).
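
Because `gc-ttl` is expressed in seconds, a quick conversion helps when choosing a value. The sketch below assumes the server accepts the setting as a `--gc-ttl` flag on `cdc server`, which the text implies but does not spell out; `ttl_for_hours` is a hypothetical helper, and the final command is only printed, not executed.

```shell
#!/usr/bin/env bash
# The default TTL from the text, converted to hours.
gc_ttl_seconds=86400
gc_ttl_hours=$((gc_ttl_seconds / 3600))
echo "gc-ttl=${gc_ttl_seconds} keeps data for up to ${gc_ttl_hours} hours"
# prints gc-ttl=86400 keeps data for up to 24 hours

# Hypothetical helper: converts a retention period in hours to seconds.
ttl_for_hours() {
    echo $(($1 * 3600))
}

# Assumption: the flag is spelled --gc-ttl. Example for a 48-hour retention.
echo "cdc server --pd=http://10.0.10.25:2379 --gc-ttl=$(ttl_for_hours 48)"
# prints cdc server --pd=http://10.0.10.25:2379 --gc-ttl=172800
```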

If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable the file sorting feature of TiCDC that uses system files for sorting. To enable this feature, pass `--sort-engine=file` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example:
## How do I handle the OOM that occurs after TiCDC is restarted after a task interruption?

If the replication task is interrupted for a long time and a large volume of unconsumed data is accumulated, Out of Memory (OOM) might occur when TiCDC is started. In this situation, you can enable the unified sorter, TiCDC's experimental sorting engine. This engine sorts data on disk when memory is insufficient. To enable this feature, pass `--sort-engine=unified` and `--sort-dir=/path/to/sort_dir` to the `cdc cli` command when creating a replication task. For example:

{{< copyable "shell-regular" >}}

```shell
cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="file" --sort-dir="/data/cdc/sort"
cdc cli changefeed create --pd=http://10.0.10.25:2379 --start-ts=415238226621235200 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --sort-engine="unified" --sort-dir="/data/cdc/sort"
```

> **Note:**
>
> + TiCDC (the 4.0 version) does not support dynamically modifying the file sorting and memory sorting yet.
> + Currently, the file sorting feature only has limited processing capacity. If the data size of a single table is too large and causes the file sorting to fail, you can modify the task configuration of TiCDC to filter out this table and use other backup and restore tools (such as [BR](/br/backup-and-restore-tool.md)) to restore the table before you resume replicating the table.
> + Since v4.0.9, TiCDC supports the unified sorter engine.
> + TiCDC (the 4.0 version) does not support dynamically modifying the sorting engine yet.
> + Currently, the unified sorter is an experimental feature. When many tables exist (>=100), the unified sorter might cause performance issues and affect the replication speed. Therefore, it is not recommended to use it in a production environment. Before you enable the unified sorter, make sure that the machine of each TiCDC node has enough disk capacity. If the total size of the accumulated data might exceed 1 TB, it is not recommended to use TiCDC for replication.
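
The disk capacity requirement in the note above can be checked before the unified sorter is enabled. The following is a minimal sketch using the POSIX output format of `df`; `has_free_space` is a hypothetical helper, the 100 GiB threshold is an arbitrary illustrative value, and `SORT_DIR` defaults to `/tmp` only so that the sketch runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the filesystem that holds the directory
# has at least the required number of kilobytes available.
has_free_space() {
    local dir="$1" required_kb="$2" avail_kb
    # With -P, df prints one header line and one data line per filesystem;
    # the fourth column of the data line is the available space in KB (-k).
    avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge "$required_kb" ]
}

# Point SORT_DIR at the directory that will be passed to --sort-dir.
SORT_DIR="${SORT_DIR:-/tmp}"
REQUIRED_KB=$((100 * 1024 * 1024)) # 100 GiB, an illustrative threshold only

if has_free_space "$SORT_DIR" "$REQUIRED_KB"; then
    echo "enough free space in ${SORT_DIR}"
else
    echo "not enough free space in ${SORT_DIR}"
fi
```

Run the check on every TiCDC node, because each node needs its own disk capacity for the sort directory.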

## How do I handle the `Error 1298: Unknown or incorrect time zone: 'UTC'` error when creating the replication task or replicating data to MySQL?
