tools/tidb-lightning: document backend and that system DBs are filtered #1620

Merged
merged 15 commits into from
Nov 15, 2019
Changes from 3 commits
2 changes: 1 addition & 1 deletion dev/reference/tools/download.md
@@ -25,7 +25,7 @@ If you want to download the latest version of [TiDB Lightning](/dev/reference/to

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-toolkit-latest-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |
| [tidb-toolkit-latest-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-latest-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-latest-linux-amd64.sha256) |

## DM (Data Migration)

229 changes: 229 additions & 0 deletions dev/reference/tools/tidb-lightning/backend.md
@@ -0,0 +1,229 @@
---
title: TiDB Lightning Back End
summary: Choose how to write data into the TiDB cluster
category: reference
---

# TiDB Lightning Back End

TiDB Lightning supports two back ends: "Importer" and "TiDB". The back end determines how `tidb-lightning` delivers data into the target cluster.

The "Importer" back end (default) requires `tidb-lightning` to first encode the SQL/CSV data into KV pairs, and relies on the external `tikv-importer` program to sort these KV pairs and ingest them directly into the TiKV nodes.

The "TiDB" back end requires `tidb-lightning` to encode the data into SQL `INSERT` statements and execute them directly on the TiDB node.

| Back end | "Importer" | "TiDB" |
|---|---|---|
| Speed | Fast (~300 GB/hr) | Slow (~50 GB/hr) |
| Resource usage | High | Low |
| ACID respected while importing | No | Yes |
| Target tables | Must be empty | Can be populated |
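
To get a feel for the speed difference in the table above, the following sketch estimates the import time for a given data size. The throughput figures are the approximate values from the table. The function name `estimate_hours` and the 1200 GB example are illustrative assumptions, and actual speed depends on hardware and schema.

```python
# Approximate throughput per back end in GB/hr, taken from the table above.
APPROX_THROUGHPUT_GB_PER_HR = {"importer": 300, "tidb": 50}


def estimate_hours(data_size_gb: float, backend: str) -> float:
    """Return a rough import duration in hours for the given back end."""
    if data_size_gb < 0:
        raise ValueError("data size must not be negative")
    try:
        throughput = APPROX_THROUGHPUT_GB_PER_HR[backend]
    except KeyError:
        raise ValueError(f"unknown back end: {backend!r}") from None
    return data_size_gb / throughput


if __name__ == "__main__":
    # Hypothetical 1200 GB data set, chosen only for illustration.
    for backend in ("importer", "tidb"):
        print(f"{backend}: ~{estimate_hours(1200, backend):.0f} hr for 1200 GB")
```

The estimate is only a first-order guide for choosing a back end. It does not account for the other rows of the table, such as resource usage or whether the target tables are empty.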

## Deployment for "TiDB" back end

When using the "TiDB" back end, you no longer need `tikv-importer`. Compared with the [standard deployment procedure](/dev/reference/tools/tidb-lightning/deployment.md):

* All steps involving `tikv-importer` can be skipped.
* The configuration must be changed to indicate that the "TiDB" back end is used.

### Ansible deployment

1. The `[importer_server]` section in `inventory.ini` can be left blank.

```ini
...

[importer_server]
# keep empty

[lightning_server]
192.168.20.10

...
```

2. The `tikv_importer_port` setting in `group_vars/all.yml` is ignored, and the file `group_vars/importer_server.yml` does not need to be changed.

However, you need to edit `conf/tidb-lightning.yml` and change the `backend` setting to `tidb`.

```yaml
...
tikv_importer:
    backend: "tidb"   # <-- change this
...
```

3. Bootstrap and deploy the cluster as usual.

4. Mount the data source for TiDB Lightning as usual.

5. No need to start `tikv-importer`.

6. Start `tidb-lightning` as usual.

7. No need to stop `tikv-importer`.

### Manual deployment

There is no need to download and configure `tikv-importer`.

Before running `tidb-lightning`, add the following lines to the configuration file:

```toml
[tikv-importer]
backend = "tidb"
```

Alternatively, pass the `--backend tidb` argument when executing `tidb-lightning`.
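
As a minimal sketch of the command-line route, the helper below assembles an argument list that selects the "TiDB" back end. It uses only the `--backend`, `-d` and `--log-file` options listed in the TiDB Lightning command-line table. The function name `lightning_command` and the example paths are hypothetical, the `tidb-lightning` binary is assumed to be on `PATH`, and all remaining settings (such as the TiDB connection parameters) are assumed to come from the configuration file.

```python
import shlex
from typing import List, Optional


def lightning_command(data_dir: str, log_file: Optional[str] = None) -> List[str]:
    """Build an argument list that runs tidb-lightning with the "TiDB" back end."""
    args = ["tidb-lightning", "--backend", "tidb", "-d", data_dir]
    if log_file is not None:
        args += ["--log-file", log_file]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(lightning_command("/data/export", "tidb-lightning.log")))
```

To actually start the import, the returned list could be passed to `subprocess.run`.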

## Conflict resolution

The "TiDB" back end supports importing to an already-populated table. However, the new data may cause unique key conflicts with the old data. You can control how these conflicts are resolved with the `on-duplicate` task configuration:

```toml
[tikv-importer]
backend = "tidb"
on-duplicate = "replace" # or "error" or "ignore"
```

| Setting | Behavior on conflict | Equivalent SQL statement |
|---|---|---|
| replace | New entries replace old ones | `REPLACE INTO ...` |
| ignore | Keep old entries and ignore new ones | `INSERT IGNORE INTO ...` |
| error | Abort import | `INSERT INTO ...` |
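
The mapping in the table above can be sketched as a small lookup that picks the statement form for each `on-duplicate` setting. This is an illustration, not how `tidb-lightning` builds its statements internally. The function name `build_statement`, the table name and the `%s` placeholder style (as used by common Python MySQL drivers) are assumptions.

```python
from typing import List, Sequence, Tuple

# Statement prefix per on-duplicate setting, as listed in the table above.
STATEMENT_PREFIX = {
    "replace": "REPLACE INTO",
    "ignore": "INSERT IGNORE INTO",
    "error": "INSERT INTO",
}


def build_statement(
    on_duplicate: str, table: str, rows: Sequence[Sequence[object]]
) -> Tuple[str, List[object]]:
    """Return a parameterized multi-row statement and its flattened parameters."""
    try:
        prefix = STATEMENT_PREFIX[on_duplicate]
    except KeyError:
        raise ValueError(f"unsupported on-duplicate setting: {on_duplicate!r}") from None
    if not rows:
        raise ValueError("at least one row is required")
    # One "(%s, %s, ...)" group per row; values are passed separately as parameters.
    row_placeholders = ["(" + ", ".join(["%s"] * len(row)) + ")" for row in rows]
    # Assumes the table name contains no backticks.
    sql = f"{prefix} `{table}` VALUES {', '.join(row_placeholders)}"
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    sql, params = build_statement("ignore", "users", [(1, "alice"), (2, "bob")])
    print(sql)
    print(params)
```

With `error`, a plain `INSERT INTO` fails on the first duplicate key, which is why that setting aborts the import.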

## Migrating from Loader to TiDB Lightning "TiDB" back end

TiDB Lightning using the "TiDB" back end can completely replace the functions of [Loader](/dev/reference/tools/loader.md). The following table shows how to translate Loader configurations into [TiDB Lightning configurations](/dev/reference/tools/tidb-lightning/config.md).

<table>
<thead><tr><th>Loader</th><th>TiDB Lightning</th></tr></thead>
<tbody>
<tr><td>

```toml

# logging
log-level = "info"
log-file = "loader.log"

# Prometheus
status-addr = ":8272"

# concurrency
pool-size = 16
```

</td><td>

```toml
[lightning]
# logging
level = "info"
file = "tidb-lightning.log"

# Prometheus
pprof-port = 8289

# concurrency (better left as default)
#region-concurrency = 16
```

</td></tr>
<tr><td>

```toml

# checkpoint database

checkpoint-schema = "tidb_loader"






```

</td><td>

```toml
[checkpoint]
# checkpoint storage
enable = true
schema = "tidb_lightning_checkpoint"
# by default the checkpoint is stored in
# a local file, which is more efficient.
# but you could still choose to store the
# checkpoints in the target database with
# this setting:
#driver = "mysql"
```

</td></tr>
<tr><td>

```toml



```

</td><td>

```toml
[tikv-importer]
# use the "TiDB" back end
backend = "tidb"
```

</td></tr>
<tr><td>

```toml

# data source directory
dir = "/data/export/"
```

</td><td>

```toml
[mydumper]
# data source directory
data-source-dir = "/data/export"
```

</td></tr>

<tr><td>

```toml
[db]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000

user = "root"
password = ""

#sql-mode = ""
```

</td><td>

```toml
[tidb]
# TiDB connection parameters
host = "127.0.0.1"
port = 4000
status-port = 10080 # <- this is required
user = "root"
password = ""

#sql-mode = ""
```

</td></tr>
</tbody>
</table>
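
The key renames in the table above can also be sketched as a small translation function over already-parsed configurations (plain dictionaries). This is a hedged illustration only: the function name `loader_to_lightning` is hypothetical, and values are copied verbatim, so you may still want to pick a new log file name and checkpoint schema. `status-addr` is left for manual review because the TiDB Lightning column shows a port number (`pprof-port`) rather than an address.

```python
from typing import Any, Dict


def loader_to_lightning(
    loader: Dict[str, Any], status_port: int = 10080
) -> Dict[str, Dict[str, Any]]:
    """Translate a parsed Loader config into TiDB Lightning config sections."""
    config: Dict[str, Dict[str, Any]] = {
        "lightning": {},
        "checkpoint": {"enable": True},
        "tikv-importer": {"backend": "tidb"},  # use the "TiDB" back end
        "mydumper": {},
        # Loader has no counterpart for status-port, but TiDB Lightning requires it.
        "tidb": {"status-port": status_port},
    }
    # Top-level Loader keys and their (section, key) in TiDB Lightning.
    renames = {
        "log-level": ("lightning", "level"),
        "log-file": ("lightning", "file"),
        "checkpoint-schema": ("checkpoint", "schema"),
        "dir": ("mydumper", "data-source-dir"),
    }
    for old_key, (section, new_key) in renames.items():
        if old_key in loader:
            config[section][new_key] = loader[old_key]
    # pool-size is deliberately not copied: region-concurrency is better left as default.
    db = loader.get("db", {})
    for key in ("host", "port", "user", "password", "sql-mode"):
        if key in db:
            config["tidb"][key] = db[key]
    return config


if __name__ == "__main__":
    example = {
        "log-level": "info",
        "dir": "/data/export/",
        "pool-size": 16,
        "db": {"host": "127.0.0.1", "port": 4000, "user": "root", "password": ""},
    }
    for section, values in loader_to_lightning(example).items():
        print(section, values)
```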
10 changes: 9 additions & 1 deletion dev/reference/tools/tidb-lightning/config.md
@@ -89,8 +89,15 @@ driver = "file"
#keep-after-success = false

[tikv-importer]
# The listening address of tikv-importer. Change it to the actual address.
# Delivery back end, can be "importer" or "tidb".
#backend = "importer"
# The listening address of tikv-importer when back end is "importer". Change it to the actual address.
addr = "172.16.31.10:8287"
# Action to do when trying to insert a duplicated entry in the "tidb" back end.
# - replace: new entry replaces existing entry
# - ignore: keep existing entry, ignore new entry
# - error: report error and quit the program
#on-duplicate = "replace"

[mydumper]
# Block size for file reading. Keep it longer than the longest string of
@@ -288,6 +295,7 @@ min-available-ratio = 0.05
| -V | Prints program version | |
| -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| --backend *backend* | [Delivery back end](/dev/reference/tools/tidb-lightning/backend.md) (`importer` or `tidb`) | `tikv-importer.backend` |
| --log-file *file* | Log file path | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
4 changes: 3 additions & 1 deletion dev/reference/tools/tidb-lightning/deployment.md
@@ -6,7 +6,9 @@ category: reference

# TiDB Lightning Deployment

This document describes the hardware requirements of TiDB Lightning on separate deployment and mixed deployment, and how to deploy it using Ansible or manually.
This document describes the hardware requirements of TiDB Lightning using the default "Importer" back end, and how to deploy it using Ansible or manually.

If you wish to use the "TiDB" back end, also read [TiDB Lightning Back End](/dev/reference/tools/tidb-lightning/backend.md) for the changes to the deployment steps.

## Notes

2 changes: 2 additions & 0 deletions dev/reference/tools/tidb-lightning/overview.md
@@ -40,3 +40,5 @@ The complete import process is as follows:
The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/dev/reference/mysql-compatibility.md#auto-increment-id).

7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.

TiDB Lightning also supports using "TiDB" instead of "Importer" as the back end. In this configuration, `tidb-lightning` transforms data into SQL `INSERT` statements and directly executes them on the target cluster, similar to Loader. See [TiDB Lightning Back End](/dev/reference/tools/tidb-lightning/backend.md) for details.
4 changes: 4 additions & 0 deletions dev/reference/tools/tidb-lightning/table-filter.md
@@ -26,6 +26,10 @@ ignore-dbs = ["pattern4", "pattern5"]

The pattern can either be a simple name, or a regular expression in [Go dialect](https://golang.org/pkg/regexp/syntax/#hdr-syntax) if it starts with a `~` character.

>**Note:**
>
> The system databases `INFORMATION_SCHEMA`, `PERFORMANCE_SCHEMA`, `mysql` and `sys` are always black-listed regardless of the table filter settings.
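
The effect of the note above can be sketched as a filter that drops the system databases before any user-defined rule is consulted. The function name `importable_databases`, the `user_filter` callback standing in for the table filter settings, and the case-insensitive comparison are assumptions made for illustration. They do not describe the actual implementation.

```python
from typing import Callable, Iterable, List

# System databases that are always skipped, per the note above.
SYSTEM_DATABASES = frozenset(
    {"information_schema", "performance_schema", "mysql", "sys"}
)


def importable_databases(
    names: Iterable[str],
    user_filter: Callable[[str], bool] = lambda name: True,
) -> List[str]:
    """Drop system databases first, then apply the user's own filter."""
    return [
        name
        for name in names
        # Case-insensitive comparison is an assumption of this sketch.
        if name.lower() not in SYSTEM_DATABASES and user_filter(name)
    ]


if __name__ == "__main__":
    found = ["mysql", "shop", "INFORMATION_SCHEMA", "sys", "logs"]
    print(importable_databases(found))
```

Even a `user_filter` that accepts every name cannot bring a system database back, which mirrors the "regardless of the table filter settings" wording.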

## Filtering tables

```toml
2 changes: 1 addition & 1 deletion v2.1/reference/tools/download.md
@@ -16,7 +16,7 @@ In addition, the Kafka version of TiDB Binlog is also provided.

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-v2.1.16-linux-amd64.tar.gz](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.16-linux-amd64.sha256](http://download.pingcap.org/tidb-v2.1.16-linux-amd64.sha256)|
| [tidb-v2.1.17-linux-amd64.tar.gz](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.tar.gz) (TiDB Binlog, TiDB Lightning) | Linux | amd64 |[tidb-v2.1.17-linux-amd64.sha256](https://download.pingcap.org/tidb-v2.1.17-linux-amd64.sha256)|
| [tidb-binlog-kafka-linux-amd64.tar.gz](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.tar.gz) (the Kafka version of TiDB Binlog) | Linux | amd64 |[tidb-binlog-kafka-linux-amd64.sha256](http://download.pingcap.org/tidb-binlog-kafka-linux-amd64.sha256)|

## DM (Data Migration)
2 changes: 1 addition & 1 deletion v3.0/reference/tools/download.md
@@ -26,7 +26,7 @@ If you want to download the 3.0 version of [TiDB Lightning](/v3.0/reference/tool

| Package name | OS | Architecture | SHA256 checksum |
|:---|:---|:---|:---|
| [tidb-toolkit-v3.0.3-linux-amd64.tar.gz](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.3-linux-amd64.sha256](http://download.pingcap.org/tidb-toolkit-v3.0.3-linux-amd64.sha256) |
| [tidb-toolkit-v3.0.5-linux-amd64.tar.gz](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.tar.gz) | Linux | amd64 | [tidb-toolkit-v3.0.5-linux-amd64.sha256](https://download.pingcap.org/tidb-toolkit-v3.0.5-linux-amd64.sha256) |

## DM (Data Migration)
