title | summary | category | aliases | |
---|---|---|---|---|
TiDB-Lightning Overview |
Learn about Lightning and the whole architecture. |
reference |
|
TiDB-Lightning is a tool used for fast full import of large amounts of data into a TiDB cluster. Currently, TiDB-Lightning supports reading SQL dump exported via mydumper or CSV data source. You can use it in the following two scenarios:
- Importing large amounts of new data quickly
- Back up and restore all the data
The TiDB-Lightning tool set consists of two components:
-
tidb-lightning
(the "front end") reads the data source and imports the database structure into the TiDB cluster, and also transforms the data into Key-Value (KV) pairs and sends them totikv-importer
. -
tikv-importer
(the "back end") combines and sorts the KV pairs and then imports these sorted pairs as a whole into the TiKV cluster.
The complete import process is as follows:
-
Before importing,
tidb-lightning
switches the TiKV cluster to "import mode", which optimizes the cluster for writing and disables automatic compaction. -
tidb-lightning
creates the skeleton of all tables from the data source. -
Each table is split into multiple continuous batches, so that data from a huge table (200 GB+) can be delivered incrementally.
-
For each batch,
tidb-lightning
informstikv-importer
via gRPC to create engine files to store KV pairs.tidb-lightning
then reads the data source in parallel, transforms each row into KV pairs according to the TiDB rules, and sends them totikv-importer
's engine files. -
Once a complete engine file is written,
tikv-importer
divides and schedules these data and imports them into the target TiKV cluster.There are two kinds of engine files: data engines and index engines, each corresponding to two kinds of KV pairs: the row data and secondary indices. Normally, the row data are entirely sorted in the data source, while the secondary indices are out of order. Because of this, the data engines are uploaded as soon as a batch is completed, while the index engines are imported only after all batches of the entire table are encoded.
-
After all engines associated to a table are imported,
tidb-lightning
performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process, and tells TiDB toANALYZE
all imported tables, to prepare for optimal query planning. -
Finally,
tidb-lightning
switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.