update MD by dispatch event pingcap/docs release-8.5
github-actions committed Jan 9, 2025
1 parent c63ddf1 commit 8410411
Showing 6 changed files with 319 additions and 13 deletions.
2 changes: 2 additions & 0 deletions markdown-pages/en/tidb/release-8.5/TOC.md
@@ -397,6 +397,7 @@
- [Use Load Base Split](/configure-load-base-split.md)
- [Use Store Limit](/configure-store-limit.md)
- [DDL Execution Principles and Best Practices](/ddl-introduction.md)
- [Batch Processing](/batch-processing.md)
- Use PD Microservices
- [PD Microservices Overview](/pd-microservices.md)
- [Scale PD Microservice Nodes Using TiUP](/scale-microservices-using-tiup.md)
@@ -938,6 +939,7 @@
- [Optimistic Transactions](/optimistic-transaction.md)
- [Pessimistic Transactions](/pessimistic-transaction.md)
- [Non-Transactional DML Statements](/non-transactional-dml.md)
- [Pipelined DML](/pipelined-dml.md)
- [Views](/views.md)
- [Partitioning](/partitioned-table.md)
- [Temporary Tables](/temporary-tables.md)
108 changes: 108 additions & 0 deletions markdown-pages/en/tidb/release-8.5/batch-processing.md
@@ -0,0 +1,108 @@
---
title: Batch Processing
summary: Introduce batch processing features in TiDB, including Pipelined DML, non-transactional DML, the `IMPORT INTO` statement, and the deprecated batch-dml feature.
---

# Batch Processing

Batch processing is a common and essential operation in real-world scenarios. It enables efficient handling of large datasets for tasks such as data migration, bulk imports, archiving, and large-scale updates.

To optimize performance for batch operations, TiDB has introduced several features across versions:

- Data import
    - `IMPORT INTO` statement (introduced in TiDB v7.2.0 and GA in v7.5.0)
- Data inserts, updates, and deletions
    - Pipelined DML (experimental, introduced in TiDB v8.0.0)
    - Non-transactional DML (introduced in TiDB v6.1.0)
    - Batch-dml (deprecated)

This document outlines the key benefits, limitations, and use cases of these features to help you choose the most suitable solution for efficient batch processing.

## Data import

The `IMPORT INTO` statement is designed for data import tasks. It enables you to quickly import data in formats such as CSV, SQL, or PARQUET into an empty TiDB table, without the need to deploy [TiDB Lightning](https://docs.pingcap.com/tidb/stable/tidb-lightning-overview) separately.
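
As a minimal sketch of such an import, assume a hypothetical empty table `orders` and a hypothetical CSV file `/data/orders.csv` that has one header row and is stored on the TiDB node you are connected to. Under those assumptions, the statements might look like this:

```sql
-- `orders` and the file path are hypothetical names used for illustration.
-- The target table must be empty before the import starts.
IMPORT INTO orders FROM '/data/orders.csv' WITH skip_rows = 1;

-- List import jobs and their current status.
SHOW IMPORT JOBS;
```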

### Key benefits

- Extremely fast import speed
- Easier to use compared to TiDB Lightning

### Limitations

<CustomContent platform="tidb">

- No transactional [ACID](/glossary.md#acid) guarantees
- Subject to various usage restrictions

</CustomContent>

<CustomContent platform="tidb-cloud">

- No transactional [ACID](/tidb-cloud/tidb-cloud-glossary.md#acid) guarantees
- Subject to various usage restrictions

</CustomContent>

### Use cases

- Suitable for data import scenarios such as data migration or recovery. It is recommended to use `IMPORT INTO` instead of TiDB Lightning where applicable.

For more information, see [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md).

## Data inserts, updates, and deletions

### Pipelined DML

Pipelined DML is an experimental feature introduced in TiDB v8.0.0. In v8.5.0, the feature is enhanced with significant performance improvements.

#### Key benefits

- Streams data to the storage layer during transaction execution instead of buffering it entirely in memory, so transaction size is no longer limited by TiDB memory and ultra-large-scale data processing becomes possible
- Achieves better performance compared to standard DML
- Can be enabled through system variables without SQL modifications
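
The following sketch shows one way to switch on Pipelined DML through the `tidb_dml_type` system variable. The tables `orders` and `orders_archive` and the column `created_at` are hypothetical names used only for illustration.

```sql
-- Enable Pipelined DML for the current session.
SET tidb_dml_type = "bulk";

-- An autocommit statement that now runs as Pipelined DML.
INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < '2024-01-01';

-- Alternatively, enable it for a single statement with the SET_VAR hint.
DELETE /*+ SET_VAR(tidb_dml_type='bulk') */ FROM orders WHERE created_at < '2024-01-01';

-- Inspect the last transaction; a `pipelined` field set to true
-- indicates that Pipelined DML was used.
SELECT @@tidb_last_txn_info;
```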

#### Limitations

- Only supports [autocommit](/transaction-overview.md#autocommit) `INSERT`, `REPLACE`, `UPDATE`, and `DELETE` statements.

#### Use cases

- Suitable for general batch processing tasks, such as bulk data inserts, updates, and deletions.

For more information, see [Pipelined DML](/pipelined-dml.md).

### Non-transactional DML statements

Non-transactional DML is introduced in TiDB v6.1.0. Initially, only the `DELETE` statement supports this feature. Starting from v6.5.0, `INSERT`, `REPLACE`, and `UPDATE` statements also support this feature.

#### Key benefits

- Splits a single SQL statement into multiple smaller statements, bypassing memory limitations.
- Achieves performance comparable to, or slightly faster than, standard DML.
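
As a rough sketch of the splitting described above, the `BATCH` syntax divides one statement into batches by a shard column. The table `orders` and its columns `id` and `created_at` are hypothetical, and `id` is assumed to be an indexed integer column.

```sql
-- Preview the first and last split statements without executing anything.
BATCH ON id LIMIT 1000 DRY RUN DELETE FROM orders WHERE created_at < '2024-01-01';

-- Run the delete in batches of up to 1000 rows, each as its own transaction.
BATCH ON id LIMIT 1000 DELETE FROM orders WHERE created_at < '2024-01-01';
```

Because each batch commits separately, a failure partway through leaves the earlier batches applied, so the statement should be safe to rerun.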

#### Limitations

- Only supports [autocommit](/transaction-overview.md#autocommit) statements
- Requires modifications to SQL statements
- Imposes strict requirements on SQL syntax; some statements might need rewriting
- Lacks full transactional ACID guarantees; in case of failures, partial execution of a statement might occur

#### Use cases

- Suitable for scenarios involving bulk data inserts, updates, and deletions. Due to its limitations, it is recommended to consider non-transactional DML only when Pipelined DML is not applicable.

For more information, see [Non-transactional DML](/non-transactional-dml.md).

### Deprecated batch-dml feature

The batch-dml feature, available in TiDB versions prior to v4.0, is now deprecated and no longer recommended. This feature is controlled by the following system variables:

- `tidb_batch_insert`
- `tidb_batch_delete`
- `tidb_batch_commit`
- `tidb_enable_batch_dml`
- `tidb_dml_batch_size`

Due to the risk of data corruption or loss caused by inconsistent data and indexes, these variables have been deprecated and are planned for removal in future releases.

It is **NOT RECOMMENDED** to use the deprecated batch-dml feature under any circumstances. Instead, consider the alternative features outlined in this document.
@@ -129,13 +129,47 @@ Before performing backup and restore, BR compares the TiDB cluster version with

Starting from v7.0.0, TiDB has gradually added support for performing backup and restore operations through SQL statements. Therefore, it is strongly recommended that you use a BR tool of the same major version as the TiDB cluster when backing up and restoring cluster data, and that you avoid backing up and restoring data across major versions. This helps ensure that restore operations run smoothly and that data stays consistent.

Starting from v7.6.0, BR restores data in some `mysql` system tables by default, that is, the `--with-sys-table` option defaults to `true`. If you restore data to a TiDB cluster of a different version and encounter an error similar to `[BR:Restore:ErrRestoreIncompatibleSys]incompatible system table` because the system table schemas differ, you can set `--with-sys-table=false` to skip restoring the system tables and avoid this error.

#### BR version compatibility matrix before TiDB v6.6.0

The compatibility information for BR before TiDB v6.6.0 is as follows:

| Backup version (vertical) \ Restore version (horizontal) | Restore to TiDB v6.0 | Restore to TiDB v6.1 | Restore to TiDB v6.2 | Restore to TiDB v6.3, v6.4, or v6.5 | Restore to TiDB v6.6 |
| ---- | ---- | ---- | ---- | ---- | ---- |
| TiDB v6.0, v6.1, v6.2, v6.3, v6.4, or v6.5 snapshot backup | Compatible (known issue [#36379](https://github.com/pingcap/tidb/issues/36379): if backup data contains an empty schema, BR might report an error.) | Compatible | Compatible | Compatible | Compatible (BR must be v6.6) |
| TiDB v6.3, v6.4, v6.5, or v6.6 log backup| Incompatible | Incompatible | Incompatible | Compatible | Compatible |

#### BR version compatibility matrix between TiDB v6.5.0 and v8.5.0

This section introduces the BR compatibility information for all [Long-Term Support (LTS)](/releases/versioning.md#long-term-support-releases) versions between TiDB v6.5.0 and v8.5.0 (including v6.5.0, v7.1.0, v7.5.0, v8.1.0, and v8.5.0):

> **Note:**
>
> Known issue: in v7.2.0, some system table fields are changed to case-sensitive, which might cause cross-version backup and restore failures. For more details, see [issue #43717](https://github.com/pingcap/tidb/issues/43717).

The following table lists the compatibility matrix for full backups:

| Backup version | Compatible restore versions | Incompatible restore versions |
|:---------|:----------------|:------------------|
| v6.5.0 | v7.1.0 | v7.5.0 and later |
| v7.1.0 | - | v7.5.0 and later |
| v7.5.0 | v7.5.0 and later | - |
| v8.1.0 | v8.1.0 and later | - |

The following table lists the compatibility matrix for log backups:

| Backup version | Compatible restore versions | Incompatible restore versions |
|:---------|:----------------|:------------------|
| v6.5.0 | v7.1.0 | v7.5.0 and later |
| v7.1.0 | - | v7.5.0 and later |
| v7.5.0 | v7.5.0 and later | - |
| v8.1.0 | v8.1.0 and later | - |

> **Note:**
>
> - When only user data is backed up (full backup or log backup), all versions are compatible with each other.
> - In scenarios where restoring the `mysql` system table is incompatible, you can resolve the problem by setting `--with-sys-table=false` to skip restoring all system tables, or use a more fine-grained filter to just skip incompatible system tables, for example: `--filter '*.*' --filter "__TiDB_BR_Temporary_*.*" --filter '!mysql.*' --filter 'mysql.bind_info' --filter 'mysql.user' --filter 'mysql.global_priv' --filter 'mysql.global_grants' --filter 'mysql.default_roles' --filter 'mysql.role_edges' --filter '!sys.*' --filter '!INFORMATION_SCHEMA.*' --filter '!PERFORMANCE_SCHEMA.*' --filter '!METRICS_SCHEMA.*' --filter '!INSPECTION_SCHEMA.*'`.
> - `-` means that there are no compatibility restrictions for the corresponding scenario.

## See also

- [TiDB Snapshot Backup and Restore Guide](/br/br-snapshot-guide.md)
@@ -624,7 +624,7 @@ The following steps will set up ProxySQL and TiDB on ports `6033` and `4000` res

## Production environment

-For a production environment, it is recommended that you use [TiDB Cloud Dedicated](https://www.pingcap.com/tidb-dedicated/) directly for a fully-managed experience.
+For a production environment, it is recommended that you use [TiDB Cloud Dedicated](https://www.pingcap.com/tidb-cloud-dedicated/) directly for a fully-managed experience.

### Prerequisite
