
Commit

Merge remote-tracking branch 'upstream/master'
hfxsd committed Aug 15, 2024
2 parents b55157b + e3250a3 commit 72e1340
Showing 11 changed files with 358 additions and 127 deletions.
7 changes: 4 additions & 3 deletions benchmark/benchmark-tidb-using-ch.md
@@ -7,7 +7,7 @@ summary: Learn how to run CH-benCHmark test on TiDB.

This document describes how to test TiDB using CH-benCHmark.

CH-benCHmark is a mixed workload containing both [TPC-C](http://www.tpc.org/tpcc/) and [TPC-H](http://www.tpc.org/tpch/) tests. It is the most common workload to test HTAP systems. For more information, see [The mixed workload CH-benCHmark](https://research.tableau.com/sites/default/files/a8-cole.pdf).
CH-benCHmark is a mixed workload containing both [TPC-C](http://www.tpc.org/tpcc/) and [TPC-H](http://www.tpc.org/tpch/) tests. It is the most common workload to test HTAP systems. For more information, see [The mixed workload CH-benCHmark](https://dl.acm.org/doi/10.1145/1988842.1988850).

Before running the CH-benCHmark test, you need to deploy [TiFlash](/tiflash/tiflash-overview.md) first, which is TiDB's HTAP component. After you deploy TiFlash and [create TiFlash replicas](#create-tiflash-replicas), TiKV will replicate the latest data of TPC-C online transactions to TiFlash in real time, and the TiDB optimizer will automatically push down OLAP queries from the TPC-H workload to the MPP engine of TiFlash for efficient execution.
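
For reference, the following is a minimal sketch of creating TiFlash replicas for two of the TPC-C tables (the table list and replica count here are illustrative; see the linked section for the complete steps):

```sql
-- Illustrative: replicate two TPC-C tables to TiFlash with 2 replicas each
ALTER TABLE tpcc.customer SET TIFLASH REPLICA 2;
ALTER TABLE tpcc.orders SET TIFLASH REPLICA 2;

-- Check replication progress; AVAILABLE = 1 means the replica is ready
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'tpcc';
```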

@@ -85,9 +85,10 @@ In the result of the above statement:

## Collect statistics

To ensure that the TiDB optimizer can generate the optimal execution plan, execute the following SQL statements to collect statistics in advance.
To ensure that the TiDB optimizer can generate the optimal execution plan, execute the following SQL statements to collect statistics in advance. **Be sure to set [`tidb_analyze_column_options`](/system-variables.md#tidb_analyze_column_options-new-in-v830) to `ALL`; otherwise, collecting statistics can result in a significant drop in query performance.**

```sql
set global tidb_analyze_column_options='ALL';
analyze table customer;
analyze table district;
analyze table history;
@@ -166,4 +167,4 @@ tpmC: 93826.9, efficiency: 729.6%
[Summary] Q7 - Count: 11, Sum(ms): 158928.2, Avg(ms): 14446.3
```

After the test is finished, you can execute the `tiup bench tpcc -H 172.16.5.140 -P 4000 -D tpcc --warehouses 1000 check` command to validate the data correctness.
12 changes: 8 additions & 4 deletions best-practices/java-app-best-practices.md
@@ -62,17 +62,21 @@ For batch inserts, you can use the [`addBatch`/`executeBatch` API](https://www.t

In most scenarios, to improve execution efficiency, JDBC obtains query results in advance and saves them in client memory by default. But when the query returns a super large result set, the client often wants the database server to reduce the number of records returned at a time, and to wait until the client's memory is ready before it requests the next batch.

Usually, there are two kinds of processing methods in JDBC:
The following two processing methods are usually used in JDBC:

- [Set `FetchSize` to `Integer.MIN_VALUE`](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html#ResultSet) to ensure that the client does not cache. The client will read the execution result from the network connection through `StreamingResult`.
- The first method: [Set `FetchSize` to `Integer.MIN_VALUE`](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html#ResultSet) to ensure that the client does not cache. The client will read the execution result from the network connection through `StreamingResult`.

When the client uses the streaming read method, it needs to finish reading or close `resultset` before continuing to use the statement to make a query. Otherwise, the error `No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.` is returned.

    To avoid such an error when a new query is issued before the client finishes reading or closes `resultset`, you can add the `clobberStreamingResults=true` parameter in the URL. Then, `resultset` is automatically closed, but the result set still to be read in the previous streaming query is lost.

- To use Cursor Fetch, first [set `FetchSize`](http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html) as a positive integer and configure `useCursorFetch=true` in the JDBC URL.
- The second method: Use Cursor Fetch by first [setting `FetchSize`](http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html) to a positive integer and then configuring `useCursorFetch=true` in the JDBC URL.

TiDB supports both methods, but it is preferred that you use the first method, because it is a simpler implementation and has a better execution efficiency.
TiDB supports both methods, but it is recommended that you use the first method, which sets `FetchSize` to `Integer.MIN_VALUE`, because its implementation is simpler and its execution is more efficient.

For the second method, TiDB first loads all data to the TiDB node, and then returns data to the client according to the `FetchSize`. Therefore, it usually consumes more memory than the first method. If [`tidb_enable_tmp_storage_on_oom`](/system-variables.md#tidb_enable_tmp_storage_on_oom) is set to `ON`, TiDB might temporarily write the result to the hard disk.

If the [`tidb_enable_lazy_cursor_fetch`](/system-variables.md#tidb_enable_lazy_cursor_fetch-new-in-v830) system variable is set to `ON`, TiDB tries to read part of the data only when the client fetches it, which uses less memory. For more details and limitations, see the [complete description of the `tidb_enable_lazy_cursor_fetch` system variable](/system-variables.md#tidb_enable_lazy_cursor_fetch-new-in-v830).
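
The following is a rough sketch of the server-side settings mentioned above (assuming a TiDB version that supports both variables; `tidb_enable_lazy_cursor_fetch` is introduced in v8.3.0):

```sql
-- Let TiDB spill oversized intermediate results to disk instead of failing on OOM
SET GLOBAL tidb_enable_tmp_storage_on_oom = ON;

-- Read cursor data lazily so the TiDB node does not hold the full result set in memory
SET GLOBAL tidb_enable_lazy_cursor_fetch = ON;
```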

### MySQL JDBC parameters

6 changes: 6 additions & 0 deletions br/br-incremental-guide.md
@@ -17,6 +17,12 @@ Because restoring the incremental backup relies on the snapshot of the database

The incremental backup does not support batch renaming of tables. If batch renaming of tables occurs during the incremental backup process, the data restore might fail. It is recommended to perform a full backup after batch renaming tables, and use the latest full backup to replace the incremental data during restore.

Starting from v8.3.0, the `--allow-pitr-from-incremental` configuration parameter is introduced to control whether incremental backups and subsequent log backups are compatible. The default value is `true`, which means that incremental backups are compatible with subsequent log backups.

- When you keep the default value `true`, the DDLs that need to be replayed are strictly checked before the incremental restore begins. This mode does not yet support `ADD INDEX`, `MODIFY COLUMN`, or `REORG PARTITION`. If you want to use incremental backups together with log backups, make sure that none of these DDL operations occur during the incremental backup process; otherwise, they cannot be replayed correctly.

- If you want to use incremental restores without log backups during the whole recovery process, you can set `--allow-pitr-from-incremental` to `false` to skip the checks in the incremental recovery phase.

## Back up incremental data

To back up incremental data, run the `tiup br backup` command with **the last backup timestamp** `--lastbackupts` specified. In this way, the br command-line tool automatically backs up incremental data generated between `lastbackupts` and the current time. To get `--lastbackupts`, run the `validate` command. The following is an example:
4 changes: 4 additions & 0 deletions br/br-pitr-manual.md
@@ -341,6 +341,10 @@ Expected output:
## Restore to a specified point in time (PITR)

> **Note:**
>
> If you specify `--full-backup-storage` as the address of an incremental backup for `restore point`, then to restore this backup and any previous incremental backups, you need to set `--allow-pitr-from-incremental` to `true` so that the incremental backups are compatible with subsequent log backups.

You can run the `tiup br restore point` command to perform a PITR on a new cluster or just restore the log backup data.
Run `tiup br restore point --help` to see the help information:
12 changes: 8 additions & 4 deletions develop/dev-guide-connection-parameters.md
@@ -143,17 +143,21 @@ For batch inserts, you can use the [`addBatch`/`executeBatch` API](https://www.t

In most scenarios, to improve execution efficiency, JDBC obtains query results in advance and saves them in client memory by default. But when the query returns a super large result set, the client often wants the database server to reduce the number of records returned at a time, and to wait until the client's memory is ready before it requests the next batch.

Usually, there are two kinds of processing methods in JDBC:
The following two processing methods are usually used in JDBC:

- [Set **FetchSize** to `Integer.MIN_VALUE`](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html#ResultSet) to ensure that the client does not cache. The client will read the execution result from the network connection through `StreamingResult`.
- The first method: [Set **FetchSize** to `Integer.MIN_VALUE`](https://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html#ResultSet) to ensure that the client does not cache. The client will read the execution result from the network connection through `StreamingResult`.

When the client uses the streaming read method, it needs to finish reading or close `resultset` before continuing to use the statement to make a query. Otherwise, the error `No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.` is returned.

    To avoid such an error when a new query is issued before the client finishes reading or closes `resultset`, you can add the `clobberStreamingResults=true` parameter in the URL. Then, `resultset` is automatically closed, but the result set still to be read in the previous streaming query is lost.

- To use Cursor Fetch, first [set `FetchSize`](http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html) as a positive integer and configure `useCursorFetch=true` in the JDBC URL.
- The second method: Use Cursor Fetch by first [setting `FetchSize`](http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html) to a positive integer and then configuring `useCursorFetch=true` in the JDBC URL.

TiDB supports both methods, but it is preferred that you use the first method, because it is a simpler implementation and has a better execution efficiency.
TiDB supports both methods, but it is recommended that you use the first method, which sets `FetchSize` to `Integer.MIN_VALUE`, because its implementation is simpler and its execution is more efficient.

For the second method, TiDB first loads all data to the TiDB node, and then returns data to the client according to the `FetchSize`. Therefore, it usually consumes more memory than the first method. If [`tidb_enable_tmp_storage_on_oom`](/system-variables.md#tidb_enable_tmp_storage_on_oom) is set to `ON`, TiDB might temporarily write the result to the hard disk.

If the [`tidb_enable_lazy_cursor_fetch`](/system-variables.md#tidb_enable_lazy_cursor_fetch-new-in-v830) system variable is set to `ON`, TiDB tries to read part of the data only when the client fetches it, which uses less memory. For more details and limitations, see the [complete description of the `tidb_enable_lazy_cursor_fetch` system variable](/system-variables.md#tidb_enable_lazy_cursor_fetch-new-in-v830).
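
As a quick check (a sketch; the availability of these variables depends on your TiDB version), you can inspect how your cluster is currently configured:

```sql
-- Inspect the current values of the two system variables discussed above
SHOW VARIABLES LIKE 'tidb_enable_tmp_storage_on_oom';
SHOW VARIABLES LIKE 'tidb_enable_lazy_cursor_fetch';
```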

### MySQL JDBC parameters

26 changes: 13 additions & 13 deletions sql-plan-management.md
@@ -231,25 +231,25 @@ The original SQL statement and the bound statement must have the same text after

#### Create a binding according to a historical execution plan

To make the execution plan of a SQL statement fixed to a historical execution plan, you can use `plan_digest` to bind that historical execution plan to the SQL statement, which is more convenient than binding it according to a SQL statement.
To make the execution plan of a SQL statement fixed to a historical execution plan, you can use Plan Digest to bind that historical execution plan to the SQL statement, which is more convenient than binding it according to a SQL statement. In addition, you can bind the execution plan for multiple SQL statements at once. For more details and examples, see [`CREATE [GLOBAL|SESSION] BINDING`](/sql-statements/sql-statement-create-binding.md).

When using this feature, note the following:

- The feature generates hints according to historical execution plans and uses the generated hints for binding. Because historical execution plans are stored in [Statement Summary Tables](/statement-summary-tables.md), before using this feature, you need to enable the [`tidb_enable_stmt_summary`](/system-variables.md#tidb_enable_stmt_summary-new-in-v304) system variable first.
- For TiFlash queries, join queries with three or more tables, and queries that contain subqueries, the auto-generated hints are not adequate, which might result in the plan not being fully bound. In such cases, a warning is returned when the binding is created.
- If a historical execution plan is for a SQL statement with hints, the hints will be added to the binding. For example, after executing `SELECT /*+ max_execution_time(1000) */ * FROM t`, the binding created with its `plan_digest` will include `max_execution_time(1000)`.
- If a historical execution plan is for a SQL statement with hints, the hints will be added to the binding. For example, after executing `SELECT /*+ max_execution_time(1000) */ * FROM t`, the binding created with its Plan Digest will include `max_execution_time(1000)`.

The SQL statement of this binding method is as follows:

```sql
CREATE [GLOBAL | SESSION] BINDING FROM HISTORY USING PLAN DIGEST 'plan_digest';
CREATE [GLOBAL | SESSION] BINDING FROM HISTORY USING PLAN DIGEST StringLiteralOrUserVariableList;
```

This statement binds an execution plan to a SQL statement using `plan_digest`. The default scope is SESSION. The applicable SQL statements, priorities, scopes, and effective conditions of the created bindings are the same as that of [bindings created according to SQL statements](#create-a-binding-according-to-a-sql-statement).
The preceding statement binds an execution plan to a SQL statement using Plan Digest. The default scope is SESSION. The applicable SQL statements, priorities, scopes, and effective conditions of the created bindings are the same as those of [bindings created according to SQL statements](#create-a-binding-according-to-a-sql-statement).
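
For example, the following sketch (the second digest value is hypothetical) binds historical plans for several statements at once by passing a comma-separated list, as the `StringLiteralOrUserVariableList` syntax suggests:

```sql
-- Bind the plans for multiple statements in one go (digest values are examples)
CREATE GLOBAL BINDING FROM HISTORY USING PLAN DIGEST
    '4e3159169cc63c14b139a4e7d72eae1759875c9a9581f94bb2079aae961189cb',
    'e72819cf99e687ba254af25b0ba5624a0e7afecc4dab425e3f3d0dcb40a4f5f6';
```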

To use this binding method, you need to first get the `plan_digest` corresponding to the target historical execution plan in `statements_summary`, and then create a binding using the `plan_digest`. The detailed steps are as follows:
To use this binding method, you need to first get the Plan Digest corresponding to the target historical execution plan in `statements_summary`, and then create a binding using the Plan Digest. The detailed steps are as follows:

1. Get the `plan_digest` corresponding to the target execution plan in `statements_summary`.
1. Get the Plan Digest corresponding to the target execution plan in `statements_summary`.

For example:

@@ -274,9 +274,9 @@ To use this binding method, you need to first get the `plan_digest` correspondin
BINARY_PLAN: 6QOYCuQDCg1UYWJsZVJlYWRlcl83Ev8BCgtTZWxlY3Rpb25fNhKOAQoPBSJQRnVsbFNjYW5fNSEBAAAAOA0/QSkAAQHwW4jDQDgCQAJKCwoJCgR0ZXN0EgF0Uh5rZWVwIG9yZGVyOmZhbHNlLCBzdGF0czpwc2V1ZG9qInRpa3ZfdGFzazp7dGltZTo1NjAuOMK1cywgbG9vcHM6MH1w////CQMEAXgJCBD///8BIQFzCDhVQw19BAAkBX0QUg9lcSgBfCAudC5hLCAxKWrmYQAYHOi0gc6hBB1hJAFAAVIQZGF0YTo9GgRaFAW4HDQuMDVtcywgCbYcMWKEAWNvcF8F2agge251bTogMSwgbWF4OiA1OTguNsK1cywgcHJvY19rZXlzOiAwLCBycGNfBSkAMgkMBVcQIDYwOS4pEPBDY29wcl9jYWNoZV9oaXRfcmF0aW86IDAuMDAsIGRpc3RzcWxfY29uY3VycmVuY3k6IDE1fXCwAXj///////////8BGAE=
```

In this example, you can see that the execution plan corresponding to `plan_digest` is `4e3159169cc63c14b139a4e7d72eae1759875c9a9581f94bb2079aae961189cb`.
In this example, you can see that the Plan Digest corresponding to the target execution plan is `4e3159169cc63c14b139a4e7d72eae1759875c9a9581f94bb2079aae961189cb`.

2. Use `plan_digest` to create a binding:
2. Use Plan Digest to create a binding:

```sql
CREATE BINDING FROM HISTORY USING PLAN DIGEST '4e3159169cc63c14b139a4e7d72eae1759875c9a9581f94bb2079aae961189cb';
@@ -317,7 +317,7 @@ SELECT @@LAST_PLAN_FROM_BINDING;

### Remove a binding

You can remove a binding according to a SQL statement or `sql_digest`.
You can remove a binding according to a SQL statement or SQL Digest.

#### Remove a binding according to a SQL statement

@@ -343,15 +343,15 @@ explain SELECT * FROM t1,t2 WHERE t1.id = t2.id;

In the example above, the dropped binding in the SESSION scope shields the corresponding binding in the GLOBAL scope. The optimizer does not add the `sm_join(t1, t2)` hint to the statement. The top node of the execution plan in the `explain` result is not fixed to MergeJoin by this hint. Instead, the top node is independently selected by the optimizer according to the cost estimation.

#### Remove a binding according to `sql_digest`
#### Remove a binding according to SQL Digest

In addition to removing a binding according to a SQL statement, you can also remove a binding according to `sql_digest`.
In addition to removing a binding according to a SQL statement, you can also remove a binding according to SQL Digest. For more details and examples, see [`DROP [GLOBAL|SESSION] BINDING`](/sql-statements/sql-statement-drop-binding.md).

```sql
DROP [GLOBAL | SESSION] BINDING FOR SQL DIGEST 'sql_digest';
DROP [GLOBAL | SESSION] BINDING FOR SQL DIGEST StringLiteralOrUserVariableList;
```

This statement removes an execution plan binding corresponding to `sql_digest` at the GLOBAL or SESSION level. The default scope is SESSION. You can get the `sql_digest` by [viewing bindings](#view-bindings).
This statement removes an execution plan binding corresponding to SQL Digest at the GLOBAL or SESSION level. The default scope is SESSION. You can get the SQL Digest by [viewing bindings](#view-bindings).
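
For instance, the following sketch (digest values are hypothetical) removes several bindings in a single statement:

```sql
-- Drop the bindings for multiple SQL digests at once (values are examples)
DROP GLOBAL BINDING FOR SQL DIGEST
    'e72819cf99e687ba254af25b0ba5624a0e7afecc4dab425e3f3d0dcb40a4f5f6',
    '9aae961189cb4e3159169cc63c14b139a4e7d72eae1759875c9a9581f94bb207';
```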

> **Note:**
>