From 3120a92b179af9efed72323ef3fab5757476f035 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Tue, 21 Jul 2020 18:17:29 +0800
Subject: [PATCH 01/12] Update query-execution-plan.md

---
 query-execution-plan.md | 66 ++++++++++++++++++++++++++++-------------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 3809bf2116746..67ce25f420d83 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -6,11 +6,11 @@ aliases: ['/docs/dev/query-execution-plan/','/docs/dev/reference/performance/und

 # Understand the Query Execution Plan

-Based on the details of your tables, the TiDB optimizer chooses the most efficient query execution plan, which consists of a series of operators. This document details the execution plan information returned by the `EXPLAIN` statement in TiDB.
+Based on the latest statistics of your tables, the TiDB optimizer chooses the most efficient query execution plan, which consists of a series of operators. This document details the execution plan in TiDB.

 ## `EXPLAIN` overview

-The result of the `EXPLAIN` statement provides information about how TiDB executes SQL queries:
+You can use the `EXPLAIN` command in TiDB to view the execution plan. The result of the `EXPLAIN` statement provides information about how TiDB executes SQL queries:

 - `EXPLAIN` works together with statements such as `SELECT` and `DELETE`.
 - When you execute the `EXPLAIN` statement, TiDB returns the final optimized physical execution plan. In other words, `EXPLAIN` displays the complete information about how TiDB executes the SQL statement, such as in which order, how tables are joined, and what the expression tree looks like.
@@ -84,6 +84,10 @@ Currently, calculation tasks of TiDB can be divided into two categories: cop tas

 One of the goals of SQL optimization is to push the calculation down to TiKV as much as possible. The Coprocessor in TiKV supports most of the built-in SQL functions (including the aggregate functions and the scalar functions), SQL `LIMIT` operations, index scans, and table scans. However, all `Join` operations can only be performed as root tasks in TiDB.

+### Access Object overview
+
+The data item accessed by the operator, including `table`, `partition`, and `index`(if any). Only operators that directly access the data have this information.
+
 ### Range query

 In the `WHERE`/`HAVING`/`ON` conditions, the TiDB optimizer analyzes the result returned by the primary key query or the index key query. For example, these conditions might include comparison operators of the numeric and date type, such as `>`, `<`, `=`, `>=`, `<=`, and the character type such as `LIKE`.
@@ -153,6 +157,8 @@ The `IndexLookUp_6` operator has two child nodes: `IndexFullScan_4(Build)` and `

 This execution plan is not as efficient as using `TableReader` to perform a full table scan, because `IndexLookUp` performs an extra index scan (which comes with additional overhead), apart from the table scan.

+For table scan operations, the operator info column in the explain table shows whether the data is sorted. In the above example, the `keep order:false` in the `IndexFullScan` operator indicates that the data is unsorted. The `stats:pseudo` in the operator info means that the statists will not be used for estimation due to no or too old statistics. For other scan operations, the operator info involves similar information.
+
 #### `TableReader` example

 {{< copyable "sql" >}}
@@ -178,32 +184,42 @@ In the above example, the child node of the `TableReader_7` operator is `Selecti

 #### `IndexMerge` example

-{{< copyable "sql" >}}
+IndexMerge is a new way to access tables in TiDB 4.0. In the IndexMerge access mode, the optimizer can use multiple indexes in a table and merge the returned results of each index. In some scenarios, this mode can reduce a large amount of unnecessary data scan and improve the efficiency of the query execution.

-```sql
-set @@tidb_enable_index_merge = 1;
-explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1;
 ```
-
-```sql
-+------------------------------+---------+-----------+-------------------------+------------------------------------------------+
-| id | estRows | task | access object | operator info |
-+------------------------------+---------+-----------+-------------------------+------------------------------------------------+
-| IndexMerge_16 | 6666.67 | root | | |
-| ├─IndexRangeScan_13(Build) | 3333.33 | cop[tikv] | table:t, index:idx_a(a) | range:(1,+inf], keep order:false, stats:pseudo |
-| ├─IndexRangeScan_14(Build) | 3333.33 | cop[tikv] | table:t, index:idx_b(b) | range:(1,+inf], keep order:false, stats:pseudo |
-| └─TableRowIDScan_15(Probe) | 6666.67 | cop[tikv] | table:t | keep order:false, stats:pseudo |
-+------------------------------+---------+-----------+-------------------------+------------------------------------------------+
-4 rows in set (0.00 sec)
+mysql> explain select * from t where a = 1 or b = 1;
++-------------------------+----------+-----------+---------------+--------------------------------------+
+| id | estRows | task | access object | operator info |
++-------------------------+----------+-----------+---------------+--------------------------------------+
+| TableReader_7 | 8000.00 | root | | data:Selection_6 |
+| └─Selection_6 | 8000.00 | cop[tikv] | | or(eq(test.t.a, 1), eq(test.t.b, 1)) |
+| └─TableFullScan_5 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
++-------------------------+----------+-----------+---------------+--------------------------------------+
+mysql> set @@tidb_enable_index_merge = 1;
+mysql> explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1;
++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
+| id | estRows | task | access object | operator info |
++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
+| IndexMerge_16 | 6666.67 | root | | |
+| ├─IndexRangeScan_13(Build) | 3333.33 | cop[tikv] | table:t, index:idx_a(a) | range:(1,+inf], keep order:false, stats:pseudo |
+| ├─IndexRangeScan_14(Build) | 3333.33 | cop[tikv] | table:t, index:idx_b(b) | range:(1,+inf], keep order:false, stats:pseudo |
+| └─TableRowIDScan_15(Probe) | 6666.67 | cop[tikv] | table:t | keep order:false, stats:pseudo |
++--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
 ```

-`IndexMerge` makes it possible that multiple indexes are used during table scans. In the above example, the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meet the conditions according to these `RowID`s.
+In the above example, without IndexMerge, only one index can be used in each table because the filter condition of the query is an expression connected by `OR`. `a = 1` cannot be pushed down to the index `a` and `b = 1` cannot be pushed down to the index `b`. This way makes the efficiency of the full scan very low when the amount of data in `t` is large. For such scenarios, TiDB introduces IndexMerge, a new access mode to tables.
+
+In the IndexMerge access mode, the optimizer can use multiple indexes in a table, and combine the returned results of each index to generate the execution plan of the latter IndexMerge in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meet the conditions according to these `RowID`s.
+
+For the table scan that is performed by range such as indexRangeScan/TableRangeScan , the operator info column in the explain table has more information about the range of the scanned data than other scan operations. In the above example, the `range:(1,+inf]` in the IndexRangeScan operator indicates that the operator scans the data from 1 to positive infinity.

 > **Note:**
 >
-> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions.
+> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions. You can enable `IndexMerge` in two ways:
 >
-> You can enable `IndexMerge` by configuring the `session` or `global` variables: execute the `set @@tidb_enable_index_merge = 1;` statement in the client.
+> - Set the system variable `tidb_enable_index_merge` to 1;
+>
+> - Use SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query; Note: SQL Hint has a higher priority than system variables.

 ### Read the aggregated execution plan

@@ -239,6 +255,8 @@ Generally speaking, `Hash Aggregate` is executed in two stages.
 - One is on the Coprocessor of TiKV/TiFlash, with the intermediate results of the aggregation function calculated when the table scan operator reads the data.
 - The other is at the TiDB layer, with the final result calculated through aggregating the intermediate results of all Coprocessor Tasks.

+The operator info column in the explain table also records other information about Hash Aggregation. You need to pay attention to what aggregation function that Aggregation uses. In the above example, the operator info of the Hash Aggregation operator is `funcs:count(Column#7)->Column#4`. It means that Hash Aggregation uses the aggregation function `count` for calculation. The operator info of the Stream Aggregation operator in the following example is the same with this one.
+
 #### `Stream Aggregate` example

 The `Stream Aggregation` operator usually takes up less memory than `Hash Aggregate`. In some scenarios, `Stream Aggregation` executes faster than `Hash Aggregate`. In the case of a large amount of data or insufficient system memory, it is recommended to use the `Stream Aggregate` operator. An example is as follows:
@@ -309,6 +327,9 @@ The execution process of `Hash Join` is as follows:
 4. Use the data of the `Probe` side to probe the Hash Table.
 5. Return qualified data to the user.

+
+The operator info column in the explain table also records other information about Hash Join, including whether the query is Inner Join or Outer Join, and what are the conditions of join. In the above example, the query is an Inner Join, where the Join condition `equal:[eq(test.t1.id, test.t2.id)]` partly corresponds with the query statement `where t1.id = t2. id`.The operator info of the other Join operators in the following example is similar to this one.
+
 #### `Merge Join` example

 The `Merge Join` operator usually uses less memory than `Hash Join`. However, `Merge Join` might take longer to be executed. When the amount of data is large, or the system memory is insufficient, it is recommended to use `Merge Join`. The following is an example:
@@ -470,9 +491,14 @@ EXPLAIN SELECT count(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00

 After adding the index, use `IndexScan_24` to directly read the data that meets the `start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59'` condition. The estimated number of rows to be scanned decreases from 19117643.00 to 8166.73. In the test environment, the execution time of this query decreases from 50.41 seconds to 0.01 seconds.

+## Operator-related system variables
+
+Based on MySQL, TiDB defines some special system variables and syntax to optimize performance. Some system variables are related to specific operators, such as the concurrency of the operator, the upper limit of the operator memory, and whether to use partition tables. These can be controlled by system variables, thereby affecting the efficiency of each operator.
+
 ## See also

 * [EXPLAIN](/sql-statements/sql-statement-explain.md)
 * [EXPLAIN ANALYZE](/sql-statements/sql-statement-explain-analyze.md)
 * [ANALYZE TABLE](/sql-statements/sql-statement-analyze-table.md)
 * [TRACE](/sql-statements/sql-statement-trace.md)
+* [System Variables](/tidb-specific-system-variables.md)
\ No newline at end of file

From 8d13845486936eefea56c3f658f9ab69e394cf66 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Tue, 21 Jul 2020 18:23:59 +0800
Subject: [PATCH 02/12] Update query-execution-plan.md

---
 query-execution-plan.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 67ce25f420d83..690ae49a3c437 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -327,7 +327,6 @@ The execution process of `Hash Join` is as follows:
 4. Use the data of the `Probe` side to probe the Hash Table.
 5. Return qualified data to the user.

-
 The operator info column in the explain table also records other information about Hash Join, including whether the query is Inner Join or Outer Join, and what are the conditions of join. In the above example, the query is an Inner Join, where the Join condition `equal:[eq(test.t1.id, test.t2.id)]` partly corresponds with the query statement `where t1.id = t2. id`.The operator info of the other Join operators in the following example is similar to this one.

 #### `Merge Join` example

From 8f59ca0d9bd2466381fead1d2a7d734925339e62 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Tue, 21 Jul 2020 18:31:28 +0800
Subject: [PATCH 03/12] Update query-execution-plan.md

---
 query-execution-plan.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 690ae49a3c437..c27ee838417c0 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -500,4 +500,4 @@ Based on MySQL, TiDB defines some special system variables and syntax to optimiz
 * [EXPLAIN ANALYZE](/sql-statements/sql-statement-explain-analyze.md)
 * [ANALYZE TABLE](/sql-statements/sql-statement-analyze-table.md)
 * [TRACE](/sql-statements/sql-statement-trace.md)
-* [System Variables](/tidb-specific-system-variables.md)
\ No newline at end of file
+* [System Variables](/system-variables.md)
\ No newline at end of file

From 62ccdfcc012a99afea150c9e2713b23286af9127 Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Thu, 23 Jul 2020 12:54:11 +0800
Subject: [PATCH 04/12] Apply suggestions from code review

Co-authored-by: Ran
---
 query-execution-plan.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index c27ee838417c0..580874accceb7 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -86,7 +86,7 @@ One of the goals of SQL optimization is to push the calculation down to TiKV as

 ### Access Object overview

-The data item accessed by the operator, including `table`, `partition`, and `index`(if any). Only operators that directly access the data have this information.
+Access Object is the data item accessed by the operator, including `table`, `partition`, and `index` (if any). Only operators that directly access the data have this information.

 ### Range query

@@ -157,7 +157,7 @@ The `IndexLookUp_6` operator has two child nodes: `IndexFullScan_4(Build)` and `

 This execution plan is not as efficient as using `TableReader` to perform a full table scan, because `IndexLookUp` performs an extra index scan (which comes with additional overhead), apart from the table scan.

-For table scan operations, the operator info column in the explain table shows whether the data is sorted. In the above example, the `keep order:false` in the `IndexFullScan` operator indicates that the data is unsorted. The `stats:pseudo` in the operator info means that the statists will not be used for estimation due to no or too old statistics. For other scan operations, the operator info involves similar information.
+For table scan operations, the operator info column in the `explain` table shows whether the data is sorted. In the above example, the `keep order:false` in the `IndexFullScan` operator indicates that the data is unsorted. The `stats:pseudo` in the operator info means that there is no statistics, or that the statistics will not be used for estimation because it is outdated. For other scan operations, the operator info involves similar information.

 #### `TableReader` example

@@ -184,7 +184,7 @@ In the above example, the child node of the `TableReader_7` operator is `Selecti

 #### `IndexMerge` example

-IndexMerge is a new way to access tables in TiDB 4.0. In the IndexMerge access mode, the optimizer can use multiple indexes in a table and merge the returned results of each index. In some scenarios, this mode can reduce a large amount of unnecessary data scan and improve the efficiency of the query execution.
+`IndexMerge` is a new way to access tables, introduced in TiDB 4.0. In the `IndexMerge` access mode, the optimizer can use multiple indexes in a table and merge the results returned by each index. In some scenarios, this mode can reduce a large amount of unnecessary data scan and improve the efficiency of the query execution.

 ```
 mysql> explain select * from t where a = 1 or b = 1;
@@ -207,9 +207,9 @@ mysql> explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1;
 +--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
 ```

-In the above example, without IndexMerge, only one index can be used in each table because the filter condition of the query is an expression connected by `OR`. `a = 1` cannot be pushed down to the index `a` and `b = 1` cannot be pushed down to the index `b`. This way makes the efficiency of the full scan very low when the amount of data in `t` is large. For such scenarios, TiDB introduces IndexMerge, a new access mode to tables.
+In the above example, where the filter condition of the query is an expression connected by `OR`, without IndexMerge, only one index can be used in each table. In such case, `a = 1` cannot be pushed down to the index `a`, and `b = 1` cannot be pushed down to the index `b`. When `t` has a large amount of data, the execution of full table scan is inefficient. For such scenarios, TiDB introduces `IndexMerge`, a new access mode to tables.

-In the IndexMerge access mode, the optimizer can use multiple indexes in a table, and combine the returned results of each index to generate the execution plan of the latter IndexMerge in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meet the conditions according to these `RowID`s.
+In the IndexMerge access mode, the optimizer can use multiple indexes in a table, and combine the returned results of each index to generate the execution plan of the latter IndexMerge in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meets the conditions according to these `RowID`s.

 For the table scan that is performed by range such as indexRangeScan/TableRangeScan , the operator info column in the explain table has more information about the range of the scanned data than other scan operations. In the above example, the `range:(1,+inf]` in the IndexRangeScan operator indicates that the operator scans the data from 1 to positive infinity.

@@ -255,7 +255,7 @@ Generally speaking, `Hash Aggregate` is executed in two stages.
 - One is on the Coprocessor of TiKV/TiFlash, with the intermediate results of the aggregation function calculated when the table scan operator reads the data.
 - The other is at the TiDB layer, with the final result calculated through aggregating the intermediate results of all Coprocessor Tasks.

-The operator info column in the explain table also records other information about Hash Aggregation. You need to pay attention to what aggregation function that Aggregation uses. In the above example, the operator info of the Hash Aggregation operator is `funcs:count(Column#7)->Column#4`. It means that Hash Aggregation uses the aggregation function `count` for calculation. The operator info of the Stream Aggregation operator in the following example is the same with this one.
+The operator info column in the `explain` table also records other information about `Hash Aggregation`. You need to pay attention to what aggregate function that `Hash Aggregation` uses. In the above example, the operator info of the `Hash Aggregation` operator is `funcs:count(Column#7)->Column#4`. It means that `Hash Aggregation` uses the aggregate function `count` for calculation. The operator info of the `Stream Aggregation` operator in the following example is the same with this one.

 #### `Stream Aggregate` example

@@ -327,7 +327,7 @@ The execution process of `Hash Join` is as follows:
 4. Use the data of the `Probe` side to probe the Hash Table.
 5. Return qualified data to the user.

-The operator info column in the explain table also records other information about Hash Join, including whether the query is Inner Join or Outer Join, and what are the conditions of join. In the above example, the query is an Inner Join, where the Join condition `equal:[eq(test.t1.id, test.t2.id)]` partly corresponds with the query statement `where t1.id = t2. id`.The operator info of the other Join operators in the following example is similar to this one.
+The operator info column in the `explain` table also records other information about `Hash Join`, including whether the query is Inner Join or Outer Join, and what are the conditions of Join. In the above example, the query is an Inner Join, where the Join condition `equal:[eq(test.t1.id, test.t2.id)]` partly corresponds with the query statement `where t1.id = t2. id`. The operator info of the other Join operators in the following examples is similar to this one.

 #### `Merge Join` example

@@ -500,4 +500,4 @@ Based on MySQL, TiDB defines some special system variables and syntax to optimiz
 * [EXPLAIN ANALYZE](/sql-statements/sql-statement-explain-analyze.md)
 * [ANALYZE TABLE](/sql-statements/sql-statement-analyze-table.md)
 * [TRACE](/sql-statements/sql-statement-trace.md)
-* [System Variables](/system-variables.md)
\ No newline at end of file
+* [System Variables](/system-variables.md)

From 0cffb123f4427971ba252bf1b2db847a45e3ab23 Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Thu, 23 Jul 2020 12:56:55 +0800
Subject: [PATCH 05/12] Update query-execution-plan.md

Co-authored-by: Ran
---
 query-execution-plan.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 580874accceb7..6369d1c0d52f6 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -492,7 +492,7 @@ After adding the index, use `IndexScan_24` to directly read the data that meets

 ## Operator-related system variables

-Based on MySQL, TiDB defines some special system variables and syntax to optimize performance. Some system variables are related to specific operators, such as the concurrency of the operator, the upper limit of the operator memory, and whether to use partition tables. These can be controlled by system variables, thereby affecting the efficiency of each operator.
+Based on MySQL, TiDB defines some special system variables and syntax to optimize performance. Some system variables are related to specific operators, such as the concurrency of the operator, the upper limit of the operator memory, and whether to use partitioned tables. These can be controlled by system variables, thereby affecting the efficiency of each operator.

 ## See also

From d79b068c4993cdf7b14f27379962984adf88cf57 Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Thu, 23 Jul 2020 13:22:21 +0800
Subject: [PATCH 06/12] Update query-execution-plan.md

---
 query-execution-plan.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 6369d1c0d52f6..762d676459a03 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -184,7 +184,7 @@ In the above example, the child node of the `TableReader_7` operator is `Selecti

 #### `IndexMerge` example

-`IndexMerge` is a new way to access tables, introduced in TiDB 4.0. In the `IndexMerge` access mode, the optimizer can use multiple indexes in a table and merge the results returned by each index. In some scenarios, this mode can reduce a large amount of unnecessary data scan and improve the efficiency of the query execution.
+`IndexMerge` is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans.

 ```
 mysql> explain select * from t where a = 1 or b = 1;
@@ -207,19 +207,21 @@ mysql> explain select * from t use index(idx_a, idx_b) where a > 1 or b > 1;
 +--------------------------------+---------+-----------+-------------------------+------------------------------------------------+
 ```

-In the above example, where the filter condition of the query is an expression connected by `OR`, without IndexMerge, only one index can be used in each table. In such case, `a = 1` cannot be pushed down to the index `a`, and `b = 1` cannot be pushed down to the index `b`. When `t` has a large amount of data, the execution of full table scan is inefficient. For such scenarios, TiDB introduces `IndexMerge`, a new access mode to tables.
+In the above query, the filter condition is a `WHERE` clause that uses `OR` as the connector. Without `IndexMerge`, you can use only one index per table. `a = 1` cannot be pushed down to the index `a`; neither can `b = 1` be pushed down to the index `b`. The full table scan is inefficient when a huge volume of data exists in `t`. To handle such a scenario, `IndexMerge` is introduced in TiDB to access tables.

-In the IndexMerge access mode, the optimizer can use multiple indexes in a table, and combine the returned results of each index to generate the execution plan of the latter IndexMerge in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meets the conditions according to these `RowID`s.
+`IndexMerge` allows the optimizer to use multiple indexes per table, and merge the results returned by each index to generate the execution plan of the latter `IndexMerge` in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meets the conditions according to these `RowID`s.

 For the table scan that is performed by range such as indexRangeScan/TableRangeScan , the operator info column in the explain table has more information about the range of the scanned data than other scan operations. In the above example, the `range:(1,+inf]` in the IndexRangeScan operator indicates that the operator scans the data from 1 to positive infinity.

 > **Note:**
 >
-> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions. You can enable `IndexMerge` in two ways:
+> `IndexMerge` is disabled by default. Enable the `IndexMerge` in one of two ways:
 >
-> - Set the system variable `tidb_enable_index_merge` to 1;
+> - Set the `tidb_enable_index_merge` system variable to 1;
 >
-> - Use SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query; Note: SQL Hint has a higher priority than system variables.
+> - Use the SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query.
+>
+> SQL Hint has a higher priority than system variables.

 ### Read the aggregated execution plan

From 7aeaa9d68ade2861eb14f30b37f51b1fb2bac5aa Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Thu, 23 Jul 2020 13:26:32 +0800
Subject: [PATCH 07/12] Update query-execution-plan.md

---
 query-execution-plan.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 762d676459a03..9690e4e93ea37 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -215,7 +215,7 @@ For the table scan that is performed by range such as indexRangeScan/TableRangeS

 > **Note:**
 >
-> `IndexMerge` is disabled by default. Enable the `IndexMerge` in one of two ways:
+> At present, the `IndexMerge` feature is disabled by default in TiDB 4.0.0-rc.1. In addition, the currently supported scenarios of `IndexMerge` in TiDB 4.0 are limited to the disjunctive normal form (expressions connected by `or`). The conjunctive normal form (expressions connected by `and`) will be supported in later versions. Enable the `IndexMerge` in one of two ways:
 >
 > - Set the `tidb_enable_index_merge` system variable to 1;
 >

From 5ea618e9c64ecd7c1e80f9ad0d6fbcb8f61f5bd9 Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Fri, 24 Jul 2020 16:23:58 +0800
Subject: [PATCH 08/12] Update query-execution-plan.md

Co-authored-by: Zhang Jian
---
 query-execution-plan.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 9690e4e93ea37..c44bb4e7fb160 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -211,7 +211,7 @@ In the above query, the filter condition is a `WHERE` clause that uses `OR` as t

 `IndexMerge` allows the optimizer to use multiple indexes per table, and merge the results returned by each index to generate the execution plan of the latter `IndexMerge` in the figure above. Here the `IndexMerge_16` operator has three child nodes, among which `IndexRangeScan_13` and `IndexRangeScan_14` get all the `RowID`s that meet the conditions based on the result of range scan, and then the `TableRowIDScan_15` operator accurately reads all the data that meets the conditions according to these `RowID`s.

-For the table scan that is performed by range such as indexRangeScan/TableRangeScan , the operator info column in the explain table has more information about the range of the scanned data than other scan operations. In the above example, the `range:(1,+inf]` in the IndexRangeScan operator indicates that the operator scans the data from 1 to positive infinity.
+For the scan operation that is performed on a specific range of data, such as `IndexRangeScan`/`TableRangeScan`, the `operator info` column in the result has additional information about the scan range compared with other scan operations like `IndexFullScan`/`TableFullScan`. In the above example, the `range:(1,+inf]` in the `IndexRangeScan_13` operator indicates that the operator scans the data from 1 to positive infinity.

 > **Note:**
 >

From 19b408c2065a8796fee4a2c2ec483dbf44e98109 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Mon, 27 Jul 2020 10:07:33 +0800
Subject: [PATCH 09/12] align with pingcap/docs-cn#3201

---
 TOC.md | 1 -
 index-merge.md | 100 -------------------------------------------------
 2 files changed, 101 deletions(-)
 delete mode 100644 index-merge.md

diff --git a/TOC.md b/TOC.md
index e51e92595040e..e32af453fafc3 100644
--- a/TOC.md
+++ b/TOC.md
@@ -120,7 +120,6 @@
 + Control Execution Plan
 + [Optimizer Hints](/optimizer-hints.md)
 + [SQL Plan Management](/sql-plan-management.md)
- + [Access Tables Using `IndexMerge`](/index-merge.md)
 + [The Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md)
 + Tutorials
 + [Multiple Data Centers in One City Deployment](/multi-data-centers-in-one-city-deployment.md)
diff --git a/index-merge.md b/index-merge.md
deleted file mode 100644
index efecf2eb0192c..0000000000000
--- a/index-merge.md
+++ /dev/null
@@ -1,100 +0,0 @@
----
-title: Access Tables Using `IndexMerge`
-summary: Learn how to access tables using the `IndexMerge` query execution plan.
-aliases: ['/docs/dev/index-merge/','/docs/dev/reference/performance/index-merge/']
----
-
-# Access Tables Using `IndexMerge`
-
-`IndexMerge` is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans.
-
-This document introduces the applicable scenarios, a use case, and how to enable `IndexMerge`.
-
-## Applicable scenarios
-
-For each table involved in the SQL query, the TiDB optimizer during the physical optimization used to choose one of the following three access methods based on the cost estimation:
-
-- `TableScan`: Scans the table data, with `_tidb_rowid` as the key.
-- `IndexScan`: Scans the index data, with the index column values as the key.
-- `IndexLookUp`: Gets the `_tidb_rowid` set from the index, with the index column values as the key, and then retrieves the corresponding data rows of the tables.
-
-The above methods can use only one index per table. In some cases, the selected execution plan is not optimal. For example:
-
-{{< copyable "sql" >}}
-
-```sql
-create table t(a int, b int, c int, unique key(a), unique key(b));
-explain select * from t where a = 1 or b = 1;
-```
-
-In the above query, the filter condition is a `WHERE` clause that uses `OR` as the connector. Because you can use only one index per table, `a = 1` cannot be pushed down to the index `a`; neither can `b = 1` be pushed down to the index `b`. To ensure that the result is correct, the execution plan of `TableScan` is generated for the query:
-
-```
-+-------------------------+----------+-----------+---------------+--------------------------------------+
-| id                      | estRows  | task      | access object | operator info                        |
-+-------------------------+----------+-----------+---------------+--------------------------------------+
-| TableReader_7           | 8000.00  | root      |               | data:Selection_6                     |
-| └─Selection_6           | 8000.00  | cop[tikv] |               | or(eq(test.t.a, 1), eq(test.t.b, 1)) |
-|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo       |
-+-------------------------+----------+-----------+---------------+--------------------------------------+
-```
-
-The full table scan is inefficient when a huge volume of data exists in `t`, but the query returns only two rows at most. To handle such a scenario, `IndexMerge` is introduced in TiDB to access tables.
-
-## Use case
-
-`IndexMerge` allows the optimizer to use multiple indexes per table, and merge the results returned by each index before further operation. Take the [above query](#applicable-scenarios) as an example, the generated execution plan is shown as follows:
-
-```
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-| id                             | estRows | task      | access object       | operator info                               |
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-| IndexMerge_11                  | 2.00    | root      |                     |                                             |
-| ├─IndexRangeScan_8(Build)      | 1.00    | cop[tikv] | table:t, index:a(a) | range:[1,1], keep order:false, stats:pseudo |
-| ├─IndexRangeScan_9(Build)      | 1.00    | cop[tikv] | table:t, index:b(b) | range:[1,1], keep order:false, stats:pseudo |
-| └─TableRowIDScan_10(Probe)     | 2.00    | cop[tikv] | table:t             | keep order:false, stats:pseudo              |
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-```
-
-The structure of the `IndexMerge` execution plan is similar to that of the `IndexLookUp`, both of which consist of index scans and full table scans. However, the index scan part of `IndexMerge` might include multiple `IndexScan`s. When the primary key index of the table is the integer type, index scans might even include `TableScan`. For example:
-
-{{< copyable "sql" >}}
-
-```sql
-create table t(a int primary key, b int, c int, unique key(b));
-```
-
-```
-Query OK, 0 rows affected (0.01 sec)
-```
-
-{{< copyable "sql" >}}
-
-```sql
-explain select * from t where a = 1 or b = 1;
-```
-
-```
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-| id                             | estRows | task      | access object       | operator info                               |
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-| IndexMerge_11                  | 2.00    | root      |                     |                                             |
-| ├─TableRangeScan_8(Build)      | 1.00    | cop[tikv] | table:t             | range:[1,1], keep order:false, stats:pseudo |
-| ├─IndexRangeScan_9(Build)      | 1.00    | cop[tikv] | table:t, index:b(b) | range:[1,1], keep order:false, stats:pseudo |
-| └─TableRowIDScan_10(Probe)     | 2.00    | cop[tikv] | table:t             | keep order:false, stats:pseudo              |
-+--------------------------------+---------+-----------+---------------------+---------------------------------------------+
-4 rows in set (0.01 sec)
-```
-
-Note that `IndexMerge` is used only when the optimizer cannot use a single index to access the table. If the condition in the query expression is `a = 1 and b = 1`, the optimizer uses the index `a` or the index `b`, instead of `IndexMerge`, to access the table.
-
-## Enable `IndexMerge`
-
-`IndexMerge` is disabled by default. Enable the `IndexMerge` in one of two ways:
-
-- Set the `tidb_enable_index_merge` system variable to `1`;
-- Use the SQL Hint [`USE_INDEX_MERGE`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) in the query.
-
-    > **Note:**
-    >
-    > The SQL Hint has a higher priority over the system variable.

From e7fdadea7a4edc47ba4b50ac7b91058233303aa9 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Mon, 27 Jul 2020 10:32:23 +0800
Subject: [PATCH 10/12] Update whats-new-in-tidb-4.0.md

---
 whats-new-in-tidb-4.0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/whats-new-in-tidb-4.0.md b/whats-new-in-tidb-4.0.md
index 580804d7baef8..6829c1d746ede 100644
--- a/whats-new-in-tidb-4.0.md
+++ b/whats-new-in-tidb-4.0.md
@@ -56,7 +56,7 @@ TiUP is a new package manager tool introduced in v4.0 that is used to manage all
 - Add the `FLASHBACK` statement to support recovering the truncated tables. See [`Flashback Table`](/sql-statements/sql-statement-flashback-table.md) for details.
 - Support writing the intermediate results of Join and Sort to the local disk when you make queries, which avoids the Out of Memory (OOM) issue because the queries occupy excessive memory. This also improves system stability.
 - Optimize the output of `EXPLAIN` and `EXPLAIN ANALYZE`. More information is shown in the result, which improves troubleshooting efficiency. See [Explain Analyze](/sql-statements/sql-statement-explain-analyze.md) and [Explain](/sql-statements/sql-statement-explain.md) for details.
-- Support using the Index Merge feature to access tables. When you make a query on a single table, the TiDB optimizer automatically reads multiple index data according to the query condition and makes a union of the result, which improves the performance of querying on a single table. See [Index Merge](/index-merge.md) for details.
+- Support using the Index Merge feature to access tables. When you make a query on a single table, the TiDB optimizer automatically reads multiple index data according to the query condition and makes a union of the result, which improves the performance of querying on a single table. See [Index Merge](/query-execution-plan.md#indexmerge-example) for details.
 - Support the expression index feature (**experimental**). The expression index is also called the function-based index. When you create an index, the index fields do not have to be a specific column but can be an expression calculated from one or more columns. This feature is useful for quickly accessing the calculation-based tables. See [Expression index](/sql-statements/sql-statement-create-index.md) for details.
 - Support `AUTO_RANDOM` keys as an extended syntax for the TiDB columnar attribute (**experimental**). `AUTO_RANDOM` is designed to address the hotspot issue caused by the auto-increment column and provides a low-cost migration solution from MySQL for users who work with auto-increment columns. See [`AUTO_RANDOM` Key](/auto-random.md) for details.
 - Add system tables that provide information of cluster topology, configuration, logs, hardware, operating systems, and slow queries, which helps DBAs to quickly learn, analyze system metrics. See [SQL Diagnosis](/information-schema/information-schema-sql-diagnostics.md) for details.

From fcdfa067df8948ae4e7a50300e1da7bfad2c2766 Mon Sep 17 00:00:00 2001
From: ireneontheway
Date: Mon, 27 Jul 2020 11:43:48 +0800
Subject: [PATCH 11/12] Update query-execution-plan.md

---
 query-execution-plan.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index c44bb4e7fb160..768552b530a26 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -1,7 +1,7 @@
 ---
 title: Understand the Query Execution Plan
 summary: Learn about the execution plan information returned by the `EXPLAIN` statement in TiDB.
-aliases: ['/docs/dev/query-execution-plan/','/docs/dev/reference/performance/understanding-the-query-execution-plan/']
+aliases: ['/docs/dev/query-execution-plan/','/docs/dev/reference/performance/understanding-the-query-execution-plan/','/docs/dev/index-merge/','/docs/dev/reference/performance/index-merge/']
 ---

 # Understand the Query Execution Plan

From 729e2af273458cb8971f51ad69a6dc2377ae1956 Mon Sep 17 00:00:00 2001
From: ireneontheway <48651140+ireneontheway@users.noreply.github.com>
Date: Mon, 27 Jul 2020 13:10:37 +0800
Subject: [PATCH 12/12] Apply suggestions from code review

Co-authored-by: Keke Yi <40977455+yikeke@users.noreply.github.com>
---
 query-execution-plan.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/query-execution-plan.md b/query-execution-plan.md
index 768552b530a26..09dcc996ae624 100644
--- a/query-execution-plan.md
+++ b/query-execution-plan.md
@@ -1,7 +1,7 @@
 ---
 title: Understand the Query Execution Plan
 summary: Learn about the execution plan information returned by the `EXPLAIN` statement in TiDB.
-aliases: ['/docs/dev/query-execution-plan/','/docs/dev/reference/performance/understanding-the-query-execution-plan/','/docs/dev/index-merge/','/docs/dev/reference/performance/index-merge/']
+aliases: ['/docs/dev/query-execution-plan/','/docs/dev/reference/performance/understanding-the-query-execution-plan/','/docs/dev/index-merge/','/docs/dev/reference/performance/index-merge/','/tidb/dev/index-merge']
 ---

 # Understand the Query Execution Plan
@@ -186,7 +186,7 @@ In the above example, the child node of the `TableReader_7` operator is `Selecti

 `IndexMerge` is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans.

-```
+```sql
 mysql> explain select * from t where a = 1 or b = 1;
 +-------------------------+----------+-----------+---------------+--------------------------------------+
 | id                      | estRows  | task      | access object | operator info                        |