From ac593523debf018603046640d367faf72d21252d Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Thu, 21 May 2020 21:42:57 +0800
Subject: [PATCH 1/7] perf-tuning: add docs for DISTINCT optimizations

---
 agg-distinct-optimization.md | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index e3056c59fa42..84c0ed29edec 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -3,4 +3,54 @@ title: Distinct Optimization
 category: performance
 ---
 
-# Distinct Optimization
\ No newline at end of file
+# Distinct Optimization
+
+## Simple DISTINCT
+
+Generally speaking, a simple DISTINCT is optimized into a GROUP BY for execution. For example:
+```sql
+mysql> explain select DISTINCT a from t;
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+| id                       | estRows | task      | access object | operator info                                         |
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+| HashAgg_6                | 2.40    | root      |               | group by:test.t.a, funcs:firstrow(test.t.a)->test.t.a |
+| └─TableReader_11         | 3.00    | root      |               | data:TableFullScan_10                                 |
+|   └─TableFullScan_10     | 3.00    | cop[tikv] | table:t       | keep order:false, stats:pseudo                        |
++--------------------------+---------+-----------+---------------+-------------------------------------------------------+
+3 rows in set (0.00 sec)
+```
+
+TODO: When LIMIT __row_count__ is combined with DISTINCT, TiDB should return immediately once it has found __row_count__ distinct rows. [#15284](https://github.com/pingcap/tidb/issues/15284)
+
+## DISTINCT in Aggregate Functions
+
+Generally speaking, aggregate functions with DISTINCT are executed in a single thread on the TiDB side.
+The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the TiDB configuration item [distinct-agg-push-down](/tidb-configuration-file.md#distinct-agg-push-down) controls whether the optimizer pushes aggregate functions with `Distinct` (such as `select count(distinct a) from t`) down to the Coprocessor.
+
+In the following example, before `tidb_opt_distinct_agg_push_down` is enabled, TiDB needs to read all data from TiKV and execute `distinct` on the TiDB side. After `tidb_opt_distinct_agg_push_down` is enabled,  `distinct a` is pushed down to the Coprocessor, and a `group by` column `test.t.a` is add to `HashAgg_5`.
+
+```sql
+mysql> desc select count(distinct a) from test.t;
++-------------------------+----------+-----------+---------------+------------------------------------------+
+| id                      | estRows  | task      | access object | operator info                            |
++-------------------------+----------+-----------+---------------+------------------------------------------+
+| StreamAgg_6             | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#4 |
+| └─TableReader_10        | 10000.00 | root      |               | data:TableFullScan_9                     |
+|   └─TableFullScan_9     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
++-------------------------+----------+-----------+---------------+------------------------------------------+
+3 rows in set (0.01 sec)
+
+mysql> set session tidb_opt_distinct_agg_push_down = 1;
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> desc select count(distinct a) from test.t;
++---------------------------+----------+-----------+---------------+------------------------------------------+
+| id                        | estRows  | task      | access object | operator info                            |
++---------------------------+----------+-----------+---------------+------------------------------------------+
+| HashAgg_8                 | 1.00     | root      |               | funcs:count(distinct test.t.a)->Column#3 |
+| └─TableReader_9           | 1.00     | root      |               | data:HashAgg_5                           |
+|   └─HashAgg_5             | 1.00     | cop[tikv] |               | group by:test.t.a,                       |
+|     └─TableFullScan_7     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
++---------------------------+----------+-----------+---------------+------------------------------------------+
+4 rows in set (0.00 sec)
+```

From c2f79e025d0981b873cf22d6c6f9e377e8a18759 Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Thu, 21 May 2020 21:46:59 +0800
Subject: [PATCH 2/7] lint

---
 agg-distinct-optimization.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 84c0ed29edec..50cb7f0d48a3 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -8,6 +8,7 @@ category: performance
 ## Simple DISTINCT
 
 Generally speaking, a simple DISTINCT is optimized into a GROUP BY for execution. For example:
+
 ```sql
 mysql> explain select DISTINCT a from t;
 +--------------------------+---------+-----------+---------------+-------------------------------------------------------+

From 7d2ba23cf582c38eae23b9618dea5d7c972dd16f Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Fri, 22 May 2020 15:23:55 +0800
Subject: [PATCH 3/7] address comment

---
 agg-distinct-optimization.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 50cb7f0d48a3..9dc12f4ea913 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -7,7 +7,7 @@ category: performance
 
 ## Simple DISTINCT
 
-Generally speaking, a simple DISTINCT is optimized into a GROUP BY for execution. For example:
+Usually, a simple `DISTINCT` is optimized into a GROUP BY for execution. For example:
 
 ```sql
 mysql> explain select DISTINCT a from t;
@@ -21,14 +21,14 @@ mysql> explain select DISTINCT a from t;
 3 rows in set (0.00 sec)
 ```
 
-TODO: When LIMIT __row_count__ is combined with DISTINCT, TiDB should return immediately once it has found __row_count__ distinct rows. [#15284](https://github.com/pingcap/tidb/issues/15284)
+TODO([#15284](https://github.com/pingcap/tidb/issues/15284)): When LIMIT __row_count__ is combined with `DISTINCT`, TiDB should return immediately once it has found __row_count__ distinct rows.
 
 ## DISTINCT in Aggregate Functions
 
-Generally speaking, aggregate functions with DISTINCT are executed in a single thread on the TiDB side.
-The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the TiDB configuration item [distinct-agg-push-down](/tidb-configuration-file.md#distinct-agg-push-down) controls whether the optimizer pushes aggregate functions with `Distinct` (such as `select count(distinct a) from t`) down to the Coprocessor.
+Generally speaking, aggregate functions with `DISTINCT` are executed in a single thread on the TiDB side.
+The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the TiDB configuration item [distinct-agg-push-down](/tidb-configuration-file.md#distinct-agg-push-down) controls whether the optimizer pushes aggregate functions with `DISTINCT` (such as `select count(distinct a) from t`) down to the Coprocessor.
 
-In the following example, before `tidb_opt_distinct_agg_push_down` is enabled, TiDB needs to read all data from TiKV and execute `distinct` on the TiDB side. After `tidb_opt_distinct_agg_push_down` is enabled,  `distinct a` is pushed down to the Coprocessor, and a `group by` column `test.t.a` is add to `HashAgg_5`.
+In the following example, before `tidb_opt_distinct_agg_push_down` is enabled, TiDB needs to read all data from TiKV and execute `distinct` on the TiDB side. After `tidb_opt_distinct_agg_push_down` is enabled,  `distinct a` is pushed down to the Coprocessor, and a `group by` column `test.t.a` is added to `HashAgg_5`.
 
 ```sql
 mysql> desc select count(distinct a) from test.t;

From 2f0f93ec3de8a336593c3ae699958435bcf407a9 Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Fri, 22 May 2020 15:59:19 +0800
Subject: [PATCH 4/7] Update agg-distinct-optimization.md

Co-authored-by: Lilian Lee
---
 agg-distinct-optimization.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 9dc12f4ea913..e74ae62f83cd 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -21,8 +21,6 @@ mysql> explain select DISTINCT a from t;
 3 rows in set (0.00 sec)
 ```
 
-TODO([#15284](https://github.com/pingcap/tidb/issues/15284)): When LIMIT __row_count__ is combined with `DISTINCT`, TiDB should return immediately once it has found __row_count__ distinct rows.
-
 ## DISTINCT in Aggregate Functions
 
 Generally speaking, aggregate functions with `DISTINCT` are executed in a single thread on the TiDB side.

From 6c34db75146b4b185f3bb0996fc36f3b7539f998 Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Fri, 22 May 2020 16:00:17 +0800
Subject: [PATCH 5/7] Update agg-distinct-optimization.md

Co-authored-by: Lilian Lee
---
 agg-distinct-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index e74ae62f83cd..d32d4ac64465 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -26,7 +26,7 @@ mysql> explain select DISTINCT a from t;
 Generally speaking, aggregate functions with `DISTINCT` are executed in a single thread on the TiDB side.
 The system variable [`tidb_opt_distinct_agg_push_down`](/tidb-specific-system-variables.md#tidb_opt_distinct_agg_push_down) or the TiDB configuration item [distinct-agg-push-down](/tidb-configuration-file.md#distinct-agg-push-down) controls whether the optimizer pushes aggregate functions with `DISTINCT` (such as `select count(distinct a) from t`) down to the Coprocessor.
 
-In the following example, before `tidb_opt_distinct_agg_push_down` is enabled, TiDB needs to read all data from TiKV and execute `distinct` on the TiDB side. After `tidb_opt_distinct_agg_push_down` is enabled,  `distinct a` is pushed down to the Coprocessor, and a `group by` column `test.t.a` is added to `HashAgg_5`.
+In the following example, before `tidb_opt_distinct_agg_push_down` is enabled, TiDB needs to read all data from TiKV and execute `distinct` on the TiDB side. After `tidb_opt_distinct_agg_push_down` is enabled, `distinct a` is pushed down to the Coprocessor, and a `group by` column `test.t.a` is added to `HashAgg_5`.
 
 ```sql
 mysql> desc select count(distinct a) from test.t;

From 2e790239c56812079f20d99f007259c58dd059bd Mon Sep 17 00:00:00 2001
From: Feng Liyuan
Date: Fri, 22 May 2020 16:15:51 +0800
Subject: [PATCH 6/7] add summary

---
 agg-distinct-optimization.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index 9dc12f4ea913..5e4711d2c33c 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -5,6 +5,8 @@ category: performance
 
 # Distinct Optimization
 
+This section discusses the optimizations that can be applied to `DISTINCT`.
+
 ## Simple DISTINCT
 
 Usually, a simple `DISTINCT` is optimized into a GROUP BY for execution. For example:

From cb3c86eca0ffd120bee9ca61d170d9c6abfa0bbb Mon Sep 17 00:00:00 2001
From: TomShawn <41534398+TomShawn@users.noreply.github.com>
Date: Fri, 22 May 2020 16:23:20 +0800
Subject: [PATCH 7/7] Update agg-distinct-optimization.md

---
 agg-distinct-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/agg-distinct-optimization.md b/agg-distinct-optimization.md
index cfcf357a6377..6d8f763fd75a 100644
--- a/agg-distinct-optimization.md
+++ b/agg-distinct-optimization.md
@@ -5,7 +5,7 @@ category: performance
 
 # Distinct Optimization
 
-This section discusses the optimizations that can be applied to `DISTINCT`.
+This document introduces the optimizations available for `DISTINCT`, including optimizations for simple `DISTINCT` and for `DISTINCT` in aggregate functions.
 
 ## Simple DISTINCT
 
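The two plans described in this series can be modeled in a few lines of Python. This is a minimal sketch, not TiDB code. All function names are hypothetical. Each "region" is assumed to be a list of row dicts standing in for the data one Coprocessor task scans, and `None` stands in for SQL NULL.

The sketch illustrates two points:

- A simple `DISTINCT` behaves like `GROUP BY` with `firstrow()`.
- Pushing `distinct a` down stays correct because each Coprocessor task only removes duplicates within its own data, and TiDB still deduplicates the merged results before counting.

```python
# Hypothetical illustration only: these helpers model the plans shown above
# (HashAgg_6, StreamAgg_6, HashAgg_5, HashAgg_8); they are not TiDB APIs.
# A "region" is a list of row dicts, and None plays the role of SQL NULL.


def distinct_via_group_by(rows, key):
    """SELECT DISTINCT key, evaluated like HashAgg_6: GROUP BY key with
    firstrow(key) as the output column."""
    groups = {}
    for row in rows:
        # firstrow(): keep the first value seen for each group; dicts keep
        # insertion order, so the output follows first appearance.
        groups.setdefault(row[key], row[key])
    return list(groups.values())


def coprocessor_partial(region_rows, key):
    """Pushed-down step, like HashAgg_5 (group by:test.t.a): deduplicate
    `key` inside one region so only its distinct values go back to TiDB."""
    return {row[key] for row in region_rows}


def tidb_final_count_distinct(partials):
    """Root step, like HashAgg_8: different regions can return the same
    value, so merge and deduplicate again before counting.
    COUNT(DISTINCT ...) ignores NULL, modeled here by dropping None."""
    merged = set()
    for values in partials:
        merged |= values
    merged.discard(None)
    return len(merged)


def count_distinct_on_tidb(regions, key):
    """Plan without push-down, like StreamAgg_6: read every row, then count."""
    values = {row[key] for region in regions for row in region}
    values.discard(None)
    return len(values)


if __name__ == "__main__":
    regions = [
        [{"a": 1}, {"a": 2}, {"a": 2}],
        [{"a": 2}, {"a": 3}, {"a": None}],
    ]
    all_rows = [row for region in regions for row in region]
    print(distinct_via_group_by(all_rows, "a"))  # [1, 2, 3, None]

    partials = [coprocessor_partial(region, "a") for region in regions]
    print(tidb_final_count_distinct(partials))  # 3
    print(count_distinct_on_tidb(regions, "a"))  # 3
```

In this model, the per-region `group by` only shrinks how much data is sent back to TiDB. Correctness still comes from the final deduplication on the TiDB side, because the same value (here `2`) can appear in several regions.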