*: fix the estimation error on normal column when collation enabled (#18104) #18311

Merged
9 commits merged into pingcap:release-4.0 on Sep 8, 2020

Conversation

ti-srebot
Contributor

@ti-srebot ti-srebot commented Jul 1, 2020

cherry-pick #18104 to release-4.0


What problem does this PR solve?

Issue Number: close #14689

Problem Summary:

For an index, the key is already built from the sort key that the collation produces. When we estimate on an index, we also use EncodeKey, which first converts the column value to its sort key and then encodes it. So index estimation is correct without any additional change.

For a column, however, sampling uses the original value. When we then query the count-min sketch or the histogram, we get a wrong answer, because the ordering information defined by the collation is lost.

What is changed and how it works?

What's Changed:

On the TiKV side, use the sort key generated by the collation as the sampling data (tikv/tikv#8105).
When querying, convert the value to its sort key first.
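
The two changes above can be illustrated with a minimal Go sketch. This is not the TiDB or TiKV implementation: `toySortKey`, `buildCounts`, and `identity` are hypothetical names, a plain map stands in for the count-min sketch and histogram, and the toy key only folds letter case, whereas a real collation sort key also handles accents and other rules.

```go
package main

import (
	"fmt"
	"strings"
)

// toySortKey is a hypothetical stand-in for a collation sort key.
// It only folds letter case; it does not fold accents.
func toySortKey(s string) string {
	return strings.ToUpper(s)
}

// identity models the old behaviour: the original value is sampled as-is.
func identity(s string) string { return s }

// buildCounts simulates sampling: every value is recorded under key(value).
func buildCounts(values []string, key func(string) string) map[string]int {
	counts := make(map[string]int)
	for _, v := range values {
		counts[key(v)]++
	}
	return counts
}

func main() {
	rows := []string{"aaa", "bbb", "AAA", "BBB"}

	raw := buildCounts(rows, identity)      // statistics built from original values
	sorted := buildCounts(rows, toySortKey) // statistics built from sort keys

	// Under a case-insensitive collation, a = 'aaa' matches two rows.
	fmt.Println("original values:", raw["aaa"])
	// The query value is converted to its sort key before the lookup.
	fmt.Println("sort keys:", sorted[toySortKey("aaa")])
}
```

With these four rows, the statistics built from original values count one row for 'aaa', while the statistics built from sort keys count two, which is what a case-insensitive comparison would return.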

How it Works:

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test -> The unit test is removed. mock-tikv doesn't support the new row format, so it is hard to add a test for it, and unistore is not included in 4.0.
  • Integration test

Side effects

  • Performance regression
    • Consumes more CPU

Release note

  • Fix the row count estimation error for a non-index column with collation enabled.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@winoros please accept the invitation then you can push to the cherry-pick pull requests.
https://github.com/ti-srebot/tidb/invitations

@zz-jason zz-jason modified the milestones: v4.0.2, v4.0.3 Jul 10, 2020
Contributor

@qw4990 qw4990 left a comment

There are many unnecessary changes in this PR, please fix them @winoros

@qw4990 qw4990 self-requested a review July 10, 2020 08:23
@winoros winoros modified the milestones: v4.0.3, v4.0.4 Jul 15, 2020
@imtbkcat imtbkcat modified the milestones: v4.0.4, v4.0.5, v4.0.6 Jul 28, 2020
Member

@zz-jason zz-jason left a comment

@winoros please resolve conflicts and fix CI.

Member

@wjhuang2016 wjhuang2016 left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 7, 2020
Contributor

@lzmhhh123 lzmhhh123 left a comment

LGTM

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 7, 2020
Member

@zz-jason zz-jason left a comment

LGTM

@ti-srebot ti-srebot added status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Sep 7, 2020
@zz-jason
Member

zz-jason commented Sep 7, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Sep 7, 2020
@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 19834
  • 19835

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@ti-srebot merge failed.

@winoros
Member

winoros commented Sep 8, 2020

Tested with tikv/tikv#8620

MySQL [test]> create table t(a varchar(20) collate utf8mb4_general_ci);
Query OK, 0 rows affected (0.10 sec)

MySQL [test]> insert into t values('aaa'), ('bbb'), ('AAA'), ('BBB');
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

MySQL [test]> analyze table t;
Query OK, 0 rows affected (0.05 sec)

MySQL [test]> explain select * from t where a='aÄa';
+-------------------------+---------+-----------+---------------+----------------------+
| id                      | estRows | task      | access object | operator info        |
+-------------------------+---------+-----------+---------------+----------------------+
| TableReader_7           | 2.00    | root      |               | data:Selection_6     |
| └─Selection_6           | 2.00    | cop[tikv] |               | eq(test.t.a, "aÄa")  |
|   └─TableFullScan_5     | 4.00    | cop[tikv] | table:t       | keep order:false     |
+-------------------------+---------+-----------+---------------+----------------------+
3 rows in set (0.00 sec)

MySQL [test]> show stats_buckets;
+---------+------------+----------------+-------------+----------+-----------+-------+---------+-------------+-------------+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Bucket_id | Count | Repeats | Lower_Bound | Upper_Bound |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+-------------+-------------+
| test    | t          |                | a           |        0 |         0 |     2 |       2 |  A A A      |  A A A      |
| test    | t          |                | a           |        0 |         1 |     4 |       2 |  B B B      |  B B B      |
+---------+------------+----------------+-------------+----------+-----------+-------+---------+-------------+-------------+
2 rows in set (0.00 sec)
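
One way to read the output above, sketched in Go under stated assumptions: the bucket bounds are stored as sort keys, 'aÄa' has the same sort key as 'AAA' under utf8mb4_general_ci, and an equality estimate that hits a bucket's upper bound takes that bucket's Repeats value. The names `bucket`, `foldExample`, and `estimateEqual` are hypothetical, `foldExample` only maps the characters used in this test, and the real estimator may consult other statistics first.

```go
package main

import "fmt"

// bucket mirrors the Count, Repeats, and Upper_Bound columns of SHOW STATS_BUCKETS.
type bucket struct {
	count   int    // cumulative row count up to and including this bucket
	repeats int    // number of rows equal to the upper bound
	upper   string // upper bound, assumed to be stored as a sort key
}

// foldExample is a hypothetical toy sort key covering only the
// characters in this test; it is not a real collation.
func foldExample(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		switch r {
		case 'a', 'A', 'ä', 'Ä':
			out = append(out, 'A')
		case 'b', 'B':
			out = append(out, 'B')
		default:
			out = append(out, r)
		}
	}
	return string(out)
}

// estimateEqual returns the repeats of the bucket whose upper bound
// equals key, and false when no upper bound matches.
func estimateEqual(buckets []bucket, key string) (int, bool) {
	for _, b := range buckets {
		if b.upper == key {
			return b.repeats, true
		}
	}
	return 0, false
}

func main() {
	buckets := []bucket{{2, 2, "AAA"}, {4, 2, "BBB"}}

	// Convert the query value to its sort key before the lookup.
	rows, ok := estimateEqual(buckets, foldExample("aÄa"))
	fmt.Println(rows, ok)
}
```

Under these assumptions the lookup matches the first bucket and yields 2, in line with the estRows of 2.00 in the EXPLAIN output. Looking up the unconverted string 'aÄa' would match no bucket bound.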

@winoros
Member

winoros commented Sep 8, 2020

/run-all-tests

@winoros
Member

winoros commented Sep 8, 2020

It's safe to merge this one before the related tikv pr.

@AilinKid
Contributor

AilinKid commented Sep 8, 2020

Is it a 4.0.6 release blocker?

@winoros
Member

winoros commented Sep 8, 2020

@AilinKid The tikv side will be merged. So this one should be merged too.

@winoros winoros merged commit ddc6c0d into pingcap:release-4.0 Sep 8, 2020
@winoros winoros deleted the release-4.0-ceec9d9c63c8 branch September 8, 2020 14:13
Labels
component/statistics, sig/planner, status/can-merge, status/LGT3, type/bugfix, type/4.0-cherry-pick

8 participants