-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect row count estimation in TiDB 2.1.1 #8921
Comments
It is actually using index scan according to the pictures you pasted? Please confirm the plan, and provide statistics of table if it is really table scan by using: curl -G "http://${tidb-server-ip}:${tidb-server-status-port}/stats/dump/${db_name}/${table_name}" > ${table_name}_stats.json |
@eurekaka Yes it is showing Index scan but it is scanning all rows in 2.1.1 as seen in |
@Guru107 Does the table have changes between these 2 explains? seems like 2.1.1 uses |
@eurekaka actually the data is sensitive. I had changed the column names in the sql query so that's why I am not able to share. But it can be reproduced on any test table. Here are the steps I did to reproduce the issue.
I got the following output.
|
@Guru107 The column |
Yes, |
@Guru107 thanks, please provide the latest statistics information again, then we would check if the estimation count change is reasonable. |
@eurekaka Here's the table stats after |
@Guru107 Thanks. |
The behavior change is caused by these lines in 458 // If the `countAfterAccess` is less than `stats.RowCount`, there must be some inconsistent stats info.
459 // We prefer the `stats.RowCount` because it could use more stats info to calculate the selectivity.
460 if path.countAfterAccess < ds.stats.RowCount {
461 path.countAfterAccess = math.Min(ds.stats.RowCount/selectionFactor, float64(ds.statisticTable.Count))
462 } this part is different in 2.0.8 and 2.1.1. @Guru107 thanks for the report, we will try to improve it. |
@eurekaka Does this impact query performance? |
@Guru107 In your case, the query plan chosen is correct, so no performance impaction. |
The change is introduced in #7233, @lamxTyler PTAL if we should make some adjustment. |
The change is reasonable and needs no change. The root cause of the problem is the selectivity cannot process DNF now, refer https://github.com/pingcap/tidb/blob/master/statistics/selectivity.go#L249. |
So if filter is in DNF form, 460 if path.countAfterAccess < ds.stats.RowCount && notInDNF {
461 path.countAfterAccess = math.Min(ds.stats.RowCount/selectionFactor, float64(ds.statisticTable.Count))
462 } |
Then I think it is better to let selectivity also support DNF. |
Bug Report
Please answer these questions before submitting your issue. Thanks!
What did you do?
Create a table with a composite primary key.
Run query
explain select column1, column2 from `table1` where (column1=01092213 AND column2=1) OR (column1=01091904 AND column2=2) OR (column1=01091970 AND column2=2) OR (column1=01092706 AND column2=1) OR (column1=01092355 AND column2=1) OR (column1=01092578 AND column2=2) OR (column1=01092623 AND column2=4) OR (column1=01092651 AND column2=1) OR (column1=01092369 AND column2=2) OR (column1=01092365 AND column2=1) OR (column1=01092238 AND column2=2) OR (column1=01092799 AND column2=1) OR (column1=01092770 AND column2=3) OR (column1=01092363 AND column2=2) OR (column1=01092365 AND column2=2) OR (column1=01093579 AND column2=1) OR (column1=01093051 AND column2=2) OR (column1=01093032 AND column2=1) OR (column1=01093100 AND column2=2) OR (column1=01093534 AND column2=1) OR (column1=01093432 AND column2=1) OR (column1=01092914 AND column2=1) OR (column1=01092861 AND column2=2) OR (column1=01093054 AND column2=2) OR (column1=01093298 AND column2=1) OR (column1=01093293 AND column2=3);
Running
explain
on TiDB 2.0.8 returnsRunning
explain
on TiDB 2.1.1 returns a full table scanWhat version of TiDB are you using (
tidb-server -V
or runselect tidb_version();
on TiDB)?TiDB v2.1.1
The text was updated successfully, but these errors were encountered: