planner: fix incorrect estRows with global index and json column #55842

Merged
merged 9 commits into from
Sep 6, 2024

Conversation

@hawkingrei hawkingrei commented Sep 4, 2024

What problem does this PR solve?

Issue Number: close #55818

Problem Summary:

What changed and how does it work?

Because a JSON column has no statistics by default, its row count is 0. Calculating selectivity then divides by zero, which produces a NaN row count.

Why is the result wrong?

Selectivity returns NaN, and ds.TableStats.Scale(selectivity) then corrupts all of the stats info.

Finally, the problem surfaces in joinReorderGreedySolver.solve -> constructConnectedJoinTree.

checkConnectionAndMakeJoin creates a right join plan, but calculating its cost yields NaN because of the polluted stats info. No best join can be chosen, because comparing a NaN cost against math.MaxFloat64 never succeeds. makeBushyJoin then does not consider the equal conditions, so many conditions are lost. This is why the query returns a wrong result.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Due to incorrect statistics, the filter condition was completely removed.

before



+---------------------------+---------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                        | estRows | task      | access object   | operator info                                                                                                                                                                            |
+---------------------------+---------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_12             | NaN     | root      |                 | 1->Column#18                                                                                                                                                                             |
| └─HashJoin_13             | NaN     | root      |                 | CARTESIAN inner join                                                                                                                                                                     |
|   ├─TableReader_19(Build) | 38.00   | root      |                 | data:TableFullScan_18                                                                                                                                                                    |
|   │ └─TableFullScan_18    | 38.00   | cop[tikv] | table:t61a85298 | keep order:false, stats:pseudo                                                                                                                                                           |
|   └─TableReader_17(Probe) | NaN     | root      | partition:all   | data:Selection_16                                                                                                                                                                        |
|     └─Selection_16        | NaN     | cop[tikv] |                 | or(json_memberof(cast(16739493649928310215, json BINARY), test.tceb7972c.col_17), not(istrue_with_null(json_contains(test.tceb7972c.col_17, cast("6019730272580550835", json BINARY))))) |
|       └─TableFullScan_15  | 12.00   | cop[tikv] | table:tceb7972c | keep order:false, stats:partial[col_17:unInitialized]                                                                                                                                    |
+---------------------------+---------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

after

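+---------------------------+----------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                        | estRows  | task      | access object   | operator info                                                                                                                                                                            |
+---------------------------+----------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_11             | 6.00     | root      |                 | 1->Column#18                                                                                                                                                                             |
| └─HashJoin_13             | 6.00     | root      |                 | inner join, equal:[eq(test.tceb7972c.col_19, Column#19)]                                                                                                                                 |
|   ├─TableReader_16(Build) | 4.80     | root      | partition:all   | data:Selection_15                                                                                                                                                                        |
|   │ └─Selection_15        | 4.80     | cop[tikv] |                 | or(json_memberof(cast(16739493649928310215, json BINARY), test.tceb7972c.col_17), not(istrue_with_null(json_contains(test.tceb7972c.col_17, cast("6019730272580550835", json BINARY))))) |
|   │   └─TableFullScan_14  | 6.00     | cop[tikv] | table:tceb7972c | keep order:false, stats:partial[col_17:missing]                                                                                                                                          |
|   └─Projection_17(Probe)  | 10000.00 | root      |                 | cast(test.t61a85298.col_71, double BINARY)->Column#19                                                                                                                                    |
|     └─TableReader_19      | 10000.00 | root      |                 | data:TableFullScan_18                                                                                                                                                                    |
|       └─TableFullScan_18  | 10000.00 | cop[tikv] | table:t61a85298 | keep order:false, stats:pseudo                                                                                                                                                           |
+---------------------------+----------+-----------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+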
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/needs-triage-completed and removed do-not-merge/needs-linked-issue labels Sep 4, 2024
codecov bot commented Sep 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.7427%. Comparing base (2b54db5) to head (4382573).
Report is 38 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #55842         +/-   ##
=================================================
- Coverage   72.8695%   56.7427%   -16.1269%     
=================================================
  Files          1588       1759        +171     
  Lines        443554     647686     +204132     
=================================================
+ Hits         323216     367515      +44299     
- Misses       100456     255197     +154741     
- Partials      19882      24974       +5092     
Flag Coverage Δ
integration 38.6513% <100.0000%> (?)
unit 72.0245% <100.0000%> (+0.0652%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9567% <ø> (ø)
parser ∅ <ø> (∅)
br 51.8079% <ø> (+6.3936%) ⬆️

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 4, 2024
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/needs-tests-checked do-not-merge/needs-triage-completed labels Sep 4, 2024
@hawkingrei hawkingrei changed the title planner: fix incorrect estRows and result with global index planner: fix incorrect estRows with global index and json column Sep 4, 2024
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 4, 2024
@hawkingrei

/retest

@time-and-fate time-and-fate left a comment

I think the root cause is that the global indexes (idx_10 and idx_9) are not actually analyzed, but they are marked as analyzed (according to ColAndIdxExistenceMap) and fully loaded (according to StatsLoadedStatus).
So we try to use their stats, and idxStats.TotalRowCount() is 0 in GetScaledRealtimeAndModifyCnt(), which causes the problem.

@hawkingrei

And I think we can't close the issue in this PR, right? Since the wrong result is not fixed.

The wrong result has been fixed.

BTW, col_17 is the JSON column. It cannot be analyzed by default, so any index related to col_17 has no stats.

@time-and-fate

The MV index has stats, but idx_10 and idx_9 don't have stats here, likely because they are global indexes. I think the root cause is still that stats that were never analyzed are marked as analyzed and fully loaded.
As a temporary fix, I think we can add a check for idxStats.TotalRowCount() == 0 in GetScaledRealtimeAndModifyCnt().

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei

The MV index has stats, but idx_10 and idx_9 don't have stats here, likely because they are global indexes. I think the root cause is still that stats that were never analyzed are marked as analyzed and fully loaded. As a temporary fix, I think we can add a check for idxStats.TotalRowCount() == 0 in GetScaledRealtimeAndModifyCnt().

I have updated it again.

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei

/retest

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@time-and-fate time-and-fate left a comment

Please restore tidb_enable_global_index after the tests.

ti-chi-bot bot commented Sep 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qw4990, time-and-fate

ti-chi-bot bot commented Sep 6, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-05 06:18:51.084216555 +0000 UTC m=+511655.602269493: ☑️ agreed by qw4990.
  • 2024-09-06 10:45:07.042135536 +0000 UTC m=+7576.782559475: ☑️ agreed by time-and-fate.

@hawkingrei

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2024
Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
@hawkingrei

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2024
@hawkingrei

/retest

@ti-chi-bot ti-chi-bot bot merged commit dd18083 into pingcap:master Sep 6, 2024
24 checks passed
@hawkingrei

/cherrypick release-8.1
/cherrypick release-7.5

@hawkingrei hawkingrei deleted the 55818 branch September 6, 2024 14:05
@ti-chi-bot

@hawkingrei: new pull request created to branch release-7.5: #55923.

In response to this:

/cherrypick release-8.1
/cherrypick release-7.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Sep 6, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Sep 6, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot

@hawkingrei: new pull request created to branch release-8.1: #55924.

breezewish added a commit to breezewish/tidb that referenced this pull request Sep 8, 2024
* origin/master: (33 commits)
  build(deps): bump github.com/prometheus/common from 0.55.0 to 0.57.0 (pingcap#55762)
  build(deps): bump golang.org/x/sys from 0.24.0 to 0.25.0 (pingcap#55894)
  planner: fix incorrect estRows with global index and json column (pingcap#55842)
  ddl, stat: switch to new struct for add/truncate/drop partition (pingcap#55893)
  planner: hide instance plan cache eviction log if no plan is evicted (pingcap#55918)
  expression: support tidb encode key function (pingcap#51678)
  planner: fix incorrect maintenance of `handleColHelper` for recursive CTE (pingcap#55732)
  executor: some code refine of hash join v2 (pingcap#55887)
  infoschema, meta: fix wrong auto id after `rename table` (pingcap#55847)
  ddl/ingest: set `minCommitTS` when detect remote duplicate keys (pingcap#55588)
  planner: move index advisor into the kernel (pingcap#55874)
  ddl, stats: switch to new struct for create/truncate table (pingcap#55811)
  executor: avoid new small objects in probe stage of hash join v2 (pingcap#55855)
  *: Add tidbCPU/tikvCPU into system tables (pingcap#55455)
  ddl: use static contexts in `NewReorgCopContext` (pingcap#55823)
  executor: fix index out of range bug in hash join v2 (pingcap#55864)
  executor: record index usage for the clustered primary keys (pingcap#55602)
  domain: load all non-public tables into infoschema (pingcap#55142)
  test: fix unstable TestShowViewPriv (pingcap#55868)
  ttl: add `varbinary` case for `TestSplitTTLScanRangesWithBytes` (pingcap#55863)
  ...
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

incorrect estRows and result using CTE
4 participants