Add an option to enhance cardinality estimation of like/regexp using the TopN in stats #36209

time-and-fate · 2022-07-14T06:39:38Z

Background

In some cases, the cardinality estimation of string matching functions (like and regexp) is poor. When the pattern is complex, we cannot build the function into Ranges, so we cannot estimate it with higher precision.

Enhancement

To mitigate this, we can use the statistics to estimate selectivity. Specifically, we can (1) evaluate the expression against the values in the TopN, and (2) account for the NULL count by evaluating the expression with NULL.
For convenience, and because of limitations in the current statistics, we apply this optimization only when the expression involves a single column, the column is not a string column with a non-binary collation, and statistics on the column (or on a single-column index on it) are available and are ver2.

We can also provide a variable to control the default selectivity of the string matching functions.

For now, the default behavior stays unchanged, and users can use the variable to control the feature.
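
As a rough illustration of the idea, the sketch below evaluates a string matching predicate against TopN values and against NULL, then combines the matched counts into a selectivity. It is not TiDB's implementation: `ColumnStats`, `like_to_predicate` and `estimate_selectivity` are hypothetical names, the LIKE handling is simplified (only `%` and `_`, no escape character, case-sensitive), and applying the default selectivity to the rows not covered by the TopN or the NULL count is an assumption made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class ColumnStats:
    """Hypothetical, simplified view of one column's statistics."""
    total_rows: int                  # all rows, including NULLs
    null_count: int                  # rows where the column is NULL
    topn: Sequence[Tuple[str, int]]  # (value, count) pairs from the TopN


def like_to_predicate(pattern: str) -> Callable[[Optional[str]], bool]:
    """Build a predicate for a simplified LIKE ('%' and '_' only)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any sequence of characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    regex = re.compile("".join(parts), re.DOTALL)
    # NULL LIKE <pattern> evaluates to NULL, so a NULL row is filtered out.
    return lambda v: v is not None and regex.fullmatch(v) is not None


def estimate_selectivity(
    stats: ColumnStats,
    predicate: Callable[[Optional[str]], bool],
    default_selectivity: float,
) -> float:
    """Estimate the fraction of rows that satisfy the predicate."""
    if stats.total_rows == 0:
        return 0.0
    topn_rows = sum(count for _, count in stats.topn)
    # (1) Evaluate the expression against every TopN value.
    matched = float(sum(count for value, count in stats.topn if predicate(value)))
    # (2) Evaluate the expression with NULL to decide how NULL rows count.
    if predicate(None):
        matched += stats.null_count
    # Assumption: rows outside the TopN and the NULL count get the default.
    rest = max(stats.total_rows - topn_rows - stats.null_count, 0)
    matched += rest * default_selectivity
    return min(matched / stats.total_rows, 1.0)


stats = ColumnStats(
    total_rows=1000,
    null_count=100,
    topn=[("apple", 300), ("apricot", 200), ("banana", 100)],
)
sel = estimate_selectivity(stats, like_to_predicate("ap%"), default_selectivity=0.0)
```

In this example, "apple" and "apricot" match the pattern, so 500 of the 1000 rows are counted as matching. With a default selectivity of 0, the 300 rows outside the TopN and the NULL count contribute nothing to the estimate.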

time-and-fate added the type/enhancement label Jul 14, 2022

time-and-fate self-assigned this Jul 14, 2022

time-and-fate mentioned this issue Jul 14, 2022

statistics, sessionctx: introduce topn assisted cardinality estimation for string matching functions #36210

Merged

12 tasks

ti-chi-bot closed this as completed in #36210 Jul 20, 2022

ti-chi-bot pushed a commit that referenced this issue Jul 20, 2022

statistics, sessionctx: introduce topn assisted cardinality estimatio…

f0717df

…n for string matching functions (#36210) close #36209

This was referenced Feb 28, 2023

statistics: use histogram buckets bounds to enhance string match functions estimation #40338

Merged

statistics: use histogram buckets bounds to enhance string match functions estimation #40340

Merged

ti-chi-bot pushed a commit that referenced this issue Feb 28, 2023

statistics: use histogram buckets bounds to enhance string match func…

96e345d

…tions estimation (#40338) ref #36209
