Add an option to enhance cardinality estimation of like/regexp using the TopN in stats #36209

Closed
time-and-fate opened this issue Jul 14, 2022 · 0 comments · Fixed by #36210
Labels
type/enhancement The issue or PR belongs to an enhancement.

time-and-fate commented Jul 14, 2022

Background

Currently, the cardinality estimation of string matching functions (like and regexp) is poor in some cases. When the pattern is complex, we cannot convert the function into Ranges, so we cannot estimate it with higher precision.

Enhancement

To mitigate this, we can use the statistics to estimate selectivity. Specifically, we can (1) evaluate the expression against the values in the TopN, and (2) account for the NULL count by evaluating the expression with NULL.
For convenience, and because of the limitations of the current statistics, we apply this optimization only when all of the following hold: the expression involves exactly one column; that column is not a string column with a non-binary collation; and ver2 statistics are available on that column (or on a single-column index on that column).

We can also provide a variable to control the default selectivity of the string matching functions.

For now, the default behavior stays unchanged, and users can use the variable to control the feature.
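
To make the proposal concrete, here is a minimal Python sketch of the idea, not the TiDB implementation. The function names, the `(value, count)` layout of the TopN, and the way rows outside the TopN fall back to `default_selectivity` are all assumptions made for illustration; the issue only states that the expression is evaluated against the TopN values and against NULL.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical predicate type: returns True/False, or None for SQL NULL.
Predicate = Callable[[Optional[str]], Optional[bool]]


def estimate_selectivity(
    topn: Sequence[Tuple[str, int]],  # assumed layout: (value, row count)
    null_count: int,
    total_rows: int,
    predicate: Predicate,
    default_selectivity: float,
) -> float:
    """Estimate the fraction of rows for which the predicate is true."""
    if total_rows <= 0:
        return 0.0

    topn_rows = sum(count for _, count in topn)

    # (1) Evaluate the expression against every value in the TopN.
    matched = float(
        sum(count for value, count in topn if predicate(value) is True)
    )

    # (2) Evaluate the expression with NULL once, then apply the result
    # to all NULL rows.
    if predicate(None) is True:
        matched += null_count

    # Assumption: rows covered by neither TopN nor the NULL count use
    # the configurable default selectivity.
    rest = max(total_rows - topn_rows - null_count, 0)
    matched += default_selectivity * rest

    return min(matched / total_rows, 1.0)


def like_a_percent_z(value: Optional[str]) -> Optional[bool]:
    """Stand-in for `col LIKE 'a%z'`; a NULL input yields NULL."""
    if value is None:
        return None
    return re.fullmatch(r"a.*z", value, flags=re.DOTALL) is not None


topn = [("abz", 40), ("xyz", 30), ("aaz", 10)]
selectivity = estimate_selectivity(
    topn,
    null_count=10,
    total_rows=100,
    predicate=like_a_percent_z,
    default_selectivity=0.5,
)
print(selectivity)
```

In this sketch the TopN covers 80 of the 100 rows, so only the remaining 10 non-NULL rows depend on the default selectivity. The more skewed the column, the more of the estimate comes from actually evaluating the pattern.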
