Tracking issue for analyze predicate/user-specified columns #27828

Open
11 of 14 tasks
xuyifangreeneyes opened this issue Sep 6, 2021 · 4 comments
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@xuyifangreeneyes
Contributor

xuyifangreeneyes commented Sep 6, 2021

Proposal

related issue: #27358
design doc: #28878

Currently, ANALYZE TABLE collects statistics for all columns. However, only some columns' statistics are actually used when building query plans. If a column appears in a filter condition, such as a WHERE or JOIN condition, its statistics are used to calculate selectivity. We call such a column a predicate column. If we collect statistics only for the predicate columns, the cost of ANALYZE TABLE can be reduced.

Hence we want to do the following things:

  1. track which columns are predicate columns
  2. support analyzing only predicate columns
  3. support analyzing only user-specified columns
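
As a rough illustration of how the three goals fit together, here is a minimal Python sketch. It is not TiDB's implementation: `PredicateColumnTracker`, `record`, and `columns_to_analyze` are hypothetical names, and the fallback to all columns when nothing has been tracked is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PredicateColumnTracker:
    """Hypothetical in-memory tracker, for illustration only."""

    # table name -> columns seen in WHERE/JOIN conditions
    seen: Dict[str, Set[str]] = field(default_factory=dict)

    def record(self, table: str, columns: List[str]) -> None:
        """Goal 1: remember columns that appeared in a filter condition."""
        self.seen.setdefault(table, set()).update(columns)

    def columns_to_analyze(
        self,
        table: str,
        all_columns: List[str],
        user_specified: Optional[List[str]] = None,
    ) -> List[str]:
        """Pick the columns whose statistics should be collected."""
        if user_specified is not None:
            # Goal 3: the user's explicit column list wins.
            unknown = set(user_specified) - set(all_columns)
            if unknown:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
            return [c for c in all_columns if c in user_specified]
        # Goal 2: restrict to tracked predicate columns.
        predicate = self.seen.get(table, set())
        chosen = [c for c in all_columns if c in predicate]
        # Assumption of this sketch: analyze everything if nothing is tracked.
        return chosen or list(all_columns)


tracker = PredicateColumnTracker()
# e.g. a query filtering on t.a and joining on t.b
tracker.record("t", ["a", "b"])
print(tracker.columns_to_analyze("t", ["a", "b", "c", "d"]))
```

In this sketch the result keeps the table's column order, so the output does not depend on the order in which predicates were observed.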

Development Task

@xuyifangreeneyes added the type/enhancement label on Sep 6, 2021
@xuyifangreeneyes
Contributor Author

/sig planner

@ti-chi-bot added the sig/planner label on Sep 6, 2021
@xuyifangreeneyes
Contributor Author

/assign

@feitian124
Contributor

I did some research and found that in MySQL, a predicate is a Boolean expression that evaluates to TRUE, FALSE, or UNKNOWN.
With that definition of predicate, the term "predicate column" is much clearer now.

Is this official terminology, and has it been added to the docs?

@xuyifangreeneyes
Contributor Author

> I did some research and found that in MySQL, a predicate is a Boolean expression that evaluates to TRUE, FALSE, or UNKNOWN. With that definition of predicate, the term "predicate column" is much clearer now.
>
> Is this official terminology, and has it been added to the docs?

The design doc in #28878 describes predicate columns. You can also take a look at Redshift's documentation and blog post on predicate columns. Here are the links:

  1. https://aws.amazon.com/blogs/big-data/collect-data-statistics-up-to-5x-faster-by-analyzing-only-predicate-columns-with-amazon-redshift/
  2. https://docs.aws.amazon.com/redshift/latest/dg/t_Analyzing_tables.html
