[improvement](scan) Support pushdown execute expr ctx #15917

Merged (5 commits) on Mar 10, 2023

@xinyiZzz xinyiZzz commented Jan 13, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

motivation

Previously, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. The scan process was:

  1. Read a subset of the columns first and compute the row ids with the simple pushed-down predicates.
  2. Use the row ids to read the remaining columns and pass them to the scanner, which evaluates the remaining predicates.

This PR also pushes the remaining predicates (functions, nested predicates, ...) down from the scanner to the storage layer for filtering. The new scan process is:

  1. Read a subset of the columns first and compute the row ids with the simple pushed-down predicates (same as above).
  2. Use the row ids to read the columns needed by the remaining predicates, and apply the pushed-down remaining predicates to reduce the set of row ids again.
  3. Use the row ids to read the remaining columns and pass them to the scanner.
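
To make the difference between the two scan processes concrete, here is a toy Python model. It is only a sketch and not Doris code: columns are plain lists, each predicate is a hypothetical `(column, function)` pair, and `values_read` simply counts how many column values each process touches. The table contents and helper names (`read`, `scan_old`, `scan_new`) are made up for illustration.

```python
def read(table, col, row_ids, stats):
    """Read `col` for the given row ids and count the values touched."""
    stats["values_read"] += len(row_ids)
    return {r: table[col][r] for r in row_ids}


def scan_old(table, simple, remaining, out_cols):
    """Old process: only the simple predicate is evaluated in storage."""
    stats = {"values_read": 0}
    n = len(next(iter(table.values())))
    s_col, s_fn = simple
    r_col, r_fn = remaining
    # 1. Read the simple predicate's column and compute row ids.
    s_vals = read(table, s_col, range(n), stats)
    row_ids = [r for r in range(n) if s_fn(s_vals[r])]
    # 2. Read all other needed columns for those row ids;
    #    the scanner applies the remaining predicate afterwards.
    needed = [c for c in dict.fromkeys(list(out_cols) + [r_col]) if c != s_col]
    data = {c: read(table, c, row_ids, stats) for c in needed}
    data[s_col] = s_vals
    rows = [tuple(data[c][r] for c in out_cols)
            for r in row_ids if r_fn(data[r_col][r])]
    return rows, stats["values_read"]


def scan_new(table, simple, remaining, out_cols):
    """New process: the remaining predicate also shrinks the row ids in storage."""
    stats = {"values_read": 0}
    n = len(next(iter(table.values())))
    s_col, s_fn = simple
    r_col, r_fn = remaining
    # 1. Same as before: simple predicate first.
    data = {s_col: read(table, s_col, range(n), stats)}
    row_ids = [r for r in range(n) if s_fn(data[s_col][r])]
    # 2. Read only the column the remaining predicate needs, then filter again.
    if r_col not in data:
        data[r_col] = read(table, r_col, row_ids, stats)
    row_ids = [r for r in row_ids if r_fn(data[r_col][r])]
    # 3. Read the rest of the output columns for the surviving row ids only.
    for c in out_cols:
        if c not in data:
            data[c] = read(table, c, row_ids, stats)
    rows = [tuple(data[c][r] for c in out_cols) for r in row_ids]
    return rows, stats["values_read"]


table = {
    "region": ["eu", "eu", "us", "eu", "eu", "eu"],
    "url":    ["a", "B", "c", "c", "E", "f"],
    "x":      [10, 20, 30, 40, 50, 60],
    "y":      [7, 8, 9, 10, 11, 12],
}
simple = ("region", lambda v: v == "eu")         # slot = const
remaining = ("url", lambda v: v.upper() == "C")  # func(slot) = const
out_cols = ["region", "url", "x", "y"]

old_rows, old_read = scan_old(table, simple, remaining, out_cols)
new_rows, new_read = scan_new(table, simple, remaining, out_cols)
print(old_rows, old_read)
print(new_rows, new_read)
```

Both functions return the same rows; in this toy model the second one reads fewer values because the non-predicate columns are fetched only for rows that pass every predicate. The gap widens with more selected columns and a more selective remaining predicate.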

expected

  1. select a,b,c,... from tbl where func(a) = 1. Performance improves significantly; the gain depends on the number of selected columns and on the filtering rate.

example: 10 times better performance

SELECT * FROM hits WHERE UPPER(URL)="HTTP://SAMARA.IRR.RU/CATALOG_GOOGLETBR%26AD%3D278885%26BT%3D430001216";
  2. select a from tbl where func(a) = 1. Performance is expected to stay the same, because the behavior is the same as before.
    (In testing there was a 2% performance loss. I found that VExprContext::filter_block performs differently depending on where it is called, which may be related to cache misses, but the tool did not detect this.)

  3. clickbench test

| test | cost (s) |
| --- | --- |
| default | 134 |
| set enable_remaining_expr_pushdown = true; | 110.25 |
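
For reference, the relative gain implied by the two totals above can be worked out directly. This is a small sketch that uses only the two reported numbers; the variable names are illustrative.

```python
default_s = 134.0    # total cost with default settings, in seconds
pushdown_s = 110.25  # total cost with enable_remaining_expr_pushdown = true

saved_s = default_s - pushdown_s            # absolute time saved
reduction_pct = saved_s / default_s * 100   # relative reduction in total cost
speedup = default_s / pushdown_s            # overall speedup factor

print(f"saved {saved_s:.2f}s, {reduction_pct:.1f}% less time, {speedup:.2f}x overall")
```

So the total drops by roughly 18%, which is an overall figure across all queries; individual queries differ widely.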

default:

query1: 0.03,0.03,0.03
query2: 0.11,0.04,0.04
query3: 0.09,0.07,0.07
query4: 0.22,0.08,0.07
query5: 0.66,0.62,0.54
query6: 0.97,0.90,0.93
query7: 0.02,0.01,0.01
query8: 0.05,0.05,0.04
query9: 1.55,1.38,1.41
query10: 1.98,1.93,1.94
query11: 0.36,0.33,0.35
query12: 0.37,0.34,0.38
query13: 0.61,0.63,0.59
query14: 1.46,1.46,1.44
query15: 0.84,0.85,0.88
query16: 0.51,0.48,0.47
query17: 1.78,1.96,1.82
query18: 0.45,0.43,0.42
query19: 4.25,4.25,4.19
query20: 0.01,0.02,0.01
query21: 1.20,0.31,0.31
query22: 0.12,0.11,0.11
query23: 1.13,0.29,0.28
query24: 8.33,9.99,9.83
query25: 0.13,0.10,0.11
query26: 0.10,0.10,0.11
query27: 0.11,0.09,0.11
query28: 0.48,0.37,0.38
query29: 4.74,4.63,4.64
query30: 1.95,1.98,1.91
query31: 0.43,0.38,0.41
query32: 0.61,0.59,0.58
query33: 4.04,4.03,3.94
query34: 4.52,3.86,3.94
query35: 4.07,3.63,3.79
query36: 1.41,1.43,1.37
query37: 0.07,0.06,0.05
query38: 0.03,0.03,0.03
query39: 0.02,0.02,0.03
query40: 0.15,0.13,0.14
query41: 0.04,0.02,0.03
query42: 0.03,0.02,0.02
query43: 0.04,0.04,0.03

set enable_remaining_expr_pushdown = true;

query1: 0.02,0.02,0.01
query2: 0.03,0.04,0.03
query3: 0.06,0.07,0.07
query4: 0.07,0.07,0.07
query5: 0.67,0.60,0.56
query6: 0.87,0.84,0.84
query7: 0.01,0.01,0.01
query8: 0.05,0.03,0.04
query9: 1.48,1.36,1.41
query10: 2.02,1.88,1.90
query11: 0.32,0.44,0.35
query12: 0.35,0.34,0.42
query13: 0.59,0.70,0.64
query14: 1.46,1.48,1.44
query15: 0.81,0.92,0.82
query16: 0.49,0.46,0.46
query17: 1.98,1.98,1.92
query18: 0.43,0.42,0.44
query19: 4.31,4.29,4.38
query20: 0.01,0.01,0.01
query21: 0.30,0.31,0.32
query22: 0.11,0.12,0.11
query23: 0.26,0.27,0.26
query24: 0.44,0.44,0.43
query25: 0.11,0.10,0.09
query26: 0.11,0.11,0.11
query27: 0.10,0.10,0.11
query28: 0.37,0.38,0.39
query29: 4.71,4.59,4.62
query30: 2.04,2.04,1.95
query31: 0.43,0.39,0.38
query32: 0.61,0.63,0.58
query33: 3.98,4.08,3.93
query34: 4.21,3.97,4.08
query35: 3.89,3.98,3.56
query36: 1.36,1.32,1.35
query37: 0.06,0.05,0.06
query38: 0.03,0.02,0.03
query39: 0.02,0.02,0.02
query40: 0.15,0.14,0.14
query41: 0.02,0.02,0.02
query42: 0.02,0.02,0.02
query43: 0.04,0.03,0.02
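
To see where the gain comes from, one can compare the best of the three runs per query. The sketch below does this for three queries, with the timings copied from the two listings above; taking the minimum of the three runs as the "best" time is an assumption made here for illustration, not something the listings specify.

```python
# Three runs per query, in seconds, copied from the listings above.
default = {
    "query21": (1.20, 0.31, 0.31),
    "query23": (1.13, 0.29, 0.28),
    "query24": (8.33, 9.99, 9.83),
}
pushdown = {
    "query21": (0.30, 0.31, 0.32),
    "query23": (0.26, 0.27, 0.26),
    "query24": (0.44, 0.44, 0.43),
}

# Speedup = best default time / best pushdown time.
speedups = {q: min(default[q]) / min(pushdown[q]) for q in default}
largest = max(speedups, key=speedups.get)

for q, s in speedups.items():
    print(f"{q}: {s:.2f}x")
print("largest gain:", largest)
```

On these numbers query24 accounts for most of the difference, while query21 and query23 change mainly in their first run.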

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Have unit tests been added:
    • Yes
    • No
    • No Need
  3. Has documentation been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from 9eb8d18 to a874c3e Compare January 15, 2023 17:06
@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from a874c3e to 0ca40dc Compare January 16, 2023 01:40

hello-stephen commented Jan 16, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.17 seconds
stream load tsv: 480 seconds loaded 74807831229 Bytes, about 148 MB/s
stream load json: 41 seconds loaded 2358488459 Bytes, about 54 MB/s
stream load orc: 74 seconds loaded 1101869774 Bytes, about 14 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230309135950_clickbench_pr_111585.html

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch 2 times, most recently from c2084b9 to c7f6225 Compare January 16, 2023 19:49
@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from c7f6225 to dcf698e Compare January 30, 2023 11:10
@github-actions github-actions bot added the area/sql/function Issues or PRs related to the SQL functions label Jan 30, 2023
@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from dcf698e to 8f86ee4 Compare February 9, 2023 12:20
@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch 2 times, most recently from 60d573d to 9a5398c Compare February 10, 2023 04:33
@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch 2 times, most recently from 176f705 to 8ef09d0 Compare February 17, 2023 06:55
@xinyiZzz

run p0

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch 2 times, most recently from 7f1b9e2 to 72f0935 Compare February 20, 2023 17:36
@xinyiZzz

run buildall

@xinyiZzz

run arm

@xinyiZzz

run p0

@xinyiZzz

run buildall

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from 72f0935 to 8d67ce7 Compare February 21, 2023 03:52
@xinyiZzz

run buildall

@xinyiZzz

run clickbench

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from 504c929 to 07a2765 Compare March 6, 2023 20:11

xinyiZzz commented Mar 6, 2023

run buildall

yiguolei
yiguolei previously approved these changes Mar 7, 2023
@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 7, 2023

github-actions bot commented Mar 7, 2023

PR approved by at least one committer and no changes requested.


github-actions bot commented Mar 7, 2023

PR approved by anyone and no changes requested.

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from 07a2765 to f19a253 Compare March 7, 2023 16:18
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Mar 7, 2023

yiguolei commented Mar 7, 2023

./run buildall

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from f19a253 to 267b620 Compare March 8, 2023 11:02

xinyiZzz commented Mar 8, 2023

./run buildall

xinyiZzz commented Mar 8, 2023

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from 858fd7b to de817b8 Compare March 9, 2023 01:38

xinyiZzz commented Mar 9, 2023

run buildall

@xinyiZzz xinyiZzz force-pushed the 20221228_expr_pushdown_10 branch from de817b8 to deca217 Compare March 9, 2023 02:47

xinyiZzz commented Mar 9, 2023

run buildall

yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
function pushdown: apache#10355
NGram BloomFilter Index apply like pushdown: apache#11579

Enabled by default, make sure it stays active.

If NGram BloomFilter Index is not used, this like pushdown can be replaced by apache#15917, which can push down all expressions including like.

xinyiZzz commented Mar 9, 2023

run buildall

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit f9baf9c into apache:master Mar 10, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
Labels
area/sql/function Issues or PRs related to the SQL functions area/vectorization kind/test reviewed
4 participants