Add projection push down to csv and parquet #4113

Merged (2 commits) on Aug 21, 2024


@andyfengHKU (Contributor) commented Aug 20, 2024

Description

Add projection pushdown when scanning from CSV and Parquet files.

For Parquet, scan time scales roughly linearly with the number of columns read: about 400 ms when scanning a single column versus 1.2 s when scanning three columns.

For CSV, the improvement is less noticeable because we still have to read the file row by row.

@acquamarin should propagate this to scanning relational tables.
@mxwli should propagate this to scanning Python objects.

The change simply propagates the columnSkips flag in TableFuncBindData to each reader.
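
To illustrate the idea of handing per-column skip flags to a reader, here is a minimal sketch. It is not the code from this PR: only the name columnSkips comes from the description above, while BindDataSketch and scanCsv are hypothetical names, and the CSV parsing is a naive comma split with no quoting support.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the bind data; only the columnSkips name is
// taken from the PR description.
struct BindDataSketch {
    std::vector<std::string> columnNames;
    std::vector<bool> columnSkips; // true = the query does not need this column
};

// Hypothetical reader. Every row is still tokenized, because CSV is a
// row-oriented format, but only fields whose skip flag is false are copied
// into the result.
std::vector<std::vector<std::string>> scanCsv(const std::string& text,
                                              const BindDataSketch& bindData) {
    std::vector<std::vector<std::string>> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) {
            continue;
        }
        std::vector<std::string> projected;
        std::istringstream fields(line);
        std::string field;
        std::size_t col = 0;
        while (std::getline(fields, field, ',')) {
            const bool skip =
                col < bindData.columnSkips.size() && bindData.columnSkips[col];
            if (!skip) {
                projected.push_back(field);
            }
            ++col;
        }
        rows.push_back(std::move(projected));
    }
    return rows;
}
```

With skip flags {false, true, false} over columns id, name, age, each returned row holds only the id and age fields. The sketch also shows why the CSV gain is limited: the skipped field is still read and split, it is just never materialized. A columnar format such as Parquet can avoid reading the skipped column's data at all.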

Fixes # (issue)

Contributor agreement


Benchmark Result

Master commit hash: 4020110b4c018ac6789bdb31f5af64ee8dc0469f
Branch commit hash: eba3d602ba8259a53c76e3aa64ea4dacfb4cbc23

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff (ms) |
|---|---|---|---|---|
| aggregation | q24 | 663.34 | 671.95 | -8.61 (-1.28%) |
| aggregation | q28 | 11657.74 | 11910.39 | -252.65 (-2.12%) |
| filter | q14 | 145.33 | 151.38 | -6.04 (-3.99%) |
| filter | q15 | 145.84 | 154.79 | -8.95 (-5.78%) |
| filter | q16 | 324.08 | 328.20 | -4.12 (-1.26%) |
| filter | q17 | 466.42 | 476.84 | -10.42 (-2.19%) |
| filter | q18 | 1987.44 | 1900.48 | 86.96 (4.58%) |
| fixed_size_expr_evaluator | q07 | 568.05 | 563.22 | 4.82 (0.86%) |
| fixed_size_expr_evaluator | q08 | 774.12 | 775.78 | -1.66 (-0.21%) |
| fixed_size_expr_evaluator | q09 | 778.49 | 774.32 | 4.17 (0.54%) |
| fixed_size_expr_evaluator | q10 | 261.77 | 265.41 | -3.64 (-1.37%) |
| fixed_size_expr_evaluator | q11 | 258.97 | 259.34 | -0.36 (-0.14%) |
| fixed_size_expr_evaluator | q12 | 254.60 | 258.57 | -3.97 (-1.54%) |
| fixed_size_expr_evaluator | q13 | 1493.81 | 1484.77 | 9.04 (0.61%) |
| fixed_size_seq_scan | q23 | 140.25 | 145.20 | -4.95 (-3.41%) |
| join | q31 | 12.19 | 12.12 | 0.07 (0.62%) |
| ldbc_snb_ic | q35 | 777.78 | 1037.84 | -260.06 (-25.06%) |
| ldbc_snb_ic | q36 | 57.41 | 52.55 | 4.86 (9.24%) |
| ldbc_snb_is | q32 | 9.58 | 10.35 | -0.77 (-7.48%) |
| ldbc_snb_is | q33 | 19.81 | 17.76 | 2.05 (11.55%) |
| ldbc_snb_is | q34 | 8.63 | 7.58 | 1.05 (13.81%) |
| multi-rel | multi-rel-large-scan | 2858.36 | 2861.49 | -3.13 (-0.11%) |
| multi-rel | multi-rel-lookup | 78.65 | 50.94 | 27.71 (54.40%) |
| multi-rel | multi-rel-small-scan | 48.14 | 49.08 | -0.94 (-1.92%) |
| order_by | q25 | 149.51 | 157.19 | -7.68 (-4.88%) |
| order_by | q26 | 468.41 | 478.13 | -9.72 (-2.03%) |
| order_by | q27 | 1441.11 | 1414.46 | 26.65 (1.88%) |
| scan_after_filter | q01 | 193.50 | 199.24 | -5.74 (-2.88%) |
| scan_after_filter | q02 | 178.73 | 187.30 | -8.57 (-4.58%) |
| shortest_path_ldbc100 | q39 | 115.68 | 114.31 | 1.38 (1.20%) |
| var_size_expr_evaluator | q03 | 2090.67 | 2086.16 | 4.51 (0.22%) |
| var_size_expr_evaluator | q04 | 2298.78 | 2216.28 | 82.50 (3.72%) |
| var_size_expr_evaluator | q05 | 2631.50 | 2699.20 | -67.70 (-2.51%) |
| var_size_expr_evaluator | q06 | 1405.73 | 1375.88 | 29.84 (2.17%) |
| var_size_seq_scan | q19 | 1496.21 | 1474.04 | 22.17 (1.50%) |
| var_size_seq_scan | q20 | 3200.51 | 3167.00 | 33.51 (1.06%) |
| var_size_seq_scan | q21 | 2423.71 | 2405.87 | 17.85 (0.74%) |
| var_size_seq_scan | q22 | 134.75 | 130.96 | 3.78 (2.89%) |


Benchmark Result

Master commit hash: e90380ea89fd37a7fd29bf933d7e81ab1af51318
Branch commit hash: 980b3514a44427cf68c94fcf1ce3de8f90947742

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff (ms) |
|---|---|---|---|---|
| aggregation | q24 | 664.23 | 673.41 | -9.18 (-1.36%) |
| aggregation | q28 | 11738.32 | 11855.83 | -117.52 (-0.99%) |
| filter | q14 | 144.64 | 150.16 | -5.51 (-3.67%) |
| filter | q15 | 145.66 | 154.33 | -8.68 (-5.62%) |
| filter | q16 | 323.77 | 325.04 | -1.27 (-0.39%) |
| filter | q17 | 464.75 | 475.59 | -10.84 (-2.28%) |
| filter | q18 | 1978.99 | 1926.26 | 52.72 (2.74%) |
| fixed_size_expr_evaluator | q07 | 566.15 | 564.10 | 2.05 (0.36%) |
| fixed_size_expr_evaluator | q08 | 787.36 | 775.26 | 12.10 (1.56%) |
| fixed_size_expr_evaluator | q09 | 790.71 | 773.71 | 17.00 (2.20%) |
| fixed_size_expr_evaluator | q10 | 262.64 | 267.27 | -4.63 (-1.73%) |
| fixed_size_expr_evaluator | q11 | 255.57 | 262.27 | -6.70 (-2.55%) |
| fixed_size_expr_evaluator | q12 | 256.99 | 258.08 | -1.09 (-0.42%) |
| fixed_size_expr_evaluator | q13 | 1495.80 | 1489.55 | 6.24 (0.42%) |
| fixed_size_seq_scan | q23 | 138.73 | 143.99 | -5.26 (-3.65%) |
| join | q31 | 12.04 | 11.80 | 0.24 (2.04%) |
| ldbc_snb_ic | q35 | 990.54 | 766.22 | 224.32 (29.28%) |
| ldbc_snb_ic | q36 | 51.51 | 46.78 | 4.74 (10.13%) |
| ldbc_snb_is | q32 | 10.14 | 9.51 | 0.63 (6.59%) |
| ldbc_snb_is | q33 | 17.88 | 18.00 | -0.12 (-0.64%) |
| ldbc_snb_is | q34 | 7.30 | 8.17 | -0.86 (-10.59%) |
| multi-rel | multi-rel-large-scan | 2817.80 | 5573.63 | -2755.82 (-49.44%) |
| multi-rel | multi-rel-lookup | 46.79 | 50.17 | -3.38 (-6.74%) |
| multi-rel | multi-rel-small-scan | 47.87 | 74.62 | -26.75 (-35.85%) |
| order_by | q25 | 148.64 | 156.62 | -7.98 (-5.10%) |
| order_by | q26 | 468.19 | 478.48 | -10.29 (-2.15%) |
| order_by | q27 | 1445.08 | 1422.63 | 22.45 (1.58%) |
| scan_after_filter | q01 | 195.08 | 200.39 | -5.31 (-2.65%) |
| scan_after_filter | q02 | 179.45 | 187.77 | -8.32 (-4.43%) |
| shortest_path_ldbc100 | q39 | 113.30 | 112.21 | 1.09 (0.97%) |
| var_size_expr_evaluator | q03 | 2093.77 | 2078.03 | 15.75 (0.76%) |
| var_size_expr_evaluator | q04 | 2287.78 | 2212.82 | 74.96 (3.39%) |
| var_size_expr_evaluator | q05 | 2631.11 | 2731.49 | -100.38 (-3.68%) |
| var_size_expr_evaluator | q06 | 1404.78 | 1374.81 | 29.98 (2.18%) |
| var_size_seq_scan | q19 | 1491.93 | 1467.09 | 24.84 (1.69%) |
| var_size_seq_scan | q20 | 3187.18 | 3135.81 | 51.37 (1.64%) |
| var_size_seq_scan | q21 | 2425.40 | 2402.99 | 22.40 (0.93%) |
| var_size_seq_scan | q22 | 134.53 | 131.96 | 2.57 (1.95%) |

Review thread on src/include/function/table/bind_data.h (outdated, resolved)
@andyfengHKU force-pushed the projection-push-down branch from f8ba4d9 to d8cc8f4 on August 21, 2024 16:53

Benchmark Result

Master commit hash: 700b10f71c4e9ea19b4cd7e57ec864a38033225e
Branch commit hash: dc6bfbfa595bd137be39184a8fe08b2c54270bca

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff (ms) |
|---|---|---|---|---|
| aggregation | q24 | 665.56 | 668.53 | -2.97 (-0.44%) |
| aggregation | q28 | 12010.95 | 11953.42 | 57.52 (0.48%) |
| copy | node-Comment | 68763.20 | 68567.14 | 196.06 (0.29%) |
| copy | node-Forum | 5174.71 | 5609.97 | -435.26 (-7.76%) |
| copy | node-Organisation | 1564.94 | 1542.22 | 22.72 (1.47%) |
| copy | node-Person | 2855.31 | 2926.52 | -71.21 (-2.43%) |
| copy | node-Place | 1462.67 | 1462.74 | -0.07 (-0.00%) |
| copy | node-Post | 27448.99 | 27296.52 | 152.47 (0.56%) |
| copy | node-Tag | 1471.82 | 1626.23 | -154.41 (-9.49%) |
| copy | node-Tagclass | 647.85 | 1400.46 | -752.61 (-53.74%) |
| copy | rel-comment-hasCreator | 60352.73 | 55925.00 | 4427.73 (7.92%) |
| copy | rel-comment-hasTag | 77897.31 | 77494.42 | 402.89 (0.52%) |
| copy | rel-comment-isLocatedIn | 64497.17 | 64822.00 | -324.83 (-0.50%) |
| copy | rel-containerOf | 16815.92 | 16836.69 | -20.77 (-0.12%) |
| copy | rel-forum-hasTag | 4098.34 | 4038.90 | 59.44 (1.47%) |
| copy | rel-hasInterest | 2568.75 | 2867.67 | -298.92 (-10.42%) |
| copy | rel-hasMember | 52813.11 | 53480.62 | -667.51 (-1.25%) |
| copy | rel-hasModerator | 2024.56 | 1873.10 | 151.46 (8.09%) |
| copy | rel-hasType | 601.46 | 719.59 | -118.13 (-16.42%) |
| copy | rel-isPartOf | 694.71 | 575.30 | 119.41 (20.76%) |
| copy | rel-isSubclassOf | 546.88 | 387.40 | 159.48 (41.17%) |
| copy | rel-knows | 5753.00 | 6089.40 | -336.40 (-5.52%) |
| copy | rel-likes-comment | 87456.08 | 86746.41 | 709.67 (0.82%) |
| copy | rel-likes-post | 33163.37 | 34424.23 | -1260.86 (-3.66%) |
| copy | rel-organisation-isLocatedIn | 651.63 | 466.05 | 185.58 (39.82%) |
| copy | rel-person-isLocatedIn | 779.59 | 918.55 | -138.96 (-15.13%) |
| copy | rel-post-hasCreator | 17446.95 | 18421.75 | -974.80 (-5.29%) |
| copy | rel-post-hasTag | 21593.71 | 22652.77 | -1059.06 (-4.68%) |
| copy | rel-post-isLocatedIn | 17648.97 | 18384.76 | -735.79 (-4.00%) |
| copy | rel-replyOf-comment | 72254.77 | 64846.55 | 7408.22 (11.42%) |
| copy | rel-replyOf-post | 48112.27 | 47921.58 | 190.69 (0.40%) |
| copy | rel-studyAt | 900.06 | 975.53 | -75.47 (-7.74%) |
| copy | rel-workAt | 1088.99 | 1164.92 | -75.93 (-6.52%) |
| filter | q14 | 143.04 | 144.67 | -1.63 (-1.13%) |
| filter | q15 | 146.55 | 147.65 | -1.10 (-0.75%) |
| filter | q16 | 324.27 | 321.50 | 2.77 (0.86%) |
| filter | q17 | 466.13 | 465.22 | 0.92 (0.20%) |
| filter | q18 | 1918.29 | 1993.28 | -74.99 (-3.76%) |
| fixed_size_expr_evaluator | q07 | 558.86 | 555.62 | 3.25 (0.58%) |
| fixed_size_expr_evaluator | q08 | 769.51 | 777.35 | -7.84 (-1.01%) |
| fixed_size_expr_evaluator | q09 | 766.06 | 778.60 | -12.54 (-1.61%) |
| fixed_size_expr_evaluator | q10 | 258.44 | 257.71 | 0.73 (0.28%) |
| fixed_size_expr_evaluator | q11 | 252.76 | 251.64 | 1.12 (0.45%) |
| fixed_size_expr_evaluator | q12 | 252.33 | 250.60 | 1.73 (0.69%) |
| fixed_size_expr_evaluator | q13 | 1488.94 | 1484.57 | 4.37 (0.29%) |
| fixed_size_seq_scan | q23 | 136.24 | 133.27 | 2.97 (2.23%) |
| join | q31 | 12.22 | 11.39 | 0.83 (7.26%) |
| ldbc_snb_ic | q35 | 795.02 | 771.29 | 23.73 (3.08%) |
| ldbc_snb_ic | q36 | 47.90 | 48.19 | -0.29 (-0.60%) |
| ldbc_snb_is | q32 | 9.08 | 8.69 | 0.39 (4.51%) |
| ldbc_snb_is | q33 | 18.13 | 15.30 | 2.83 (18.49%) |
| ldbc_snb_is | q34 | 8.35 | 7.73 | 0.62 (8.05%) |
| multi-rel | multi-rel-large-scan | 2899.14 | 2881.53 | 17.62 (0.61%) |
| multi-rel | multi-rel-lookup | 65.80 | 66.20 | -0.41 (-0.62%) |
| multi-rel | multi-rel-small-scan | 55.58 | 53.93 | 1.65 (3.06%) |
| order_by | q25 | 150.62 | 148.91 | 1.71 (1.15%) |
| order_by | q26 | 466.17 | 464.11 | 2.06 (0.44%) |
| order_by | q27 | 1424.58 | 1422.50 | 2.08 (0.15%) |
| scan_after_filter | q01 | 192.49 | 192.10 | 0.39 (0.20%) |
| scan_after_filter | q02 | 181.55 | 182.94 | -1.40 (-0.76%) |
| shortest_path_ldbc100 | q39 | 162.67 | 97.14 | 65.52 (67.45%) |
| var_size_expr_evaluator | q03 | 2108.87 | 2071.60 | 37.27 (1.80%) |
| var_size_expr_evaluator | q04 | 2244.44 | 2263.46 | -19.02 (-0.84%) |
| var_size_expr_evaluator | q05 | 2644.96 | 2624.07 | 20.89 (0.80%) |
| var_size_expr_evaluator | q06 | 1389.57 | 1337.83 | 51.74 (3.87%) |
| var_size_seq_scan | q19 | 1527.30 | 1484.97 | 42.33 (2.85%) |
| var_size_seq_scan | q20 | 3150.82 | 3194.70 | -43.88 (-1.37%) |
| var_size_seq_scan | q21 | 2458.76 | 2430.11 | 28.65 (1.18%) |
| var_size_seq_scan | q22 | 136.39 | 133.10 | 3.30 (2.48%) |

@andyfengHKU merged commit f37cdd1 into master on Aug 21, 2024
@andyfengHKU deleted the projection-push-down branch on August 21, 2024 18:49
ted-wq-x pushed a commit to ted-wq-x/kuzu that referenced this pull request on Nov 14, 2024
* Add projection push down to csv and parquet

* Run clang-format

---------

Co-authored-by: CI Bot <andyfengHKU@users.noreply.github.com>
(cherry picked from commit f37cdd1)