Http file cache improve #4159

Merged: acquamarin merged 3 commits into master from http-file-cache-improve on Aug 28, 2024

acquamarin (Collaborator) commented on Aug 28, 2024

Description

This PR improves HTTP file cache performance by skipping the retrieval of file metadata when the file is already present in the cache.

Solves #4010
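
The idea in the description can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CachedFileManager`, `FileInfo`, `fetch_metadata`, and `download` are hypothetical names, and the sketch assumes the metadata in question is something like the file size that would otherwise cost a remote round trip. It also ignores transaction scoping and cache invalidation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FileInfo:
    """Metadata that would otherwise be fetched remotely (hypothetical)."""
    size: int


@dataclass
class CachedFileManager:
    """Hypothetical cache keyed by remote path; not Kuzu's actual class."""
    fetch_metadata: Callable[[str], FileInfo]  # stands in for a remote metadata request
    download: Callable[[str], bytes]           # stands in for a remote file download
    _cache: Dict[str, bytes] = field(default_factory=dict)
    metadata_requests: int = 0                 # counts remote metadata round trips

    def open(self, path: str) -> FileInfo:
        cached = self._cache.get(path)
        if cached is not None:
            # Cache hit: derive the metadata from the local copy and skip
            # the remote metadata request entirely.
            return FileInfo(size=len(cached))
        # Cache miss: pay for the metadata request once, then fill the cache.
        self.metadata_requests += 1
        info = self.fetch_metadata(path)
        self._cache[path] = self.download(path)
        return info
```

With a manager like this, opening the same path twice issues only one metadata request; the second open is served from the local copy.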

Performance numbers:

We ran the same query twice in the same transaction:
load from 's3://kuzu-test/dataset/ldbc10/comment_0_0.csv'(header=true,delim='|') return *;
LDBC10: 2.1 GB

This branch:

| Run ID | Compiling time | Execution time | Total time |
| --- | --- | --- | --- |
| 1 | 27.32s | 0.846s | 28.166s |
| 2 | < 1ms | 0.812s | 0.812s |

Master:

| Run ID | Compiling time | Execution time | Total time |
| --- | --- | --- | --- |
| 1 | 27.05s | 4.59s | 31.64s |
| 2 | 0.249s | 4.367s | 5.616s |
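
As a quick way to read the timings above, the execution-time speedup per run can be computed directly from the reported numbers. The snippet below is only arithmetic over the values in this comment; the `speedup` helper is a name introduced here for illustration.

```python
# Execution times in seconds, copied from the measurements above.
branch_exec = {1: 0.846, 2: 0.812}
master_exec = {1: 4.59, 2: 4.367}


def speedup(baseline: float, improved: float) -> float:
    """Return how many times faster `improved` is than `baseline`."""
    return baseline / improved


for run in sorted(branch_exec):
    ratio = speedup(master_exec[run], branch_exec[run])
    print(f"run {run}: execution is {ratio:.1f}x faster on this branch")
```

Both runs come out at roughly 5.4x faster execution on this branch, while the first-run compiling time is about the same on both sides.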

codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.05%. Comparing base (27c5ef5) to head (efa9e04).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4159   +/-   ##
=======================================
  Coverage   84.05%   84.05%           
=======================================
  Files        1330     1330           
  Lines       53174    53174           
  Branches     7418     7418           
=======================================
+ Hits        44694    44695    +1     
+ Misses       8308     8307    -1     
  Partials      172      172           

Benchmark Result

Master commit hash: 27c5ef56639743e13372f9035f6af6ac4c5f423f
Branch commit hash: a3ce919ef500d7aa90484f377d2510b3a66905a1

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 681.03 | 680.12 | 0.91 (0.13%) |
| aggregation | q28 | 11404.01 | 11059.16 | 344.85 (3.12%) |
| filter | q14 | 160.50 | 159.66 | 0.84 (0.53%) |
| filter | q15 | 164.72 | 156.35 | 8.38 (5.36%) |
| filter | q16 | 342.18 | 343.03 | -0.85 (-0.25%) |
| filter | q17 | 482.07 | 485.85 | -3.78 (-0.78%) |
| filter | q18 | 1985.02 | 1949.98 | 35.03 (1.80%) |
| fixed_size_expr_evaluator | q07 | 572.01 | 573.27 | -1.26 (-0.22%) |
| fixed_size_expr_evaluator | q08 | 784.01 | 786.23 | -2.22 (-0.28%) |
| fixed_size_expr_evaluator | q09 | 780.75 | 784.76 | -4.01 (-0.51%) |
| fixed_size_expr_evaluator | q10 | 274.14 | 274.46 | -0.32 (-0.12%) |
| fixed_size_expr_evaluator | q11 | 268.11 | 268.49 | -0.38 (-0.14%) |
| fixed_size_expr_evaluator | q12 | 269.94 | 268.22 | 1.72 (0.64%) |
| fixed_size_expr_evaluator | q13 | 1501.57 | 1504.87 | -3.30 (-0.22%) |
| fixed_size_seq_scan | q23 | 153.29 | 152.26 | 1.03 (0.68%) |
| join | q31 | 13.94 | 13.31 | 0.63 (4.73%) |
| ldbc_snb_ic | q35 | 763.48 | 776.96 | -13.48 (-1.74%) |
| ldbc_snb_ic | q36 | 45.89 | 48.66 | -2.76 (-5.67%) |
| ldbc_snb_is | q32 | 9.36 | 9.49 | -0.13 (-1.41%) |
| ldbc_snb_is | q33 | 13.26 | 18.76 | -5.51 (-29.34%) |
| ldbc_snb_is | q34 | 8.67 | 8.66 | 0.01 (0.07%) |
| multi-rel | multi-rel-large-scan | 3792.05 | 2787.21 | 1004.84 (36.05%) |
| multi-rel | multi-rel-lookup | 76.01 | 65.77 | 10.24 (15.56%) |
| multi-rel | multi-rel-small-scan | 52.65 | 54.51 | -1.86 (-3.41%) |
| order_by | q25 | 166.83 | 166.71 | 0.12 (0.07%) |
| order_by | q26 | 482.05 | 481.83 | 0.21 (0.04%) |
| order_by | q27 | 1460.06 | 1461.24 | -1.18 (-0.08%) |
| scan_after_filter | q01 | 207.49 | 208.62 | -1.12 (-0.54%) |
| scan_after_filter | q02 | 196.59 | 197.92 | -1.34 (-0.68%) |
| shortest_path_ldbc100 | q39 | 77.82 | 73.84 | 3.97 (5.38%) |
| var_size_expr_evaluator | q03 | 2099.22 | 2118.99 | -19.76 (-0.93%) |
| var_size_expr_evaluator | q04 | 2329.04 | 2284.59 | 44.46 (1.95%) |
| var_size_expr_evaluator | q05 | 2664.95 | 2660.79 | 4.17 (0.16%) |
| var_size_expr_evaluator | q06 | 1399.56 | 1405.44 | -5.88 (-0.42%) |
| var_size_seq_scan | q19 | 1501.23 | 1502.23 | -1.00 (-0.07%) |
| var_size_seq_scan | q20 | 3238.93 | 3222.81 | 16.12 (0.50%) |
| var_size_seq_scan | q21 | 2476.01 | 2473.84 | 2.17 (0.09%) |
| var_size_seq_scan | q22 | 138.44 | 137.95 | 0.49 (0.36%) |

Benchmark Result

Master commit hash: 27c5ef56639743e13372f9035f6af6ac4c5f423f
Branch commit hash: 5cb42f8d9f91bdb338b5bb62e8ea7152e76ad002

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 680.31 | 680.12 | 0.19 (0.03%) |
| aggregation | q28 | 11027.56 | 11059.16 | -31.60 (-0.29%) |
| filter | q14 | 160.47 | 159.66 | 0.81 (0.51%) |
| filter | q15 | 163.24 | 156.35 | 6.89 (4.41%) |
| filter | q16 | 342.30 | 343.03 | -0.73 (-0.21%) |
| filter | q17 | 481.35 | 485.85 | -4.50 (-0.93%) |
| filter | q18 | 1944.74 | 1949.98 | -5.24 (-0.27%) |
| fixed_size_expr_evaluator | q07 | 576.43 | 573.27 | 3.16 (0.55%) |
| fixed_size_expr_evaluator | q08 | 795.44 | 786.23 | 9.21 (1.17%) |
| fixed_size_expr_evaluator | q09 | 789.61 | 784.76 | 4.84 (0.62%) |
| fixed_size_expr_evaluator | q10 | 273.39 | 274.46 | -1.07 (-0.39%) |
| fixed_size_expr_evaluator | q11 | 268.00 | 268.49 | -0.49 (-0.18%) |
| fixed_size_expr_evaluator | q12 | 266.73 | 268.22 | -1.49 (-0.56%) |
| fixed_size_expr_evaluator | q13 | 1501.41 | 1504.87 | -3.46 (-0.23%) |
| fixed_size_seq_scan | q23 | 153.29 | 152.26 | 1.03 (0.68%) |
| join | q31 | 12.73 | 13.31 | -0.58 (-4.38%) |
| ldbc_snb_ic | q35 | 769.27 | 776.96 | -7.69 (-0.99%) |
| ldbc_snb_ic | q36 | 49.71 | 48.66 | 1.05 (2.17%) |
| ldbc_snb_is | q32 | 10.31 | 9.49 | 0.82 (8.65%) |
| ldbc_snb_is | q33 | 18.09 | 18.76 | -0.67 (-3.59%) |
| ldbc_snb_is | q34 | 8.59 | 8.66 | -0.07 (-0.83%) |
| multi-rel | multi-rel-large-scan | 2799.47 | 2787.21 | 12.26 (0.44%) |
| multi-rel | multi-rel-lookup | 70.79 | 65.77 | 5.02 (7.63%) |
| multi-rel | multi-rel-small-scan | 49.77 | 54.51 | -4.74 (-8.69%) |
| order_by | q25 | 162.66 | 166.71 | -4.05 (-2.43%) |
| order_by | q26 | 480.88 | 481.83 | -0.95 (-0.20%) |
| order_by | q27 | 1458.48 | 1461.24 | -2.76 (-0.19%) |
| scan_after_filter | q01 | 207.96 | 208.62 | -0.65 (-0.31%) |
| scan_after_filter | q02 | 196.23 | 197.92 | -1.69 (-0.86%) |
| shortest_path_ldbc100 | q39 | 74.35 | 73.84 | 0.50 (0.68%) |
| var_size_expr_evaluator | q03 | 2104.30 | 2118.99 | -14.69 (-0.69%) |
| var_size_expr_evaluator | q04 | 2311.72 | 2284.59 | 27.13 (1.19%) |
| var_size_expr_evaluator | q05 | 2655.29 | 2660.79 | -5.49 (-0.21%) |
| var_size_expr_evaluator | q06 | 1398.92 | 1405.44 | -6.52 (-0.46%) |
| var_size_seq_scan | q19 | 1500.23 | 1502.23 | -2.00 (-0.13%) |
| var_size_seq_scan | q20 | 3238.61 | 3222.81 | 15.80 (0.49%) |
| var_size_seq_scan | q21 | 2476.38 | 2473.84 | 2.54 (0.10%) |
| var_size_seq_scan | q22 | 137.08 | 137.95 | -0.87 (-0.63%) |

@acquamarin acquamarin merged commit f100588 into master Aug 28, 2024
30 checks passed
@acquamarin acquamarin deleted the http-file-cache-improve branch August 28, 2024 13:56
ted-wq-x pushed a commit to ted-wq-x/kuzu that referenced this pull request Nov 14, 2024
(cherry picked from commit f100588)