Fix stats updates #3582

Merged
merged 1 commit into from
Jun 4, 2024


benjaminwinger
Collaborator

Make use of the data offset when calculating the minimum and maximum values of column updates.
Also make use of null data to avoid including stats for placeholder data used for null values.
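
For illustration only, here is a minimal Python sketch of the idea described above. It is not the code from this PR: the function name `min_max_of_update` and the parameters `data`, `is_null`, `data_offset` and `num_values` are hypothetical, and the buffer layout is an assumption. It shows a min/max scan that starts at the data offset instead of slot 0 and skips slots marked null, so placeholder values never reach the statistics.

```python
from typing import Optional, Sequence, Tuple


def min_max_of_update(
    data: Sequence[int],
    is_null: Sequence[bool],
    data_offset: int,
    num_values: int,
) -> Optional[Tuple[int, int]]:
    """Return (min, max) over the updated slots, or None if all are null.

    Hypothetical helper: only slots in [data_offset, data_offset + num_values)
    are inspected, and slots flagged in is_null are ignored.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None
    for i in range(data_offset, data_offset + num_values):
        if is_null[i]:
            # The value stored here is only a placeholder for NULL.
            continue
        v = data[i]
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return None if lo is None else (lo, hi)


# The update covers slots 2 and 3 only; slot 3 is null and holds a placeholder 0.
data = [100, -50, 7, 0, 999]
is_null = [False, False, False, True, False]
print(min_max_of_update(data, is_null, data_offset=2, num_values=2))  # (7, 7)
```

Scanning from slot 0 here would report values outside the update (100 and -50), and ignoring the null mask would pull the placeholder 0 into the minimum.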

Fixes #3572.

I added some tests for updates to multiple nodes, as that wasn't previously covered, as well as some tests with negative values. The negatives were for an issue which I suspected but which turned out to be something else; they're probably useful extra coverage anyway, especially since I noticed that few of the existing statistics tests were doing in-place updates that increase the min/max range.
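
As a rough sketch of the kind of scenario those tests exercise (this is not the project's test format, and the `MinMaxStats` class is a hypothetical stand-in), the Python below merges successive in-place updates into running statistics. It includes an update that widens the range, negative values, and nulls represented as `None`.

```python
from typing import Iterable, Optional


class MinMaxStats:
    """Hypothetical running min/max tracker that ignores null values."""

    def __init__(self) -> None:
        self.min: Optional[int] = None
        self.max: Optional[int] = None

    def merge(self, values: Iterable[Optional[int]]) -> None:
        for v in values:
            if v is None:
                continue  # nulls contribute nothing to the statistics
            self.min = v if self.min is None else min(self.min, v)
            self.max = v if self.max is None else max(self.max, v)


stats = MinMaxStats()
stats.merge([3, 5, 4])      # initial values
stats.merge([10, None])     # in-place update that raises the max
stats.merge([-8, -2])       # negative values lower the min
print(stats.min, stats.max)  # -8 10

# With only negative values, a placeholder 0 counted for the null
# would wrongly become the max.
negatives = MinMaxStats()
negatives.merge([-5, None, -1])
print(negatives.min, negatives.max)  # -5 -1
```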


github-actions bot commented Jun 3, 2024

Benchmark Result

Master commit hash: 032bbd827659b71d0491ad0ccd2fe61c6bd8ca95
Branch commit hash: 816eb59c3efe6d0602a31ed0da019cf3e0500d5d

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 650.08 | 650.63 | -0.55 (-0.09%) |
| aggregation | q28 | 14643.51 | 13467.04 | 1176.47 (8.74%) |
| filter | q14 | 132.12 | 132.66 | -0.55 (-0.41%) |
| filter | q15 | 134.59 | 130.85 | 3.74 (2.86%) |
| filter | q16 | 308.32 | 311.15 | -2.84 (-0.91%) |
| filter | q17 | 452.29 | 458.97 | -6.67 (-1.45%) |
| filter | q18 | 1941.48 | 1933.43 | 8.05 (0.42%) |
| fixed_size_expr_evaluator | q07 | 573.09 | 573.96 | -0.87 (-0.15%) |
| fixed_size_expr_evaluator | q08 | 803.10 | 799.67 | 3.44 (0.43%) |
| fixed_size_expr_evaluator | q09 | 802.09 | 799.81 | 2.28 (0.29%) |
| fixed_size_expr_evaluator | q10 | 246.83 | 248.96 | -2.13 (-0.85%) |
| fixed_size_expr_evaluator | q11 | 242.05 | 241.68 | 0.37 (0.15%) |
| fixed_size_expr_evaluator | q12 | 243.26 | 241.80 | 1.46 (0.60%) |
| fixed_size_expr_evaluator | q13 | 1482.53 | 1479.54 | 2.99 (0.20%) |
| fixed_size_seq_scan | q23 | 122.13 | 122.32 | -0.19 (-0.16%) |
| join | q29 | 704.41 | 670.75 | 33.66 (5.02%) |
| join | q30 | 1424.58 | 1448.53 | -23.95 (-1.65%) |
| join | q31 | 47.26 | 43.89 | 3.37 (7.68%) |
| ldbc_snb_ic | q35 | 3302.36 | 3309.92 | -7.56 (-0.23%) |
| ldbc_snb_ic | q36 | 130.10 | 130.96 | -0.86 (-0.66%) |
| ldbc_snb_is | q32 | 12.61 | 12.49 | 0.13 (1.00%) |
| ldbc_snb_is | q33 | 99.13 | 96.18 | 2.95 (3.07%) |
| ldbc_snb_is | q34 | 100.73 | 102.81 | -2.08 (-2.02%) |
| order_by | q25 | 135.15 | 131.76 | 3.39 (2.57%) |
| order_by | q26 | 438.06 | 437.09 | 0.97 (0.22%) |
| order_by | q27 | 1418.46 | 1419.36 | -0.90 (-0.06%) |
| scan_after_filter | q01 | 172.86 | 171.35 | 1.51 (0.88%) |
| scan_after_filter | q02 | 155.48 | 155.78 | -0.30 (-0.19%) |
| shortest_path_ldbc100 | q39 | 55.66 | 56.88 | -1.22 (-2.15%) |
| var_size_expr_evaluator | q03 | 2051.18 | 2051.60 | -0.43 (-0.02%) |
| var_size_expr_evaluator | q04 | 2253.59 | 2249.22 | 4.37 (0.19%) |
| var_size_expr_evaluator | q05 | 2546.97 | 2551.60 | -4.63 (-0.18%) |
| var_size_expr_evaluator | q06 | 1392.80 | 1392.35 | 0.45 (0.03%) |
| var_size_seq_scan | q19 | 1467.31 | 1469.83 | -2.52 (-0.17%) |
| var_size_seq_scan | q20 | 3033.27 | 3091.03 | -57.76 (-1.87%) |
| var_size_seq_scan | q21 | 2417.46 | 2391.72 | 25.74 (1.08%) |
| var_size_seq_scan | q22 | 134.19 | 131.57 | 2.61 (1.99%) |

@andyfengHKU andyfengHKU merged commit 35eee92 into master Jun 4, 2024
19 checks passed
@andyfengHKU andyfengHKU deleted the stats-fix branch June 4, 2024 01:57
ted-wq-x pushed a commit to ted-wq-x/kuzu that referenced this pull request Nov 14, 2024
Make use of the data offset when calculating the minimum and maximum values of column updates
Also make use of null data to avoid including stats for placeholder data
used for null values

(cherry picked from commit 35eee92)

Successfully merging this pull request may close these issues.

Incorrect statistics on relationship table.