Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vacuum dropped columns during checkpoint #4074

Merged
merged 1 commit into from
Aug 14, 2024
Merged

Conversation

ray6080
Copy link
Contributor

@ray6080 ray6080 commented Aug 13, 2024

Description

The PR is to fix a bug due to not vacuum dropped columns during checkpoint. The bug happens when we recover after dropped a column and performed checkpoint. During the recovery, we will have some columns as nullptr, which lead to seg faults when we collect data types from table columns.

This PR solves the issue by always vacuuming dropped columns from storage during checkpoint. Note that the column ID inside property will be reset due to the vacuuming, that's why the checkpoint of catalog now happens after the checkpoint of tables. Letting storage modifies catalog may not be a good practice, alternatively, we can let TableCatalogEntry reset its column ids during checkpoint independently, which sounds like a better idea to me.

In the future, we should also reclaim disk pages of persistent column chunks being dropped during checkpoint.

A minor refactoring:

  • aligned columnID for rel table properties (nextColumnID starts from 1 to skip the anonymous NBR_ID column inside RelTableCatalogEntry). thus removed the special implementation of RelTableCatalogEntry::getColumnID.

Copy link

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 97.29730% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.53%. Comparing base (95d338e) to head (93840bb).

Files Patch % Lines
src/storage/store/csr_node_group.cpp 66.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4074      +/-   ##
==========================================
+ Coverage   84.52%   84.53%   +0.01%     
==========================================
  Files        1300     1300              
  Lines       51024    51048      +24     
  Branches     7051     7058       +7     
==========================================
+ Hits        43127    43153      +26     
+ Misses       7758     7755       -3     
- Partials      139      140       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ray6080 ray6080 force-pushed the fix-drop-column-checkpoint branch from 399fef3 to 93840bb Compare August 14, 2024 00:11
Copy link

Benchmark Result

Master commit hash: 95d338e22f58405ebc255286f8f6594c01e9f0b0
Branch commit hash: a4de5b2712556e88541aebbd808bd4e5c6eeab3b

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 677.57 675.04 2.53 (0.38%)
aggregation q28 11764.08 11363.26 400.81 (3.53%)
filter q14 153.99 150.84 3.15 (2.09%)
filter q15 151.16 152.91 -1.75 (-1.14%)
filter q16 331.38 328.89 2.49 (0.76%)
filter q17 471.55 471.30 0.25 (0.05%)
filter q18 1941.75 1941.45 0.30 (0.02%)
fixed_size_expr_evaluator q07 564.69 577.38 -12.69 (-2.20%)
fixed_size_expr_evaluator q08 773.10 779.19 -6.09 (-0.78%)
fixed_size_expr_evaluator q09 771.32 782.45 -11.13 (-1.42%)
fixed_size_expr_evaluator q10 264.56 267.65 -3.09 (-1.15%)
fixed_size_expr_evaluator q11 259.23 262.35 -3.12 (-1.19%)
fixed_size_expr_evaluator q12 259.43 261.64 -2.22 (-0.85%)
fixed_size_expr_evaluator q13 1493.75 1497.71 -3.96 (-0.26%)
fixed_size_seq_scan q23 142.63 146.35 -3.72 (-2.54%)
join q31 12.37 12.28 0.09 (0.72%)
ldbc_snb_ic q35 775.77 1087.33 -311.56 (-28.65%)
ldbc_snb_ic q36 46.74 41.66 5.08 (12.19%)
ldbc_snb_is q32 8.93 8.51 0.42 (4.92%)
ldbc_snb_is q33 19.17 18.52 0.65 (3.52%)
ldbc_snb_is q34 7.93 8.90 -0.97 (-10.90%)
multi-rel multi-rel-large-scan 2767.03 2849.47 -82.44 (-2.89%)
multi-rel multi-rel-lookup 57.29 60.15 -2.86 (-4.76%)
multi-rel multi-rel-small-scan 50.55 63.32 -12.76 (-20.16%)
order_by q25 151.22 153.82 -2.61 (-1.69%)
order_by q26 470.88 475.39 -4.50 (-0.95%)
order_by q27 1424.71 1434.29 -9.58 (-0.67%)
scan_after_filter q01 201.56 198.20 3.35 (1.69%)
scan_after_filter q02 187.42 186.16 1.26 (0.68%)
shortest_path_ldbc100 q39 156.99 155.86 1.12 (0.72%)
var_size_expr_evaluator q03 2072.47 2086.81 -14.34 (-0.69%)
var_size_expr_evaluator q04 2254.53 2263.25 -8.72 (-0.39%)
var_size_expr_evaluator q05 2677.69 2640.74 36.95 (1.40%)
var_size_expr_evaluator q06 1358.60 1346.79 11.82 (0.88%)
var_size_seq_scan q19 1481.62 1490.11 -8.49 (-0.57%)
var_size_seq_scan q20 3168.07 3153.42 14.65 (0.46%)
var_size_seq_scan q21 2419.20 2455.25 -36.05 (-1.47%)
var_size_seq_scan q22 135.41 134.86 0.54 (0.40%)

@ray6080 ray6080 merged commit a8e50b2 into master Aug 14, 2024
23 checks passed
@ray6080 ray6080 deleted the fix-drop-column-checkpoint branch August 14, 2024 01:01
ted-wq-x pushed a commit to ted-wq-x/kuzu that referenced this pull request Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants