
feat(resharding) - delete snapshot immediately after resharding is finished #10450

Merged (3 commits) on Jan 17, 2024

Conversation

wacban (Contributor) commented Jan 17, 2024

As long as the snapshot is present, the storage overhead keeps increasing. This PR makes the snapshot get deleted immediately after resharding finishes. The snapshot is only deleted if the node is not configured to take snapshots every epoch.
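The config guard described above can be sketched as follows; `SnapshotConfig` and the function name are invented for illustration and are not nearcore's actual API:

```rust
// Minimal sketch of the deletion policy; hypothetical names, not nearcore types.
#[derive(PartialEq)]
enum SnapshotConfig {
    // The node takes a snapshot every epoch; the snapshot must be kept.
    EveryEpoch,
    // The snapshot exists only for resharding; it is safe to delete.
    ForReshardingOnly,
}

fn should_delete_snapshot_after_resharding(config: &SnapshotConfig) -> bool {
    // Only delete when the snapshot is not needed for anything else.
    *config != SnapshotConfig::EveryEpoch
}

fn main() {
    assert!(should_delete_snapshot_after_resharding(&SnapshotConfig::ForReshardingOnly));
    assert!(!should_delete_snapshot_after_resharding(&SnapshotConfig::EveryEpoch));
}
```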

Initially I wanted to put the trigger logic somewhere between the sync jobs actor and chain, but there it was hard to tell whether resharding was finished for all the shards. It was made trickier by the fact that we only reshard the shards that we will track next epoch, and should only check those. In the end I placed it in the state sync logic, where it is triggered once all state syncs, and hence all reshardings, are done.

```rust
let Some(snapshot_callbacks) = &self.snapshot_callbacks else { return Ok(()) };
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback;
delete_snapshot_callback();
```
wacban (Contributor, Author) commented on the diff:
Only a small refactoring in this method; it should be a noop.

@wacban wacban marked this pull request as ready for review January 17, 2024 14:52
@wacban wacban requested a review from a team as a code owner January 17, 2024 14:52
wacban commented Jan 17, 2024

cc @posvyatokum and @VanBarbascu, who asked for this feature to be implemented.

I had a look at logs from the resharding nayduck test and all seems ok. That being said, in nayduck the resharding completes near-instantly, so it may not be the best test. The proper way to see it working would be a mocknet test. @posvyatokum, since you're doing one anyway for the release, can you check if the snapshot is deleted at the right time? Or just point me to a mocknet host after the test is finished and I'll have a quick look at the logs.

wacban commented Jan 17, 2024

Actually, the integration tests cover this case. It would still be good to see it on a real db, but even without that I'm fairly confident it should work.

```diff
@@ -702,6 +702,7 @@ impl StateSync {
         )?;

         if all_done {
+            chain.process_snapshot_after_resharding()?;
```
A reviewer (Contributor) commented:

Would it make more sense to call this in build_state_for_split_shards_postprocessing? Why hook into the whole state sync process, which may run even if there's no resharding happening?

wacban (Author) replied:

Yeah, I wanted to, but the postprocessing is per shard.

The reviewer replied:

I think it should be fine to call the delete_state_snapshot function in trie multiple times (once per shard).

I'm scared the call to chain.process_snapshot_after_resharding() would be lost here in state.rs, as this isn't where all the resharding code lives, and it'll get hard in the future to track where all the call sites are.

Up to you though!

wacban (Author) replied:

If you call it per shard, then the snapshot will get deleted after the first shard is finished and the remaining shards will fail.
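The all-or-nothing timing can be sketched as a tiny tracker; all names here are hypothetical and stand in for nearcore's actual types:

```rust
// Illustrative sketch of the `all_done` gating: delete the snapshot only
// once every shard has finished resharding, never after the first shard.
use std::collections::HashSet;

struct ReshardingTracker {
    pending: HashSet<u64>,  // shard ids still being resharded
    snapshot_deleted: bool, // stand-in for delete_snapshot_callback()
}

impl ReshardingTracker {
    fn new(shards: &[u64]) -> Self {
        Self { pending: shards.iter().copied().collect(), snapshot_deleted: false }
    }

    // Called when one shard finishes. The snapshot is deleted only when
    // the last pending shard completes.
    fn on_shard_done(&mut self, shard_id: u64) {
        self.pending.remove(&shard_id);
        if self.pending.is_empty() {
            self.snapshot_deleted = true;
        }
    }
}

fn main() {
    let mut tracker = ReshardingTracker::new(&[0, 1, 2]);
    tracker.on_shard_done(0);
    // The snapshot must survive until the remaining shards finish.
    assert!(!tracker.snapshot_deleted);
    tracker.on_shard_done(1);
    tracker.on_shard_done(2);
    assert!(tracker.snapshot_deleted);
}
```

Deleting in per-shard postprocessing would flip `snapshot_deleted` on the first call, which is exactly the failure mode described above.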

The reviewer replied:

Oops, you're right! Apologies, ignore my comment!

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (db20a66) 72.01% compared to head (6d816fe) 72.00%.
Report is 4 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| chain/chain/src/chain.rs | 82.14% | 0 Missing and 5 partials ⚠️ |
| chain/client/src/sync/state.rs | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master   #10450      +/-   ##
==========================================
- Coverage   72.01%   72.00%   -0.01%
==========================================
  Files         718      718
  Lines      144922   145020      +98
  Branches   144922   145020      +98
==========================================
+ Hits       104360   104420      +60
- Misses      35768    35802      +34
- Partials     4794     4798       +4
```
| Flag | Coverage | Δ |
|---|---|---|
| backward-compatibility | 0.08% <0.00%> | -0.01% ⬇️ |
| db-migration | 0.08% <0.00%> | -0.01% ⬇️ |
| genesis-check | 1.26% <0.00%> | -0.01% ⬇️ |
| integration-tests | 36.88% <79.31%> | +0.01% ⬆️ |
| linux | 71.43% <79.31%> | -0.10% ⬇️ |
| linux-nightly | 71.57% <79.31%> | +<0.01% ⬆️ |
| macos | 55.51% <75.86%> | +0.03% ⬆️ |
| pytests | 1.48% <0.00%> | -0.01% ⬇️ |
| sanity-checks | 1.27% <0.00%> | -0.01% ⬇️ |
| unittests | 68.08% <75.86%> | -0.04% ⬇️ |
| upgradability | 0.13% <0.00%> | -0.01% ⬇️ |

Flags with carried forward coverage won't be shown.


@wacban wacban added this pull request to the merge queue Jan 17, 2024
Merged via the queue into master with commit dda19e3 Jan 17, 2024
26 checks passed
@wacban wacban deleted the waclaw-resharding-snapshot branch January 17, 2024 18:14
@wacban wacban mentioned this pull request Jan 23, 2024