
feat(resharding) - delete snapshot immediately after resharding is finished #10450

Merged (3 commits) on Jan 17, 2024

Conversation

wacban (Contributor) commented Jan 17, 2024

As long as the snapshot is present, the storage overhead keeps increasing. This PR makes the snapshot get deleted immediately after resharding finishes. The snapshot is only deleted if the node is not configured to take snapshots every epoch.
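The config guard described above can be sketched as follows; `SnapshotConfig` and the function name are invented for illustration and are not nearcore's actual API:

```rust
// Minimal sketch of the deletion policy; hypothetical names, not nearcore types.
#[derive(PartialEq)]
enum SnapshotConfig {
    // The node takes a snapshot every epoch; the snapshot must be kept.
    EveryEpoch,
    // The snapshot exists only for resharding; it is safe to delete.
    ForReshardingOnly,
}

fn should_delete_snapshot_after_resharding(config: &SnapshotConfig) -> bool {
    // Only delete when the snapshot is not needed for anything else.
    *config != SnapshotConfig::EveryEpoch
}

fn main() {
    assert!(should_delete_snapshot_after_resharding(&SnapshotConfig::ForReshardingOnly));
    assert!(!should_delete_snapshot_after_resharding(&SnapshotConfig::EveryEpoch));
}
```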

Initially I wanted to put the trigger logic somewhere between the sync jobs actor and chain, but there it was hard to tell whether resharding was finished for all the shards. It was made trickier by the fact that we only reshard the shards that we will track next epoch, and should only check those. In the end I placed it in the state sync logic, where it is triggered once all state syncs, and hence all reshardings, are done.

```rust
let Some(snapshot_callbacks) = &self.snapshot_callbacks else { return Ok(()) };
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback;
delete_snapshot_callback();
```
wacban (Contributor, Author) commented on the diff:
Only a small refactoring in this method; it should be a noop.

@wacban wacban marked this pull request as ready for review January 17, 2024 14:52
@wacban wacban requested a review from a team as a code owner January 17, 2024 14:52
wacban commented Jan 17, 2024

cc @posvyatokum and @VanBarbascu, who asked for this feature to be implemented.

I had a look at logs from the resharding nayduck test and all seems ok. That being said, in nayduck the resharding completes near-instantly, so it may not be the best test. The proper way to see it working would be a mocknet test. @posvyatokum, since you're doing one anyway for the release, can you check if the snapshot is deleted at the right time? Or just point me to a mocknet host after the test is finished and I'll have a quick look at the logs.

wacban commented Jan 17, 2024

Actually, the integration tests cover this case. It would still be good to see it on a real db, but even without that I'm fairly confident it should work.

```diff
@@ -702,6 +702,7 @@ impl StateSync {
         )?;

         if all_done {
+            chain.process_snapshot_after_resharding()?;
```
A reviewer (Contributor) commented:

Would it make more sense to call this in build_state_for_split_shards_postprocessing? Why hook into the whole state sync process, which may run even if there's no resharding happening?

wacban (Author) replied:

Yeah, I wanted to, but the postprocessing is per shard.

The reviewer replied:

I think it should be fine to call the delete_state_snapshot function in trie multiple times (once per shard).

I'm scared the call to chain.process_snapshot_after_resharding() would be lost here in state.rs, as this isn't where all the resharding code lives, and it'll get hard in the future to track where all the call sites are.

Up to you though!

wacban (Author) replied:

If you call it per shard, then the snapshot will get deleted after the first shard is finished and the remaining shards will fail.
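The all-or-nothing timing can be sketched as a tiny tracker; all names here are hypothetical and stand in for nearcore's actual types:

```rust
// Illustrative sketch of the `all_done` gating: delete the snapshot only
// once every shard has finished resharding, never after the first shard.
use std::collections::HashSet;

struct ReshardingTracker {
    pending: HashSet<u64>,  // shard ids still being resharded
    snapshot_deleted: bool, // stand-in for delete_snapshot_callback()
}

impl ReshardingTracker {
    fn new(shards: &[u64]) -> Self {
        Self { pending: shards.iter().copied().collect(), snapshot_deleted: false }
    }

    // Called when one shard finishes. The snapshot is deleted only when
    // the last pending shard completes.
    fn on_shard_done(&mut self, shard_id: u64) {
        self.pending.remove(&shard_id);
        if self.pending.is_empty() {
            self.snapshot_deleted = true;
        }
    }
}

fn main() {
    let mut tracker = ReshardingTracker::new(&[0, 1, 2]);
    tracker.on_shard_done(0);
    // The snapshot must survive until the remaining shards finish.
    assert!(!tracker.snapshot_deleted);
    tracker.on_shard_done(1);
    tracker.on_shard_done(2);
    assert!(tracker.snapshot_deleted);
}
```

Deleting in per-shard postprocessing would flip `snapshot_deleted` on the first call, which is exactly the failure mode described above.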

The reviewer replied:

Oops, you're right! Apologies, ignore my comment!

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (db20a66) 72.01% compared to head (6d816fe) 72.00%.
Report is 4 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| chain/chain/src/chain.rs | 82.14% | 0 Missing and 5 partials ⚠️ |
| chain/client/src/sync/state.rs | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master   #10450      +/-   ##
==========================================
- Coverage   72.01%   72.00%   -0.01%
==========================================
  Files         718      718
  Lines      144922   145020      +98
  Branches   144922   145020      +98
==========================================
+ Hits       104360   104420      +60
- Misses      35768    35802      +34
- Partials     4794     4798       +4
```
| Flag | Coverage | Δ |
|---|---|---|
| backward-compatibility | 0.08% <0.00%> | -0.01% ⬇️ |
| db-migration | 0.08% <0.00%> | -0.01% ⬇️ |
| genesis-check | 1.26% <0.00%> | -0.01% ⬇️ |
| integration-tests | 36.88% <79.31%> | +0.01% ⬆️ |
| linux | 71.43% <79.31%> | -0.10% ⬇️ |
| linux-nightly | 71.57% <79.31%> | +<0.01% ⬆️ |
| macos | 55.51% <75.86%> | +0.03% ⬆️ |
| pytests | 1.48% <0.00%> | -0.01% ⬇️ |
| sanity-checks | 1.27% <0.00%> | -0.01% ⬇️ |
| unittests | 68.08% <75.86%> | -0.04% ⬇️ |
| upgradability | 0.13% <0.00%> | -0.01% ⬇️ |

Flags with carried forward coverage won't be shown.


@wacban wacban added this pull request to the merge queue Jan 17, 2024
Merged via the queue into master with commit dda19e3 Jan 17, 2024
26 checks passed
@wacban wacban deleted the waclaw-resharding-snapshot branch January 17, 2024 18:14
@wacban wacban mentioned this pull request Jan 23, 2024