-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(resharding) - delete snapshot immediately after resharding is finished #10450
Conversation
3f6d071
to
6e06ea2
Compare
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback; | ||
delete_snapshot_callback(); | ||
} | ||
let Some(snapshot_callbacks) = &self.snapshot_callbacks else { return Ok(()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a small refactoring in this method, should be noop.
cc @posvyatokum and @VanBarbascu who asked for this feature to be implemented |
Actually the integration tests cover this case. Still would be good to see it on a real db but even without it I'm fairly confident it should work. |
@@ -702,6 +702,7 @@ impl StateSync { | |||
)?; | |||
|
|||
if all_done { | |||
chain.process_snapshot_after_resharding()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to call this in build_state_for_split_shards_postprocessing
? Why tag into the whole state sync process, which may run even if there's no resharding happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I wanted to but postprocessing is per shard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine to call the delete_state_snapshot function in trie multiple times (once per shard).
I'm scared the call to chain.process_snapshot_after_resharding()
would be lost here in state.rs as this isn't where all the resharding code lives and it'll get hard in the future to track where all the call sites are.
Upto you tho!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call it per shard then the snapshot will get deleted after the first shard is finished and the remaining shards will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, you're right! Apologies, ignore my comment!
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #10450 +/- ##
==========================================
- Coverage 72.01% 72.00% -0.01%
==========================================
Files 718 718
Lines 144922 145020 +98
Branches 144922 145020 +98
==========================================
+ Hits 104360 104420 +60
- Misses 35768 35802 +34
- Partials 4794 4798 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
As long as the snapshot is present the storage overhead will keep increasing. This PR makes it so that the snapshot is deleted immediately after resharding is finished. The snapshot is only deleted if the node is not configured to take snapshots every epoch.
Initially I wanted to put the trigger logic somewhere between sync jobs actor and chain but there it was hard to tell if resharding is finished for all the shards. It was made trickier by the fact that we only resharding shards that we will track next epoch and should only check those. At the end of the day I placed it in the state sync logic where it'll be triggered once all state syncs and hence all reshardings are done.