Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(resharding) - delete snapshot immediately after resharding is finished #10450

Merged
merged 3 commits into from
Jan 17, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 38 additions & 18 deletions chain/chain/src/chain.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3486,33 +3486,53 @@ impl Chain {
)))
}

/// Function to create a new snapshot if needed
/// Function to create or delete a snapshot if necessary.
fn process_snapshot(&mut self) -> Result<(), Error> {
let (make_snapshot, delete_snapshot) = self.should_make_or_delete_snapshot()?;
if !make_snapshot && !delete_snapshot {
return Ok(());
}
if let Some(snapshot_callbacks) = &self.snapshot_callbacks {
if make_snapshot {
let head = self.head()?;
let epoch_height =
self.epoch_manager.get_epoch_height_from_prev_block(&head.prev_block_hash)?;
let shard_uids = self
.epoch_manager
.get_shard_layout_from_prev_block(&head.prev_block_hash)?
.shard_uids()
.collect();
let last_block = self.get_block(&head.last_block_hash)?;
let make_snapshot_callback = &snapshot_callbacks.make_snapshot_callback;
make_snapshot_callback(head.prev_block_hash, epoch_height, shard_uids, last_block);
} else if delete_snapshot {
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback;
delete_snapshot_callback();
}
let Some(snapshot_callbacks) = &self.snapshot_callbacks else { return Ok(()) };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small refactoring in this method, should be noop.

if make_snapshot {
let head = self.head()?;
let prev_hash = head.prev_block_hash;
let epoch_height = self.epoch_manager.get_epoch_height_from_prev_block(&prev_hash)?;
let shard_layout = &self.epoch_manager.get_shard_layout_from_prev_block(&prev_hash)?;
let shard_uids = shard_layout.shard_uids().collect();
let last_block = self.get_block(&head.last_block_hash)?;
let make_snapshot_callback = &snapshot_callbacks.make_snapshot_callback;
make_snapshot_callback(prev_hash, epoch_height, shard_uids, last_block);
} else if delete_snapshot {
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback;
delete_snapshot_callback();
}
Ok(())
}

// Similar to `process_snapshot` but only called after resharding is done.
// This is to speed up the snapshot removal once resharding is finished in
// order to minimize the storage overhead.
pub fn process_snapshot_after_resharding(&mut self) -> Result<(), Error> {
let Some(snapshot_callbacks) = &self.snapshot_callbacks else { return Ok(()) };

let tries = self.runtime_adapter.get_tries();
let snapshot_config = tries.state_snapshot_config();
let delete_snapshot = match snapshot_config.state_snapshot_type {
// Do not delete snapshot if the node is configured to snapshot every epoch.
StateSnapshotType::EveryEpoch => false,
// Delete the snapshot if it was created only for resharding.
StateSnapshotType::ForReshardingOnly => true,
};

if delete_snapshot {
tracing::debug!(target: "resharding", "deleting snapshot after resharding");
let delete_snapshot_callback = &snapshot_callbacks.delete_snapshot_callback;
delete_snapshot_callback();
}

Ok(())
}

/// Function to check whether we need to create a new snapshot while processing the current block
/// Note that this functions is called as a part of block preprocesing, so the head is not updated to current block
fn should_make_or_delete_snapshot(&mut self) -> Result<(bool, bool), Error> {
Expand Down
1 change: 1 addition & 0 deletions chain/client/src/sync/state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -702,6 +702,7 @@ impl StateSync {
)?;

if all_done {
chain.process_snapshot_after_resharding()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to call this in build_state_for_split_shards_postprocessing? Why tag into the whole state sync process, which may run even if there's no resharding happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I wanted to but postprocessing is per shard

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine to call the delete_state_snapshot function in trie multiple times (once per shard).

I'm scared the call to chain.process_snapshot_after_resharding() would be lost here in state.rs as this isn't where all the resharding code lives and it'll get hard in the future to track where all the call sites are.

Upto you tho!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call it per shard then the snapshot will get deleted after the first shard is finished and the remaining shards will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right! Apologies, ignore my comment!

Ok(StateSyncResult::Completed)
} else {
Ok(StateSyncResult::InProgress)
Expand Down
Loading