Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(resharding): flat storage resharding mvp #12164

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 28 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
51f9f44
fix typo
Trisfald Aug 30, 2024
eb421de
skeleton for flat storage resharding V3 update
Trisfald Sep 3, 2024
c1a71a6
skeleton of flat storage resharder
Trisfald Sep 3, 2024
9234160
add TODO details
Trisfald Sep 3, 2024
c39b476
add API stub to resume or start resharding of flat storage
Trisfald Sep 20, 2024
f977326
flat storage resharding start fn
Trisfald Sep 24, 2024
d6823d3
many improvements
Trisfald Sep 25, 2024
0195907
add test
Trisfald Sep 26, 2024
c7a6875
improve resharder api
Trisfald Sep 26, 2024
4c7e9b2
Merge remote-tracking branch 'origin/master' into flat-storage-reshar…
Trisfald Sep 26, 2024
79730a4
use the new flat store adapter
Trisfald Sep 27, 2024
ad83e0b
implement simple key value copy
Trisfald Sep 27, 2024
ae82453
add implemnetation of flat storage resharding interrupt
Trisfald Sep 30, 2024
e735755
change tests setup to use a test chain
Trisfald Sep 30, 2024
fc38090
handle more DB columns
Trisfald Sep 30, 2024
75c42b5
improve comments
Trisfald Sep 30, 2024
3848c40
Merge remote-tracking branch 'origin/master' into flat-storage-reshar…
Trisfald Sep 30, 2024
306c502
fix unit tests
Trisfald Sep 30, 2024
b9f11d4
add parent shard deletion in task
Trisfald Oct 1, 2024
7fe014e
update protocol schema
Trisfald Oct 1, 2024
d02fc52
remove todo
Trisfald Oct 1, 2024
cf8ec95
remove parent flat storage through the manager
Trisfald Oct 1, 2024
f45cc9e
multi line imports
Trisfald Oct 1, 2024
3c188c5
apply code review suggestions
Trisfald Oct 1, 2024
ac6d642
add note about parent shard flat storage invariant
Trisfald Oct 1, 2024
9269c37
add test reject_split_shard_if_parent_is_not_ready
Trisfald Oct 1, 2024
7ad602e
use a dedicated type to represent task statuses
Trisfald Oct 1, 2024
d0b3125
refactor copy_kv_to_child into a free function
Trisfald Oct 1, 2024
df869fc
address code review comments
Trisfald Oct 2, 2024
e1f3b5d
refactor ReshardingEventType
Trisfald Oct 2, 2024
6d4188c
update protocol schema
Trisfald Oct 2, 2024
d6b5ab3
improve code clarity
Trisfald Oct 3, 2024
c52b947
remove FlatStorageResharderInner
Trisfald Oct 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions chain/chain-primitives/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,9 @@ pub enum Error {
/// GC error.
#[error("GC Error: {0}")]
GCError(String),
/// Resharding error.
#[error("Resharding Error: {0}")]
ReshardingError(String),
/// Anything else
#[error("Other Error: {0}")]
Other(String),
Expand Down Expand Up @@ -269,6 +272,7 @@ impl Error {
| Error::CannotBeFinalized
| Error::StorageError(_)
| Error::GCError(_)
| Error::ReshardingError(_)
| Error::DBNotFoundErr(_) => false,
Error::InvalidBlockPastTime(_, _)
| Error::InvalidBlockFutureTime(_)
Expand Down Expand Up @@ -392,6 +396,7 @@ impl Error {
Error::NotAValidator(_) => "not_a_validator",
Error::NotAChunkValidator => "not_a_chunk_validator",
Error::InvalidChallengeRoot => "invalid_challenge_root",
Error::ReshardingError(_) => "resharding_error",
}
}
}
Expand Down
19 changes: 18 additions & 1 deletion chain/chain/src/flat_storage_creator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
//! `CatchingUp`: moves flat storage head forward, so it may reach chain final head.
//! `Ready`: flat storage is created and it is up-to-date.

use crate::flat_storage_resharder::FlatStorageResharder;
use crate::types::RuntimeAdapter;
use crate::{ChainStore, ChainStoreAccess};
use assert_matches::assert_matches;
Expand Down Expand Up @@ -388,6 +389,12 @@ impl FlatStorageShardCreator {
FlatStorageStatus::Disabled => {
panic!("initiated flat storage creation for shard {shard_id} while it is disabled");
}
// If the flat storage is undergoing resharding it means it was previously created successfully,
// but resharding itself hasn't been finished.
// This case is a no-op because the flat storage resharder has already been created in `create_flat_storage_for_current_epoch`.
Comment on lines +392 to +394
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The formatting seems a bit off, I use an vs code extension called "rewrap" it has some on the fly and on demand formatting (shortcut).

FlatStorageStatus::Resharding(_) => {
return Ok(true);
}
};
Ok(false)
}
Expand All @@ -403,10 +410,13 @@ pub struct FlatStorageCreator {
impl FlatStorageCreator {
/// For each of tracked shards, either creates flat storage if it is already stored on DB,
/// or starts migration to flat storage which updates DB in background and creates flat storage afterwards.
///
/// Also resumes any resharding operation which was already in progress.
pub fn new(
epoch_manager: Arc<dyn EpochManagerAdapter>,
runtime: Arc<dyn RuntimeAdapter>,
chain_store: &ChainStore,
flat_storage_resharder: &FlatStorageResharder,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't completely understand why is flat_storage_resharder a part of FlatStorageCreator? The FlatStorageCreator is created at the time of calling new() in client. The Self::create_flat_storage_for_current_epoch is also thus called only in the beginning right? How do we later set the status of some flat storage to FlatStorageStatus::Resharding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the FlatStorageResharder is part of client.

FlatStorageCreator uses a reference to the resharder to handle the case when the node restarts and flat storage resharding was in progress. I did like this because the creator already has logic to handle resuming flat storage from the previous state.

In normal flow the resharding of flat storage would be initiated by the resharding manager, but that part is not done yet

num_threads: usize,
) -> Result<Option<Self>, Error> {
let flat_storage_manager = runtime.get_flat_storage_manager();
Expand All @@ -420,6 +430,7 @@ impl FlatStorageCreator {
&epoch_manager,
&flat_storage_manager,
&runtime,
&flat_storage_resharder,
)?;

// Create flat storage for the shards in the next epoch. This only
Expand Down Expand Up @@ -447,6 +458,7 @@ impl FlatStorageCreator {
epoch_manager: &Arc<dyn EpochManagerAdapter>,
flat_storage_manager: &FlatStorageManager,
runtime: &Arc<dyn RuntimeAdapter>,
_flat_storage_resharder: &FlatStorageResharder,
) -> Result<HashMap<ShardUId, FlatStorageShardCreator>, Error> {
let epoch_id = &chain_head.epoch_id;
tracing::debug!(target: "store", ?epoch_id, "creating flat storage for the current epoch");
Expand All @@ -473,6 +485,10 @@ impl FlatStorageCreator {
);
}
FlatStorageStatus::Disabled => {}
FlatStorageStatus::Resharding(_status) => {
// TODO(Trisfald): call resume
// flat_storage_resharder.resume(shard_uid, &status, ...)?;
Comment on lines +489 to +490
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a todo for this PR or later? I'm fine with later but I just saw that the resume method is already implemented so might as well call it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is meant for a second PR, I'll need to see how to link everything together

}
}
}

Expand Down Expand Up @@ -502,7 +518,8 @@ impl FlatStorageCreator {
}
FlatStorageStatus::Empty
| FlatStorageStatus::Creation(_)
| FlatStorageStatus::Disabled => {
| FlatStorageStatus::Disabled
| FlatStorageStatus::Resharding(_) => {
// The flat storage for children shards will be created
// separately in the resharding process.
}
Expand Down
Loading
Loading