
feat(resharding): flat storage resharding mvp #12164

Open
wants to merge 33 commits into master

Conversation

@Trisfald (Contributor) commented Sep 27, 2024

This PR adds early MVP capabilities for resharding the flat storage (Resharding V3).

The main addition is FlatStorageResharder and the tooling around it. You can also see traces of an early attempt to tie the resharder into the existing flat storage code, mainly the flat storage creator.

  • FlatStorageResharder takes care of everything related to resharding the flat storage.
    • Its running tasks can be interrupted by a controller (integrated with the existing resharding handle)
    • Uses the concept of a scheduler to run tasks in the background
  • ReshardingEventType is a utility enum representing the types of resharding events. There's only one for now, but it makes adding more easier.

Achievements

  • Preparing flat storage's content for children after a shard split, for account-id based keys only.
  • Deletion of parent flat storage

Missing pieces

  • Catchup phase for children and creation of proper flat storage
  • Handling more complex key-values (not account-id based)
  • Integration with resharding manager and flat storage creator
  • Additional tests
  • Metrics

Missing pieces will likely be done in another PR.
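To make the intended interplay concrete, here is a minimal, self-contained sketch. All names below are hypothetical stand-ins, not the actual types or signatures in this PR: a scheduler runs the split task in the background while anyone holding the controller can interrupt it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for the resharding controller: lets any holder interrupt a
/// running background task (similar in spirit to ReshardingHandle).
#[derive(Clone)]
struct Controller {
    cancelled: Arc<AtomicBool>,
}

impl Controller {
    fn new() -> Self {
        Self { cancelled: Arc::new(AtomicBool::new(false)) }
    }
    fn cancel(&self) {
        self.cancelled.store(true, Ordering::Relaxed);
    }
    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::Relaxed)
    }
}

/// Stand-in for the scheduler: something able to run a task in the background.
trait Scheduler {
    fn schedule(&self, task: Box<dyn FnOnce() + Send>);
}

/// Simplest possible scheduler: one thread per task.
struct ThreadScheduler;

impl Scheduler for ThreadScheduler {
    fn schedule(&self, task: Box<dyn FnOnce() + Send>) {
        std::thread::spawn(task);
    }
}

/// Sketch of the split task: copy the parent's key-value pairs to the
/// children, checking for interruption along the way, then drop the parent.
fn schedule_split(scheduler: &dyn Scheduler, controller: Controller) {
    scheduler.schedule(Box::new(move || {
        // Fake flat storage content; the real task iterates the parent shard.
        let parent_entries = vec![("alice.near", 1u64), ("zoe.near", 2u64)];
        for (key, _value) in parent_entries {
            if controller.is_cancelled() {
                return; // task interrupted by the controller
            }
            // ... route `key` to the left or right child and persist it ...
            let _ = key;
        }
        // ... delete the parent's flat storage ...
    }));
}

fn main() {
    let scheduler = ThreadScheduler;
    let controller = Controller::new();
    schedule_split(&scheduler, controller.clone());
    // Anyone holding a clone of the controller can interrupt the task.
    controller.cancel();
}
```

In the actual PR the task iterates the parent's flat storage, copies account-id based keys to the proper child, and finally deletes the parent's flat storage.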


EDIT: integrated with ShardLayoutV2, fixed all unit tests, re-arranged description.

@@ -176,6 +177,8 @@ pub struct Client {
/// Cached precomputed set of TIER1 accounts.
/// See send_network_chain_info().
tier1_accounts_cache: Option<(EpochId, Arc<AccountKeys>)>,
/// Takes care of performing resharding on the flat storage.
pub flat_storage_resharder: FlatStorageResharder,
Contributor Author

This may be moved inside the resharding manager @shreyan-gupta

Contributor

Yeah, agree if it's simple enough

/// UId of the right child shard. Will contain everything greater than or equal to the boundary account.
pub right_child_shard: ShardUId,
/// The new shard layout.
pub shard_layout: ShardLayout,
Contributor Author

I was on the fence about storing shard_layout here, but it makes things so much more self-contained in FlatStorageResharder and testing easier as well.

/// * `block_hash`: block hash of the block in which the split happens
/// * `scheduler`: component used to schedule the background tasks
/// * `controller`: manages the execution of the background tasks
pub fn start_resharding_from_new_shard_layout(
Contributor Author

API can be improved if event_type_from_shard_layout or an equivalent is moved into the resharding manager


/// Struct used to destructure a new shard layout definition into the resulting resharding event.
#[cfg_attr(test, derive(PartialEq, Eq))]
enum ReshardingEventType {
Contributor Author (@Trisfald, Sep 27, 2024)

I've found this object useful to convey the essence of the resharding event. Yes, only Split is available... for now.

Maybe this and event_type_from_shard_layout could be moved in the resharding manager as they aren't really flat storage specific @shreyan-gupta

Contributor

On one hand I love the idea. On the other hand I feel that it only makes sense if we do it everywhere. It would be strange to have it in flat storage but not in memtrie etc. I'm undecided so I'll leave it to you ;)

/// Helps control the flat storage resharder operation. More specifically,
/// it has a way to know when the background task is done or to interrupt it.
#[derive(Clone)]
pub struct FlatStorageResharderController {
Contributor Author

Extends ReshardingHandle with notification capability. If it doesn't prove useful, it can easily be swapped for the handle directly.
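A minimal sketch of the "handle plus notification" idea, with hypothetical names (TaskStatus, ReshardingController). The PR's controller is Clone, which hints at a multi-consumer channel (e.g. crossbeam) rather than the std::sync::mpsc used here only to keep the sketch dependency-free:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Hypothetical task outcome, standing in for FlatStorageReshardingTaskStatus.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Successful,
    Cancelled,
}

/// Hypothetical controller: a stop flag (the "handle" part) plus a channel
/// used to notify whoever is interested that the background task finished.
struct ReshardingController {
    stop_flag: Arc<AtomicBool>,
    completion_sender: Sender<TaskStatus>,
    pub completion_receiver: Receiver<TaskStatus>,
}

impl ReshardingController {
    fn new() -> Self {
        let (completion_sender, completion_receiver) = channel();
        Self { stop_flag: Arc::new(AtomicBool::new(false)), completion_sender, completion_receiver }
    }

    /// Ask the background task to stop at the next opportunity.
    fn stop(&self) {
        self.stop_flag.store(true, Ordering::Relaxed);
    }

    /// Polled periodically by the background task.
    fn is_stopped(&self) -> bool {
        self.stop_flag.load(Ordering::Relaxed)
    }

    /// Called by the background task when it finishes.
    fn notify_completion(&self, status: TaskStatus) {
        let _ = self.completion_sender.send(status);
    }
}

fn main() {
    let controller = ReshardingController::new();
    // The background task reports its outcome when it is done.
    if controller.is_stopped() {
        controller.notify_completion(TaskStatus::Cancelled);
    } else {
        controller.notify_completion(TaskStatus::Successful);
    }
    // Anyone holding the receiver learns about completion.
    assert_eq!(controller.completion_receiver.recv().unwrap(), TaskStatus::Successful);
    controller.stop();
}
```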

}

/// Represents the capability of scheduling the background tasks spawned by flat storage resharding.
pub trait FlatStorageResharderScheduler {
Contributor Author

I came up with this trait to abstract the task scheduling. It works well in tests, and I hope it can be plugged in nicely with actors or whatever else we end up using.

Contributor

Just grepping for the "scheduler" keyword it seems that those are typically Senders e.g.:

        state_parts_task_scheduler: &Sender<ApplyStatePartsRequest>,
        load_memtrie_scheduler: &Sender<LoadMemtrieRequest>,
        block_catch_up_task_scheduler: &Sender<BlockCatchUpRequest>,

This API should work out of the box in TestLoop
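A sketch of how that could look, using std::sync::mpsc::Sender as a stand-in for the near-async Sender, and an assumed trait signature (the actual FlatStorageResharderScheduler signature may differ):

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical request message, standing in for something like a
/// "split shard" request handled by a resharding actor or TestLoop.
struct SplitShardRequest {
    task: Box<dyn FnOnce() + Send>,
}

/// Assumed shape of the scheduler trait (the real signature may differ).
trait FlatStorageResharderScheduler {
    fn schedule(&self, task: Box<dyn FnOnce() + Send>);
}

/// A Sender satisfies the trait by wrapping the task into a request and
/// forwarding it to whatever owns the receiving end.
impl FlatStorageResharderScheduler for Sender<SplitShardRequest> {
    fn schedule(&self, task: Box<dyn FnOnce() + Send>) {
        let _ = self.send(SplitShardRequest { task });
    }
}

fn main() {
    let (sender, receiver) = channel::<SplitShardRequest>();
    sender.schedule(Box::new(|| println!("split task running")));
    // The "actor" side: drain the requests and run them.
    while let Ok(request) = receiver.try_recv() {
        (request.task)();
    }
}
```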

@Trisfald marked this pull request as ready for review September 30, 2024 15:08
@Trisfald requested a review from a team as a code owner September 30, 2024 15:08

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 86.15783% with 107 lines in your changes missing coverage. Please review.

Project coverage is 71.68%. Comparing base (eb7ceec) to head (c52b947).
Report is 6 commits behind head on master.

Files with missing lines | Patch % | Lines
chain/chain/src/flat_storage_resharder.rs | 87.78% | 58 Missing and 18 partials ⚠️
chain/chain/src/resharding/manager.rs | 0.00% | 8 Missing ⚠️
chain/chain/src/resharding/event_type.rs | 92.70% | 7 Missing ⚠️
chain/chain/src/flat_storage_creator.rs | 33.33% | 6 Missing ⚠️
core/primitives/src/shard_layout.rs | 72.22% | 5 Missing ⚠️
core/store/src/flat/types.rs | 33.33% | 4 Missing ⚠️
chain/chain-primitives/src/error.rs | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12164      +/-   ##
==========================================
+ Coverage   71.60%   71.68%   +0.08%     
==========================================
  Files         824      826       +2     
  Lines      165348   166265     +917     
  Branches   165348   166265     +917     
==========================================
+ Hits       118390   119181     +791     
- Misses      41830    41934     +104     
- Partials     5128     5150      +22     
Flag | Coverage | Δ
backward-compatibility | 0.17% <0.00%> | (-0.01%) ⬇️
db-migration | 0.17% <0.00%> | (-0.01%) ⬇️
genesis-check | 1.25% <0.00%> | (-0.01%) ⬇️
integration-tests | 38.53% <1.94%> | (-0.17%) ⬇️
linux | 71.47% <86.15%> | (+0.07%) ⬆️
linux-nightly | 71.26% <86.15%> | (+0.08%) ⬆️
macos | 54.33% <86.02%> | (+0.25%) ⬆️
pytests | 1.57% <0.00%> | (+0.04%) ⬆️
sanity-checks | 1.37% <0.00%> | (+0.04%) ⬆️
unittests | 65.46% <86.02%> | (+0.12%) ⬆️
upgradability | 0.21% <0.00%> | (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@wacban (Contributor) left a comment

LGTM, great stuff!

I left a few comments but nothing major. I'll have another look and approve once it all settles in my head.

Comment on lines +489 to +490
// TODO(Trisfald): call resume
// flat_storage_resharder.resume(shard_uid, &status, ...)?;
Contributor

Is this a todo for this PR or later? I'm fine with later but I just saw that the resume method is already implemented so might as well call it.

Contributor Author

It is meant for a second PR, I'll need to see how to link everything together



#[derive(Debug)]
#[cfg_attr(test, derive(PartialEq, Eq))]
struct ReshardingSplitParams {
Contributor

I'm a bit surprised that the boundary account is not part of it. Do you get it from elsewhere?

Contributor Author

ReshardingSplitParams is the output of processing the shard layout, and the latter already contains the boundary. Conceptually it would be nice to have the boundary in ReshardingSplitParams (I started with that), but I found the boundary to be pretty useless on its own: one needs the shard layout and the account-id to get the destination shard.
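For illustration only, a toy version of the routing rule implied by the doc comment earlier in this thread (the right child receives everything greater than or equal to the boundary account); the real code goes through the shard layout rather than a bare string comparison, and all names below are hypothetical:

```rust
/// Hypothetical child identifiers for a split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Child {
    Left,
    Right,
}

/// Toy routing rule: accounts before the boundary go to the left child,
/// accounts at or after the boundary go to the right child.
fn destination_child(account_id: &str, boundary_account: &str) -> Child {
    if account_id < boundary_account {
        Child::Left
    } else {
        Child::Right
    }
}

fn main() {
    let boundary = "pp"; // hypothetical boundary account
    assert_eq!(destination_child("alice.near", boundary), Child::Left);
    assert_eq!(destination_child("pp", boundary), Child::Right);
    assert_eq!(destination_child("zoe.near", boundary), Child::Right);
}
```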

Contributor

Looking at the split method, it takes the params and the shard layout. Would it make sense to shove the shard layout inside the params? You can then get rid of left/right child because they can be obtained from the shard layout.

Comment on lines +53 to +54
BorshSerialize,
BorshDeserialize,
Contributor

Why do you need borsh serialize and protocol schema?

Contributor Author

Borsh because I'm storing ShardLayout inside the flat storage resharding status, and ProtocolSchema as a consequence, to make devs more aware of changes in serialization.

There are alternatives, such as retrieving ShardLayout from the epoch manager. In that case I'd need to add a dependency on EpochManager in FlatStorageResharder.

Contributor

Personally, borsh always scares me but we should be good since ShardLayout is already versioned

Contributor

EpochManager is everywhere anyway so I wouldn't mind using it.

@shreyan-gupta (Contributor) left a comment

I'm slightly unclear about how exactly the flow of resharding the flat storage would look, i.e. how the different components like the flat storage creator, the resharding manager, any potential actors, etc. work together. We can discuss this during the offsite, or in the resharding meeting.


CreatingChild,
/// We apply deltas from disk until the head reaches final head.
/// Includes block hash of flat storage head.
CatchingUp(CryptoHash),
Contributor

Umm, I'm curious if we need a specific catching up phase or whether FlatStorage should automatically be able to handle this after unlocking?

Contributor Author

It depends on what we want to do.

Option 1: do a catchup phase for the children, and their flat storage will be created at chain head height or very close to it.
Option 2: no catchup, create the children at the parent flat storage height. This means the first block postprocessing has to go through quite a lot of deltas.

Any opinions?

Contributor

In option 2, would it happen in the main client thread? I don't think this would be a good idea.

Contributor

I think option 2 is "rate limited" in implementation so should be fine if it simplifies code.
(Note: We may need to do catchup manually in case we are looking to load memtrie from flat storage; to be checked)

Contributor

In that case I'm happy with either.

@@ -266,6 +270,12 @@ impl<'a> FlatStoreUpdateAdapter<'a> {
self.remove_range_by_shard_uid(shard_uid, DBCol::FlatStateDeltaMetadata);
}

pub fn remove_flat_storage(&mut self, shard_uid: ShardUId) {
Contributor

Could we fix the nomenclature a bit, remove_all and remove_flat_storage can get confusing

&self,
parent_shard: ShardUId,
status: &SplittingParentStatus,
scheduler: &dyn FlatStorageResharderScheduler,
Contributor

nothing seems to implement FlatStorageResharderScheduler right now?

Contributor Author

Yes, for now it's used only in tests.

I could make this into a sender or have the actor impl FlatStorageResharderScheduler 🤔

pub fn new(
epoch_manager: Arc<dyn EpochManagerAdapter>,
runtime: Arc<dyn RuntimeAdapter>,
chain_store: &ChainStore,
flat_storage_resharder: &FlatStorageResharder,
Contributor

I didn't completely understand why flat_storage_resharder is a part of FlatStorageCreator. The FlatStorageCreator is created at the time of calling new() in the client. Self::create_flat_storage_for_current_epoch is thus also called only in the beginning, right? How do we later set the status of some flat storage to FlatStorageStatus::Resharding?

Contributor Author

Basically the FlatStorageResharder is part of the client.

FlatStorageCreator uses a reference to the resharder to handle the case when the node restarts while flat storage resharding was in progress. I did it like this because the creator already has logic to handle resuming flat storage from the previous state.

In the normal flow, the resharding of flat storage would be initiated by the resharding manager, but that part is not done yet.

/// * `status`: resharding status of the shard
/// * `scheduler`: component used to schedule the background tasks
/// * `controller`: manages the execution of the background tasks
pub fn resume(
Contributor

Could you remind me when resume is called and by whom? Also, when is start_resharding_from_new_shard_layout called?

Contributor Author

resume will be called by the flat storage creator at node restart, if necessary.

start_resharding_from_new_shard_layout will be called by the resharding manager.


// Prepare the store object for commits and the iterator over parent's flat storage.
let flat_store = resharder.runtime.store().flat_store();
let mut iter = flat_store.iter(parent_shard);
Contributor

I think what Wac means here is that we need to check the following:

  • Flat head is at the height of the first block of the new epoch with the new shard layout
  • Flat head is indeed locked/frozen

Btw, I don't think the assumption that the flat head is the same as the chain head holds. Flat head, when locked, can be behind the chain head.

store_update: &mut FlatStoreUpdateAdapter,
status: &SplittingParentStatus,
) -> Result<(), std::io::Error> {
if key.is_empty() {
Contributor

might as well just panic here, this should never ever ever happen.

value: FlatStateValue,
store_update: &mut FlatStoreUpdateAdapter,
account_id_parser: impl FnOnce(&[u8]) -> Result<AccountId, std::io::Error>,
) -> Result<(), std::io::Error> {
Contributor

Unsure why we are using std::io::Error instead of a more concrete type like the resharding error? If we really do want to use a generic error type, we can go for anyhow::Error here instead.

Contributor Author

I used the error type of the parse_account_id_ functions, but I'm fine with using the resharding error.
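If that switch is made, a minimal sketch of the conversion, with a toy parse_account_id standing in for the real parse_account_id_ helpers and a stripped-down Error enum mirroring the ReshardingError variant visible elsewhere in this diff:

```rust
use std::io;

/// Toy stand-in for the parse_account_id_ helpers, which return io::Error.
fn parse_account_id(raw: &[u8]) -> Result<String, io::Error> {
    String::from_utf8(raw.to_vec())
        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))
}

/// Stripped-down chain error with the variant visible elsewhere in this diff.
#[derive(Debug)]
enum Error {
    ReshardingError(String),
}

/// The suggested switch: convert the parser's io::Error into the chain error.
fn account_id_or_resharding_error(raw: &[u8]) -> Result<String, Error> {
    parse_account_id(raw).map_err(|err| Error::ReshardingError(err.to_string()))
}

fn main() {
    assert!(account_id_or_resharding_error(b"alice.near").is_ok());
    assert!(account_id_or_resharding_error(&[0xff, 0xfe]).is_err());
}
```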

/// A value of `true` means that the operation completed successfully.
completion_sender: Sender<FlatStorageReshardingTaskStatus>,
/// Corresponding receiver for `completion_sender`.
pub completion_receiver: Receiver<FlatStorageReshardingTaskStatus>,
Contributor

Umm, why exactly would the FlatStorageResharderController hold the completion_receiver? Shouldn't that be sent to someone who is waiting for the resharding of flat storage to be finished? Someone in the client?

Also, is this the pattern we would like to follow? I thought the convention was to send actix messages once jobs were done. Btw, where is the job of flat storage resharding being done? As in, which actor is it a part of?

Contributor Author

The idea was that anyone interested could clone the controller or the receiver.

The actor is not done yet. I planned to add that part in a second PR, maybe reusing a shared resharding actor. The actor should anyway be a very thin layer, I hope.

Contributor

Would we reload memtrie immediately after FS is ready or delay until node restart?

@Trisfald (Contributor Author) commented Oct 2, 2024

I did some refactoring aside from applying suggestions.

Main change is that now ReshardingEventType is independent from flat storage and has more elegant logic to detect a shard split. Also trying to use ReshardingEventType in resharding manager.

@wacban (Contributor) commented Oct 2, 2024

> I did some refactoring aside from applying suggestions.
>
> Main change is that now ReshardingEventType is independent from flat storage and has more elegant logic to detect a shard split. Also trying to use ReshardingEventType in resharding manager.

Cool, please re-request review once it's ready for review.

#[derive(
BorshSerialize, BorshDeserialize, Clone, Debug, PartialEq, Eq, serde::Serialize, ProtocolSchema,
)]
pub struct SplittingParentStatus {
Contributor

nit: It took me a while to figure out that this is to describe a status of a shard's flat storage rather than the status of resharding. Can you explain that in the comment please?

..
} = split_params;
info!(target: "resharding", ?split_params, "initiating flat storage shard split");
self.check_no_resharding_in_progress()?;
Contributor

Not for this PR:
Eventually we'd want to handle the case when there is a fork at the end of the epoch and resharding is triggered multiple times. In that case you may want to clean up everything and start over.

Contributor Author

Yeah, it could be done by interrupting through ReshardingHandle or automatically depending on the block hash.



Comment on lines 146 to 156
let flat_head = if let FlatStorageStatus::Ready(FlatStorageReadyStatus { flat_head }) =
store
.get_flat_storage_status(parent_shard)
.map_err(|err| Into::<StorageError>::into(err))?
{
flat_head
} else {
let err_msg = "flat storage parent shard is not ready!";
error!(target: "resharding", ?parent_shard, err_msg);
return Err(Error::ReshardingError(err_msg.to_owned()));
};
Contributor

nit: Can you move it to a helper method and multi-line it a bit?
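One possible shape of that helper, shown with toy stand-in types so the sketch is self-contained; the real version would keep the existing FlatStorageStatus, StorageError and Error types and the error!(target: "resharding", ...) logging from the snippet above:

```rust
use std::collections::HashMap;

// Toy stand-ins for the real types, just to make the shape of the helper concrete.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ShardUId(u32);
#[derive(Debug, Clone, Copy)]
struct BlockInfo;
#[derive(Debug)]
enum FlatStorageStatus {
    Ready(FlatStorageReadyStatus),
    Creation,
}
#[derive(Debug)]
struct FlatStorageReadyStatus {
    flat_head: BlockInfo,
}
#[derive(Debug)]
enum Error {
    ReshardingError(String),
}

struct FlatStore {
    statuses: HashMap<ShardUId, FlatStorageStatus>,
}

impl FlatStore {
    fn get_flat_storage_status(&self, shard: ShardUId) -> Option<&FlatStorageStatus> {
        self.statuses.get(&shard)
    }
}

/// The helper the review asks for: retrieve the parent's flat head or fail
/// with a resharding error if the parent's flat storage is not ready.
fn retrieve_parent_flat_head(store: &FlatStore, parent_shard: ShardUId) -> Result<BlockInfo, Error> {
    match store.get_flat_storage_status(parent_shard) {
        Some(FlatStorageStatus::Ready(FlatStorageReadyStatus { flat_head })) => Ok(*flat_head),
        _ => Err(Error::ReshardingError("flat storage parent shard is not ready!".to_owned())),
    }
}

fn main() {
    let mut statuses = HashMap::new();
    statuses.insert(ShardUId(3), FlatStorageStatus::Ready(FlatStorageReadyStatus { flat_head: BlockInfo }));
    statuses.insert(ShardUId(7), FlatStorageStatus::Creation);
    let store = FlatStore { statuses };
    assert!(retrieve_parent_flat_head(&store, ShardUId(3)).is_ok());
    assert!(retrieve_parent_flat_head(&store, ShardUId(7)).is_err());
}
```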


/// Struct used to destructure a new shard layout definition into the resulting resharding event.
#[derive(Debug)]
#[cfg_attr(test, derive(PartialEq, Eq))]
Contributor

Why do you need the conditional derive? Are those not used outside of tests for now?

Contributor Author

Only in tests for now :)


@Trisfald requested a review from wacban October 3, 2024 09:39
@wacban (Contributor) left a comment

LGTM, mostly just styling comments, for the rest feel free to follow up separately

Comment on lines +392 to +394
// If the flat storage is undergoing resharding it means it was previously created successfully,
// but resharding itself hasn't been finished.
// This case is a no-op because the flat storage resharder has already been created in `create_flat_storage_for_current_epoch`.
Contributor

nit: The formatting seems a bit off. I use a VS Code extension called "rewrap"; it has some on-the-fly and on-demand (shortcut) formatting.

/// * `status`: resharding status of the shard
/// * `scheduler`: component used to schedule the background tasks
/// * `controller`: manages the execution of the background tasks
pub fn resume(
Contributor

mini nit: Can you put the start method first (before resume)? It seems more natural.

Self { runtime, resharding_event }
}

/// Resumes a resharding event that was in progress.
Contributor

nit: ... that was interrupted

It feels strange to be resuming something that is in progress.

Can you also add what can cause resharding to be halted and resumed? Is node restart / crashing the only event that may cause this?

Contributor

Answering my own question - I think a fork may also cause interruption.

) -> Result<(), Error> {
match status {
FlatStorageReshardingStatus::CreatingChild => {
// Nothing to do here because the parent will take care of resuming work.
Contributor

I wouldn't be surprised to see some cleanup logic here but it's also fine to do it from parent.

#[derive(
BorshSerialize, BorshDeserialize, Clone, Debug, PartialEq, Eq, serde::Serialize, ProtocolSchema,
)]
pub enum FlatStorageReshardingStatus {
Contributor

Thinking about the future, the next event that we'll be adding will be merging. Can you make the names of the events future proof? E.g. think about what would be good names for the merging parents and children, and make it so that it's consistent when we add those.

Comment on lines +29 to +30
/// Hash of the first block having the new shard layout.
pub block_hash: CryptoHash,
Contributor

The hash of the first block with the new shard layout is not known at the time when the flat storage resharding is triggered (unless maybe I'm missing something). My thinking was that this will be triggered together with all of the rest of the resharding logic in the post-processing of the last block of the old shard layout.

}
// Parent shard is no longer part of this shard layout.
let parent_shard =
ShardUId { version: shard_layout.version(), shard_id: *parent_id as u32 };
Contributor

We're relying here on the version not changing starting from the ShardLayoutV2. Just saying because it looked sus at first.

Comment on lines +85 to +87
// Find the boundary account between the two children.
let Some(boundary_account_index) =
shard_layout.shard_ids().position(|id| id == left_child_shard.shard_id())
Contributor

nit: I would move that to a shard layout method, maybe even the entire method. Fine to do later.

return Ok(());
}
assert_eq!(children_shard_uids.len(), 2);
let Ok(Some(ReshardingEventType::SplitShard(split_shard_event))) = resharding_event_type
Contributor

You are not handling the error case here - Err and Ok(None) are treated the same.

  • if intentional - please comment
  • if accidental - please add ? or handle it explicitly
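For concreteness, a sketch of the explicit handling with toy stand-in types (the real ReshardingEventType, its event fields, and the Error type live in the PR):

```rust
// Toy stand-ins to make the suggestion concrete: handle the error
// explicitly instead of collapsing `Err` and `Ok(None)` into one branch.
#[derive(Debug)]
struct SplitShardEvent {
    boundary_account: String,
}

#[derive(Debug)]
enum ReshardingEventType {
    SplitShard(SplitShardEvent),
}

#[derive(Debug)]
enum Error {
    ReshardingError(String),
}

fn handle(resharding_event_type: Result<Option<ReshardingEventType>, Error>) -> Result<(), Error> {
    let split_shard_event = match resharding_event_type {
        // Propagate failures instead of silently skipping resharding.
        Err(err) => return Err(err),
        // No resharding event for this shard layout change: nothing to do.
        Ok(None) => return Ok(()),
        Ok(Some(ReshardingEventType::SplitShard(event))) => event,
    };
    println!("starting split at boundary {}", split_shard_event.boundary_account);
    Ok(())
}

fn main() {
    let event = ReshardingEventType::SplitShard(SplitShardEvent { boundary_account: "pp".to_owned() });
    assert!(handle(Ok(Some(event))).is_ok());
    assert!(handle(Ok(None)).is_ok());
    assert!(handle(Err(Error::ReshardingError("boom".to_owned()))).is_err());
}
```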

Debug,
PartialEq,
Eq,
ProtocolSchema,
Contributor

protocol schema is missing from the other structs - maybe good to add it there too
